Nuit Blanche: Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

Thursday, June 25, 2015

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

Random projections or sign random projections are useful beyond distance preservation. We've known this for a little a while when looking at how manifolds kept some structure under random projection (see Rich Baraniuk's presentation, or Mike Wakin's thesis, or here), or how the angles between submanifolds provided some justification for reconstruction solvers (also here), but today's paper wants to look into this issue deeper as regards to the standpoint of machine learning.

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection by Ata Kabán

Dot product is a key building block in a number of data mining algorithms from classification, regression, correlation clustering, to information retrieval and many others. When data is high dimensional, the use of random projections may serve as a universal dimensionality reduction method that provides both low distortion guarantees and computational savings. Yet, contrary to the optimal guarantees that are known on the preservation of the Euclidean distance cf. the Johnson-Lindenstrauss lemma, the existing guarantees on the dot product under random projection are loose and incomplete in the current data mining and machine learning literature. Some recent literature even suggested that the dot product may not be preserved when the angle between the original vectors is obtuse. In this paper we provide improved bounds on the dot product under random projection that matches the optimal bounds on the Euclidean distance. As a corollary, we elucidate the impact of the angle between the original vectors on the relative distortion of the dot product under random projection, and we show that the obtuse vs. acute angles behave symmetrically in the same way. In a further corollary we make a link to sign random projection, where we generalise earlier results. Numerical simulations confirm our theoretical results. Finally we give an application of our results to bounding the generalisation error of compressive linear classifiers under the margin loss.

Join the CompressiveSensing subreddit or the Google+ Community and post there !