If there is one thing that is changing our views of high dimensional data it is the Johnson-Lindestrauss lemma, a concentration of measure result from 1984 that is only bringing to bear on our daily life as we are slowly being swallowed by the tsunami of data around us. So when Laurent wrote some addtional insight on the subject on his blog and then put out an arxiv preprint, I paid attention and you should too. The very well written blog entry is entitled When Buffon’s needle problem meets the Johnson-Lindenstrauss Lemma, while the more formal arxiv preprint goes with: A Quantized Johnson Lindenstrauss Lemma: The Finding of Buffon's Needle by Laurent Jacques
In 1733, Georges-Louis Leclerc, Comte de Buffon in France, set the ground of geometric probability theory by defining an enlightening problem: What is the probability that a needle thrown randomly on a ground made of equispaced parallel strips lies on two of them? In this work, we show that the solution to this problem, and its generalization to N dimensions, allows us to discover a quantized form of the Johnson-Lindenstrauss (JL) Lemma, i.e., one that combines a linear dimensionality reduction procedure with a uniform quantization of precision \delta>0. In particular, given a finite set S in R^N of |S| points and a distortion level \epsilon>0, as soon as M > M_0 = log(log |S|/\epsilon^2), we can (randomly) construct a mapping from (S, \ell_2) to ((\delta Z)^M, \ell_1) that approximately preserves the pairwise distances between the points of S. Interestingly, compared to the common JL Lemma, the mapping is quasi-isometric and we observe both an additive and a multiplicative distortions on the embedded distances. These two distortions, however, decay as O(\sqrt(log |S|/M)) when M increases. Moreover, for coarse quantization, i.e., for high \delta compared to the set radius, the distortion is mainly additive, while for small \delta we tend to a Lipschitz isometric embedding. Finally, we show that there exists "almost" a quasi-isometric embedding of (S, \ell_2) in ((\delta Z)^M, \ell_2). This one involves a non-linear distortion of the \ell_2-distance in S that vanishes for distant points in this set. Noticeably, the additive distortion in this case decays slower as O((\log S/M)^(1/4)).
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.