## Thursday, May 16, 2013

### Another day at the Big Data: Theoretical and Practical Challenges Workshop

Yesterday was the second day of Big Data: Theoretical and Practical Challenges, May 14-15, 2013, IHP, Paris. A big thank you to both Francis Bach and Michael Jordan for organizing the meeting. Francis tells me the slides will be available at some point in time, so we can come back to the specifics of some presentations. For the time being, here are just a few notes that drew my attention because of their relevance to some of the subjects we have discussed here before.

In Piotr Indyk's presentation on how the sFFT works (I very much liked the Bins and Balls explanation), this was the first time I heard about a Hadamard-based algorithm with good running time. I asked Piotr about it and here are the references for that work:

The improved algorithm appeared in
L.A. Levin. Randomness and non-determinism. J. Symb. Logic, 58(3):1102-1103, 1993.
Alas, it is deeply buried in there. A good explanation is given in: O. Goldreich. Modern cryptography, probabilistic proofs and pseudorandomness. Algorithms and Combinatorics, 17, 1999. Appendix C.2.3 Improved Implementation of Algorithm. Alas, there is no actual implementation (i.e., code) available, to the best of my knowledge.
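Since no code for the Hadamard-based sparse algorithm seems to be available, here is, for background only, a minimal sketch of the dense fast Walsh-Hadamard transform that such sparse algorithms aim to beat — this is the standard O(n log n) butterfly recursion, not Levin's sublinear algorithm:

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform in O(n log n) for n a power
    of two. Levin's algorithm instead recovers the large Hadamard
    coefficients without computing all of them; this is only the dense
    baseline transform."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    h = 1
    while h < n:
        # butterfly: combine pairs separated by the current stride h
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

# The (unnormalized) transform is an involution up to a factor of n:
x = np.array([1.0, 0.0, 2.0, 0.0])
assert np.allclose(fwht(fwht(x)), 4 * x)
```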
Thanks Piotr! If we could now compare sFFT with the new one from Berkeley (see below) and the other superfast FFT for compressible signals, that would be great :-) [ Of note, all the slides of the recent Workshop on Sparse Fourier Transform organized by Piotr are here. ]

Given an \$n\$-length input signal \$\mbf{x}\$, it is well known that its Discrete Fourier Transform (DFT), \$\mbf{X}\$, can be computed in \$O(n \log n)\$ complexity using a Fast Fourier Transform (FFT). If the spectrum \$\mbf{X}\$ is exactly \$k\$-sparse (where \$k \ll n\$), can we do better? We show that asymptotically in \$k\$ and \$n\$, when \$k\$ is sub-linear in \$n\$ (precisely, \$k \propto n^{\delta}\$ where \$0 < \delta < 1\$), and the support of the non-zero DFT coefficients is uniformly random, we can exploit this sparsity in two fundamental ways: (i) sample complexity: we need only \$M = rk\$ deterministically chosen samples of the input signal \$\mbf{x}\$ (where \$r < 4\$ when \$0 < \delta < 0.99\$); and (ii) computational complexity: we can reliably compute the DFT X using O(k log k) operations, where the constants in the big Oh are small and are related to the constants involved in computing a small number of DFTs of length approximately equal to the sparsity parameter k. Our algorithm succeeds with high probability, with the probability of failure vanishing to zero asymptotically in the number of samples acquired, M. Our approach is based on filterless subsampling of the input signal x using a small set of carefully chosen uniform subsampling patterns guided by the Chinese Remainder Theorem (CRT). The idea is to cleverly exploit, rather than avoid, the resulting aliasing artifacts induced by this subsampling. Specifically, our subsampling operation on x is designed to create aliasing patterns on the spectrum X that "look like" parity-check constraints of good erasure-correcting sparse-graph codes. We show how computing the sparse DFT X is equivalent to decoding of these sparse-graph codes. These codes further allow for fast peeling-style decoding, similar to that of fountain codes. The resulting DFT computation is low in both sample complexity and decoding complexity. We accordingly dub our algorithm the FFAST (Fast Fourier Aliasing-based Sparse Transform) algorithm. We analytically connect our proposed CRT-based aliasing framework to random sparse-graph codes, and analyze the performance of our algorithm using density evolution techniques from coding theory. Additionally our approach uncovers a new deterministic and efficient class of compressive sensing measurement matrices as an alternative to the popular choice of random matrices.
Based on the qualitative nature of the subsampling patterns needed, our analysis is decomposed into two regimes: i) "very-sparse" (0 < δ ≤ 1/3), where M < 2.5k, and ii) "less-sparse" (1/3 < δ < 1), where M < 4k for δ ≤ 0.99. While our theory is asymptotic, our simulation results reveal that in practice our algorithm works even for moderate values of k and n, e.g. k = 200 and n = 4080. Further, when k = 300, and n is approximately 3.8 million, our algorithm achieves computational savings by a factor of more than 6000, and savings in the number of input samples by a factor of more than 3900 over the standard FFT. We analyze the 1-D DFT here, but our approach can be extended in a straightforward manner to multi-dimensional DFTs. Further, while our analysis is for the exactly sparse case, our approach can be extended to be robust to the more general approximately sparse and noisy measurement cases as well, albeit at the cost of increased sample and computational complexity. We provide extensive simulation results in Section VIII that corroborate our theoretical findings, and validate the empirical performance of the FFAST algorithm for a wide variety of 1D and 2D DFT settings for signals having an exactly or approximately sparse Fourier spectrum.
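The aliasing idea at the heart of the abstract is easy to see numerically: uniformly subsampling a signal folds its spectrum, so an active frequency bin shows up at its residue modulo the subsampled length, and coprime subsampling factors (hence the CRT) pin it down. The following toy sketch is only an illustration of that folding effect, not the FFAST algorithm itself:

```python
import numpy as np

# A single active frequency in an n = 20 point spectrum.
n = 20
X = np.zeros(n, dtype=complex)
X[3] = 1.0                # active bin at frequency 3
x = np.fft.ifft(X)

for s in (4, 5):          # coprime subsampling factors: gcd(4, 5) = 1
    m = n // s            # length of the subsampled signal
    Xs = np.fft.fft(x[::s]) * s   # DFT of the aliased (folded) spectrum
    k = int(np.argmax(np.abs(Xs)))
    # the active bin appears at its residue modulo m
    assert k == 3 % m

# The residue pair (3 mod 5, 3 mod 4) identifies bin 3 uniquely among
# the 20 bins, by the Chinese Remainder Theorem.
```

FFAST layers a peeling decoder on top of this so that many active bins colliding in the folded spectra can be resolved one by one, like erasures in a fountain code.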

Kannan tells me that in the next version of their paper, sFFT will be included in the comparison and that an implementation should be out by the end of the summer. Woohoo!

Coming back to the meeting, I was surprised to see such good scores on the Netflix problem in Lester Mackey's talk on Divide-and-Conquer Matrix Factorization, which we have featured here before (and which is listed on the Advanced Matrix Factorization Jungle page). I'll wait for the slides to be sure of what I saw. Since DFC goes so fast, I wonder if Cable and I shouldn't be using it in our CAI adventures.
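For readers who haven't seen the DFC idea before, here is a heavily simplified sketch of my reading of the projection variant: split the columns into blocks, run a base low-rank factorization on each block independently (the parallelizable part), then project all blocks onto a single column space. This uses a plain truncated SVD as the base factorization and is an illustration under those assumptions, not Mackey et al.'s code:

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank(A, r):
    """Rank-r truncated SVD of A (stand-in for the 'base' factorization)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] * s[:r] @ Vt[:r]

def dfc_proj(M, r, t):
    """Toy Divide-Factor-Combine sketch (projection-style combine):
    factor t column blocks independently, then project every block onto
    the column space of the first block's factorization."""
    blocks = np.array_split(M, t, axis=1)
    approx = [low_rank(B, r) for B in blocks]          # embarrassingly parallel
    Q, _ = np.linalg.qr(approx[0])                     # shared column space
    return np.hstack([Q @ (Q.T @ A) for A in approx])  # project and reassemble

# On an exactly rank-3 matrix, the divided factorization recovers it.
M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
err = np.linalg.norm(dfc_proj(M, 3, 4) - M) / np.linalg.norm(M)
assert err < 1e-8
```

The appeal for something like Netflix-scale data is that the expensive factorizations run on small column blocks in parallel, with only cheap projections needed to combine them.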

Finally, I note that the talk by Mike Mahoney on Revisiting the Nystrom Method for Improved Large-Scale Machine Learning represents the rebirth of randomization thanks to the use of theoretically stronger mathematical results (Johnson-Lindenstrauss), and is behind the Randomized Numerical Linear Algebra movement (RandNLA tag on Nuit Blanche). During the presentation, I think I heard Mike say that, because of rounding errors, the randomized approach had better accuracy than full LAPACK, which reminded me of this entry: Slowly but surely they'll join our side of the Force... Anyway, I have seen most of the figures of Mike's presentation in the following two papers:

We reconsider randomized algorithms for the low-rank approximation of symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projections methods. These bounds are qualitatively superior to existing bounds---e.g., improved additive-error bounds for the spectral and Frobenius norm errors and relative-error bounds for the trace norm error.

and the attendant ICML version.
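For context, the basic Nystrom approximation the paper revisits builds a low-rank approximation of an SPSD matrix K from a column sample: K ≈ C W⁺ Cᵀ with C the sampled columns and W the corresponding square submatrix. The sketch below uses plain uniform column sampling for illustration; the paper's point is precisely that smarter (leverage-score or projection-based) sampling can do much better:

```python
import numpy as np

rng = np.random.default_rng(1)

def nystrom(K, idx):
    """Basic Nystrom approximation of an SPSD matrix K:
    K ~= C W^+ C^T, with C = K[:, idx] and W = K[idx, idx].
    Uniform sampling here; leverage-score sampling is one of the
    alternatives studied in the paper."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Rank-5 Gram (kernel-like) test matrix: when the sampled columns span
# the range of K, the Nystrom approximation is exact.
G = rng.standard_normal((100, 5))
K = G @ G.T
idx = rng.choice(100, size=10, replace=False)
err = np.linalg.norm(nystrom(K, idx) - K) / np.linalg.norm(K)
assert err < 1e-8
```

The attraction at scale is that only 10 of the 100 columns of K ever need to be formed or touched, which matters when K is a dense kernel matrix too large to hold in memory.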

Image Credit: NASA/JPL-Caltech

This image was taken by Front Hazcam: Right B (FHAZ_RIGHT_B) onboard NASA's Mars rover Curiosity on Sol 275 (2013-05-15 15:12:59 UTC).

Full Resolution

Anonymous said...

Igor said...

Thanks Anonymous.

I see they use the 3600 times better figure. I expect it won't be the last time :-)

Igor.

Anonymous said...

From Chief D-Wave Skeptic

http://www.scottaaronson.com/blog/?p=1400

Anonymous said...

Daniel Lidar's talk on the D-Wave 1 machine at USC.