During the recent Big Data Workshop, Mike Mahoney mentioned Fast Food (see Revisiting the Nystrom Method for Improved Large-Scale Machine Learning; all the slides of the workshop are here), and it reminded me of several items. First, to understand Fast Food, you need to understand Random Kitchen Sinks. For that, you can either go back to this 2007 Nuit Blanche blog entry ( Compressed Sensing: Random Features for Large-Scale Kernel Machines ) or just watch the very illuminating explanation by none other than Ali Rahimi:
A popular trend in computer vision, graphics, and machine learning is to replace sophisticated statistical models with simpler generic ones, and to compensate for the missing domain knowledge with huge datasets. These huge datasets in turn require us to solve huge numerical optimization problems that tax popular off-the-shelf implementations of popular algorithms. I describe a randomized way to solve these large scale optimization problems quickly, in a few lines of code, and with provably good performance guarantees. For example, a randomized version of Adaboost and of kernelized Support Vector Machine can fit millions of data points within a few minutes with almost no loss in classification accuracy. Similarly, very large Semi-Definite Programs can be solved quickly by approximating them with suitably randomized linear programs. All of these tricks randomize over most of the variables of optimization and carry out a much cheaper optimization over the remaining variables. A theoretical analysis of these tricks relies on the concentration of measure phenomenon in Banach spaces, and guarantees that in the cases I describe, these tricks work almost as well as carrying out the full optimization. I'll also describe a curious and related way of constructing numerical algorithms that execute reliably on unreliable CPUs that run at subthreshold voltages.
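To make the Random Kitchen Sinks idea concrete, here is a minimal sketch of random Fourier features for the Gaussian/RBF kernel: sample random frequencies from the kernel's Fourier transform, take cosines, and inner products of the resulting features approximate the kernel. The function and parameter names below are mine, not from the talk.

```python
import numpy as np

def random_fourier_features(X, n_features, gamma, rng):
    """Random Kitchen Sinks features for k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's Fourier transform (a Gaussian),
    # plus uniform random phases.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, n_features=5000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T  # linear inner products in feature space ...
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
err = np.abs(K_approx - K_exact).max()  # ... approximate the RBF kernel
```

Once the data is lifted into these features, a plain linear solver (least squares, linear SVM) stands in for the kernelized one; the approximation error shrinks like one over the square root of the number of features.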
The presentation is very well done and I particularly like the connection to hardware at the very end. Alex Smola also provides an explanation on his blog ( The Neal Kernel and Random Kitchen Sinks ). So what is Fast Food? As shown in the recent video ( Fast Food: Approximating Kernel Expansion in Loglinear Time ), Alex thinks that the randomization through multiplication with a dense Gaussian matrix takes too much time, and finds a different structured matrix that resembles a Gaussian one in order to speed up the matrix-vector multiply of the kernel expansion. Think of it as a structured randomization that takes advantage of a speedy matrix-vector multiplication (here the Hadamard transform). In compressive sensing, we have seen such structured matrices, in particular because sometimes the RAM could not hold the measurement matrix. Some have taken advantage of this through the use of Hadamard transforms, others have looked at random Kronecker products. Yes, Virginia, that field is wide open....
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.