Thursday, May 23, 2013

Random Kitchen Sinks, Fast Food and other randomized kernel evaluations

During the recent Big Data Workshop, Mike Mahoney mentioned Fast Food (see Revisiting the Nystrom Method for Improved Large-Scale Machine Learning; all the slides of the workshop are here) and it reminded me of several items. First, to understand Fast Food, you need to understand Random Kitchen Sinks. For that, you can either go back to this 2007 Nuit Blanche blog entry ( Compressed Sensing: Random Features for Large-Scale Kernel Machines ) or just watch the very illuminating explanation by none other than Ali Rahimi:



[ The video is from Microsoft Research ( http://research.microsoft.com/apps/video/default.aspx?id=103390 ). Slides and several video formats are also available from that address. ] Here is the abstract of the talk:
Random Kitchen Sinks
A popular trend in computer vision, graphics, and machine learning is to replace sophisticated statistical models with simpler generic ones, and to compensate for the missing domain knowledge with huge datasets. These huge datasets in turn require us to solve huge numerical optimization problems that tax popular off-the-shelf implementations of popular algorithms. I describe a randomized way to solve these large scale optimization problems quickly, in a few lines of code, and with provably good performance guarantees. For example, a randomized version of Adaboost and of kernelized Support Vector Machine can fit millions of data points within a few minutes with almost no loss in classification accuracy. Similarly, very large Semi-Definite Programs can be solved quickly by approximating them with suitably randomized linear programs. All of these tricks randomize over most of the variables of optimization and carry out a much cheaper optimization over the remaining variables. A theoretical analysis of these tricks relies on the concentration of measure phenomenon in Banach spaces, and guarantees that in the cases I describe, these tricks work almost as well as carrying out the full optimization. I'll also describe a curious and related way of constructing numerical algorithms that execute reliably on unreliable CPUs that run at subthreshold voltages. 
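To make the trick concrete, here is a minimal Python sketch of Random Kitchen Sinks for the Gaussian kernel: draw random frequencies from the kernel's Fourier transform, build cosine features, and let any linear learner do the rest. The function and variable names are mine, not code from Rahimi and Recht:

```python
import numpy as np

def random_kitchen_sinks(X, n_features=500, gamma=1.0, seed=0):
    """Map X (n_samples x d) to random Fourier features z(x) such that
    z(x) . z(y) approximates the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the Fourier transform of the Gaussian kernel
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# A linear model trained on Z behaves like a kernel machine trained on X:
X = np.random.randn(1000, 10)
Z = random_kitchen_sinks(X, n_features=2000, gamma=0.5)
# Z @ Z.T approximates the kernel matrix K_ij = exp(-0.5 * ||x_i - x_j||^2)
```

With D random features, every kernel evaluation becomes a D-dimensional dot product and training reduces to a plain linear problem, which is where the "millions of data points within a few minutes" claim comes from.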
The presentation is very well done and I particularly like the connection to hardware at the very end. Alex Smola also provides an explanation on his blog ( The Neal Kernel and Random Kitchen Sinks ). So what is Fast Food? As shown in the recent video ( Fast Food: Approximating Kernel Expansion in Loglinear Time ), Alex thinks that the randomization performed by multiplying with a dense Gaussian matrix takes too much time, so he finds a different, structured matrix that resembles a Gaussian one and speeds up the matrix-vector multiply at the heart of the kernel evaluation. Think of it as a structured randomization that takes advantage of a fast matrix-vector multiplication (here the Hadamard transform). In compressive sensing, we have seen similar constructions, in particular because the RAM sometimes could not hold a dense measurement matrix: some have taken advantage of the Hadamard transform, others have looked at random Kronecker products. Yes, Virginia, that field is wide open....
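Under the hood, the dense Gaussian matrix of Random Kitchen Sinks gets replaced in Fast Food by a product of diagonal matrices, a permutation and Hadamard transforms, which multiplies a vector in O(d log d) instead of O(d^2). Here is a rough Python sketch of one such block: it follows the S H G Π H B structure of Le, Sarlós and Smola, but the helper names and the normalization details are my own illustrative simplification, not their reference implementation:

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis.
    The last-axis length must be a power of two; cost is O(d log d)."""
    x = np.array(x, dtype=float, copy=True)
    d = x.shape[-1]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def fastfood_features(X, sigma=1.0, seed=0):
    """One Fastfood-style block: sign flips B, Hadamard H, permutation Pi,
    Gaussian diagonal G, Hadamard H, scaling S -- no dense Gaussian matrix
    is ever formed or stored. d (the number of columns of X) must be a power
    of two, and the scaling below simplifies the one in the Fastfood paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.choice([-1.0, 1.0], size=d)          # random sign flips
    P = rng.permutation(d)                       # random permutation Pi
    G = rng.normal(size=d)                       # Gaussian diagonal
    S = np.sqrt(rng.chisquare(df=d, size=d)) / np.linalg.norm(G)  # row-norm correction
    V = fwht(fwht(X * B)[:, P] * G) * S / (sigma * np.sqrt(d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=d)
    return np.sqrt(2.0 / d) * np.cos(V + b)      # random Fourier features on top
```

The full construction stacks several such blocks to get more than d features, but one block already shows where the speedup comes from: two fast Hadamard transforms and a handful of diagonal multiplies replace the dense Gaussian matrix-vector product.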



Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
