Friday, December 21, 2007

Compressed Sensing: Intrinsic Dimensionality of a Dataset, StOMP for large scale problems and 2007 WD5 is aiming at Mars

In Random Projections for Manifold Learning, Chinmay Hegde, Mike Wakin, and Richard Baraniuk try to devise a means by which one can determined the intrinsic dimensionality of a dataset. This is important as it has a direct impact on the size of the measurement matrix that one wants to be as small as possible. It uses the Grassberger-Procaccia algorithm to estimate the intrinsic dimension of the manifold using random projections of the original dataset. The abstract reads:

We derive theoretical bounds on the performance of manifold learning algorithms, given access to a small number of random projections of the input dataset. We prove that with the number of projections only logarithmic in the size of the original space, we may reliably learn the structure of the nonlinear manifold, as compared to performing conventional manifold learning on the full dataset.

In the NIPS version of that technical report, the abstract reads:

We propose a novel method for linear dimensionality reduction of manifold modeled
data. First, we show that with a small number M of random projections of sample points in RN belonging to an unknown K-dimensional Euclidean manifold, the intrinsic dimension (ID) of the sample set can be estimated to high accuracy. Second, we rigorously prove that using only this set of random projections, we can estimate the structure of the underlying manifold. In both cases, the number of random projections required is linear in K and logarithmic in N, meaning that K less than M much less than N. To handle practical situations, we develop a greedy algorithm to estimate the smallest size of the projection space required to perform manifold learning. Our method is particularly relevant in distributed sensing systems and leads to significant potential savings in data acquisition, storage and transmission costs.
in the paper the algorithm is given in an explicit way and requires ISOMAP. These two papers are important for Machine Learning as they now give a faster mean of estimating the underlying manifold using random projections.

David Donoho, Yaakov Tsaig, Iddo Drori, and Jean-Luc Starck, Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit, the abstract reads:

Finding the sparsest solution to underdetermined systems of linear equations y = x is NP-hard in general. We show here that for systems with ‘typical’/‘random’ , a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with initial residual r0 = y, at the s-th stage it forms the ‘matched filter’ T rs−1, identifies all coordinates with amplitudes exceeding a specially-chosen threshold, solves a least-squares problem using the selected coordinates, and subtracts the least squares fit, producing a new residual. After a fixed number of stages (e.g. 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP while only one enters per stage in OMP; and StOMPtakes a fixed number of stages (e.g. 10), while OMP can take many (e.g. n). StOMP runs much faster than competing proposals for sparse solutions, such as l1 minimization and OMP, and so is attractive for solving large-scale problems.
We use phase diagrams to compare algorithm performance. The problem of recovering a k-sparse vector x0 from (y, ) where  is random n × N and y = x0 is represented by a point (n/N, k/n) in this diagram; here the interesting range is k less than n less than N. For n large, StOMP correctly recovers (an approximation to) the sparsest solution of y = x over a region of the sparsity/indeterminacy plane comparable to the region where l1 minimization is successful. In fact, StOMPoutperforms both l1 minimization and OMP for extremely underdetermined problems. We rigorously derive a conditioned Gaussian distribution for the matched filtering coefficients at each stage of the procedure and rigorously establish a large-system limit for the performance variables of StOMP. We precisely calculate large-sample phase transitions; these provide asymptotically precise limits on the number of samples needed for approximate recovery of a sparse vector by StOMP. We give numerical examples showing that StOMPrapidly and reliably finds sparse solutions in compressed sensing, decoding of error-correcting codes, and overcomplete representation.

Always pay attention when Dave Donoho says something. I also found the qualifying exams slide of Deanna Needell interesting, especially slide 116/145 an information I did not know. I am also impatient to see what Thong Do a student of Trac Tran is talking about in this seminar:
In this work, we propose a novel fast compressive sampling algorithm that is based on a new concept of structurally random ensembles. The new proposed algorithm not only enables the system to compressively sample signals very fast but also requires very low complexity with other highly desired features for practical applications such as: integer friendliness, streaming capability. Surprisingly, this novel algorithm is also proposed to solve a closely related problem of high dimensional reduction in theoretical computer sciences.

I mentioned earlier the Titan flyby of today, but guess what kids, this one trumps it: 2007 WD5, an asteroid is going to hit Mars where we currently have five robots ( Spirit and Opportunity, Odyssey, Mars Express and the Mars Reconnaissance Orbiter). Woohoo, where is my popcorn ?.


Anonymous said...

Igor, thanks for mentioning my abstract. you can find my conference paper "Fast compressive sampling with structurally random matrices" (accepted to ICASSP'08) at CS Rice website.

Thong Do
The Johns Hopkins University.

Igor said...

Thanks I had not seen it. I made a mention of it.

Very interesting.