Tuesday, July 17, 2007

Synthesis on Compressed Sensing and Dimensionality Reduction/ Imaging a Fractal Romanesco

[This is an old post featured on 7/17/07; most of the thoughts mentioned here have since received some type of closure or are better understood.]

I recently did a guest appearance on John Langford's blog on the subject of Machine Learning and Compressed Sensing. Andrew Gelman also talked about Compressed Sensing as a Universal Dimension Reduction technique by Mike Wakin and Rich Baraniuk in his blog (Statistical Modeling, Causal Inference, and Social Science). Overall, the responses, whether through personal e-mail or in the comments sections, were very interesting. What comes out of this exercise is a list of elements that had been difficult for me to understand when I first looked at the subject, and then to communicate:
  • There is a confusion that L1 regularization providing L0 capabilities is the main reason why compressed sensing works. This is not truly the case: historically it enabled the field, but it is not required. One can also point to the limits of this statement (Breakdown Point of Model When the Number of Variables Exceeds the Observations).
  • There is an enormous confusion about why Compressed Sensing is "better" than the Johnson-Lindenstrauss lemma. In short, Compressed Sensing uses the same techniques as JL, but provides a tighter bound (A simple proof of the restricted isometry property for random matrices) for specific random projections.
  • There is the thought that Compressed Sensing is the same as random projections. This is not the case: one could acquire analog signals of Legendre polynomial coefficients and obtain results (provided the Legendre polynomials are incoherent with the image/signal of interest). Also, the random projections currently used in the Compressed Sensing literature come from a very specific family of distributions. The random projections currently used in Machine Learning, here for instance, obviously fulfill the JL lemma but not the RIP constraint of Compressed Sensing. Here is a good description by Terry Tao of the effort to build matrices that have the RIP (or UUP) property.
  • I am always eyeing a solution to dimensionality reduction in the context of imagery, where a large number of pixels needs to be reduced to a smaller set of the most important features. This is not the type of problem some other people face. In the statistics and social science world, however, datasets generally have fewer dimensions in the first place, so dimensionality reduction does not bring the tremendous advantage found in the imagery business.
  • In the Compressed Sensing approach, every new measurement democratically adds information about the scene of interest. This is different from, say, PCA, where the first few eigenvectors are supposed to carry the most salient features.
  • In particular, there are currently no known sparse random projection matrices fulfilling the RIP; i.e., taking a random sample of the signal of interest currently requires a full sampling, not a sparse one. The only sparse random projection used in compressed sensing that I know of eventually uses group testing to get the full information later.
  • Compressed Sensing with dictionaries of bases is well understood, but the newer concept of using a similar approach to decompose manifolds clearly requires new or additional wording. It seems to me that it can be better put by saying that the manifold has a sparse decomposition in the basis of eigenfunctions of the Laplacian operator on that manifold/graph (the diffusion geometries and wavelets of Mauro Maggioni, Stephane Lafon and Ronald Coifman).
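Since several of the bullets above turn on what a random projection actually does, here is a minimal sketch, in plain Python, of the Johnson-Lindenstrauss-style distance preservation being discussed; the ±1 matrix, dimensions, and seed are illustrative choices, not anything specific from the papers mentioned:

```python
import math
import random

random.seed(0)

d, k = 1000, 200   # ambient and projected dimensions

# Random +/-1 projection matrix (k rows, d columns)
R = [[random.choice((-1.0, 1.0)) for _ in range(d)] for _ in range(k)]

def project(x):
    # y = (1/sqrt(k)) R x
    return [sum(r * xi for r, xi in zip(row, x)) / math.sqrt(k) for row in R]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Two arbitrary points in R^d
u = [random.gauss(0.0, 1.0) for _ in range(d)]
v = [random.gauss(0.0, 1.0) for _ in range(d)]

ratio = dist(project(u), project(v)) / dist(u, v)
print(ratio)   # close to 1: the pairwise distance is approximately preserved
```

Note that preserving pairwise distances with high probability is exactly the JL guarantee; the RIP of Compressed Sensing is the stronger requirement that this near-isometry hold simultaneously for all sparse vectors.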

Finally, there was the issue of why the image reconstructed by the Rice one-pixel camera is still blurry. This is my abbreviated response to that comment:

With regard to the blurring (as can be seen here): I agree even with the 40% version being blurry, but the Rice CS camera is not a purely analog camera; i.e., the picture is projected on some Haar basis (every shining pixel is a step function over that area). In effect, in order to have a good reconstruction, one needs good incoherence between the target space and the CS signal space. Haar bases are good at decomposing natural images, so I would not expect very good incoherence between the Haar basis and the wavelet basis (used for the reconstruction). If the Rice CS camera were replicating diracs instead of Haar functions, you'd [probably] get less blurry images. A similar note was made by Terry Tao in one of the comments on his blog.
You also get similarly blurry reconstructions from live images from the Random Lens Imager at MIT, which uses the compressed sensing framework. One of the reasons is that we are not using 3D functions to do the reconstruction, even though the camera is certainly sensitive to depth as well. Please note the need for Machine Learning techniques to calibrate these imagers.
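The incoherence argument above can be made concrete with a toy computation. Assuming the convention μ(Φ, Ψ) = max |⟨φi, ψj⟩| for orthonormal bases (which ranges from 1/√n for maximally incoherent pairs up to 1 for coherent ones), one can check that the dirac basis is less coherent with the Haar basis than the Haar basis is with itself:

```python
import math

s2 = 1.0 / math.sqrt(2.0)

# Orthonormal Haar basis for n = 4 (rows are basis vectors)
haar = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
    [s2, -s2, 0.0, 0.0],
    [0.0, 0.0, s2, -s2],
]

# Dirac (canonical) basis for n = 4
dirac = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

def coherence(A, B):
    # mu(A, B) = max_{i,j} |<a_i, b_j>|, between 1/sqrt(n) and 1 for orthonormal bases
    return max(abs(sum(a_i * b_i for a_i, b_i in zip(a, b))) for a in A for b in B)

print(coherence(haar, haar))    # 1.0   : a basis is maximally coherent with itself
print(coherence(dirac, haar))   # 0.707 : diracs are more incoherent with Haar
```

The gap between 1.0 and 0.707 is the quantitative version of the point above: sampling in a basis that is coherent with the reconstruction basis wastes measurements.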

Since that comment, I have found mention of a basis that can be designed to be incoherent with the Haar basis: the noiselet basis of Coifman, Geshwind and Meyer [1], for which I can find no public paper describing its construction other than that of Romberg and Candes. It may be interesting to see how the reconstruction mentioned above fares with this noiselet basis. It would also be judicious to take photos of Brownian or chaotic (high-dimensional) systems so that there is a better public understanding of the connection between the sampling rate and reconstruction requirements. Similarly, one could play with the compressed sensing imaging of a Romanesco (rare to find in Texas); plain broccoli should do. One could also try to make the connection between the fact that, for Compressed Sensing to work well, the distribution of projection coefficients needs to follow a power law, and the fact that some objects (like the Romanesco) have features (curvelet-like) that follow a power law. Obviously this could be done without the one-pixel camera, but a real experiment always trumps a computer one.
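On the power-law point, here is a quick sketch (with an illustrative decay exponent, not one taken from any particular image class) of why power-law coefficient distributions are friendly to compressed sensing: the sorted coefficients concentrate almost all of the signal energy in a few terms, so a sparse approximation is accurate:

```python
# Sorted coefficient magnitudes decaying as a power law: c_j ~ 1/j
coeffs = [1.0 / j for j in range(1, 1001)]

total_energy = sum(c * c for c in coeffs)
top_k_energy = sum(c * c for c in coeffs[:10])   # keep only the 10 largest

fraction = top_k_energy / total_energy
print(fraction)   # about 0.94: ten terms out of a thousand carry most of the energy
```

A signal whose coefficients decayed slowly (say, all of comparable magnitude) would show no such concentration, and few measurements could not hope to capture it.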

Reference: [1] R. Coifman, F. Geshwind, and Y. Meyer. Noiselets. Applied and Computational Harmonic Analysis, 10:27–44, 2001.


Anonymous said...

This is not the type of problem some other people face. In the statistics and social science world, however, datasets generally have fewer dimensions in the first place, so dimensionality reduction does not bring the tremendous advantage found in the imagery business.

I don't agree with this statement. Two points:

1) High-dimensionality is a huge problem for statistics. Perhaps in some social science data the ratio of sample size to number of dimensions is large, but I think this is an exception rather than the rule for modern statistics. Here's a popular example: genomic data.

2) A more salient difficulty is that, more often than not, the distribution of the covariates (or predictors, or features) is not directly controllable. Let's consider the usual linear model:

Y = X b + e

The goal in compressed sensing, as in classical regression, is to estimate b. The bounds that compressed sensing theory provides depend on assuming sparsity of b, as well as a certain incoherence property of X. Unfortunately, verifying that X satisfies the property is intractable. However, if X is drawn at random from a "nice" distribution, it will have that property with high probability. This is fine as long as you have the luxury of using an arbitrary X for your measurements. The more common situation we face is that X is not under our control; we merely observe it.
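[As a toy sketch of the setup the commenter describes, under much simpler assumptions than the general theory (illustrative sizes, a random ±1 measurement matrix, noiseless data, and a 1-sparse b recovered by plain correlation rather than L1 minimization): when X can be chosen at random, even n far smaller than p measurements identify the sparse b.]

```python
import random

random.seed(1)

p, n = 50, 20          # dimension of b vs. number of measurements (n << p)

b = [0.0] * p
b[17] = 3.0            # an illustrative 1-sparse coefficient vector

# Random +/-1 measurement matrix X (n rows, p columns) and noiseless Y = X b
X = [[random.choice((-1.0, 1.0)) for _ in range(p)] for _ in range(n)]
Y = [sum(x_j * b_j for x_j, b_j in zip(row, b)) for row in X]

def correlation(j):
    # <Y, j-th column of X>
    return sum(y_i * row[j] for y_i, row in zip(Y, X))

# Pick the column that best explains Y, then estimate its coefficient
support = max(range(p), key=lambda j: abs(correlation(j)))
amplitude = correlation(support) / n

print(support, amplitude)   # with high probability, the planted index and amplitude
```

The same greedy recovery would fail badly for an observed, uncontrolled X whose columns happen to be strongly correlated, which is exactly the commenter's point.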

Igor said...

For point 1: I am not a specialist and my social science background is pretty thin, but I can see how very large databases used for data mining for credit purposes could be construed as social science. I am not sure too many people would agree on genomics being primarily social science but, as I say, I am more than happy to see that engineers in the hard sciences are not the only ones facing enormous challenges. My view was that in social science your dimensionality may be larger than in the hard sciences, but the amount of data is generally several orders of magnitude smaller than that dimensionality, so the expected dimensionality reduction would not be as large, because the model would not try to have too many dimensions in the first place in order to fit the data. However, if you include genomics, I agree with you.

Thanks for point 2. Obviously, being able to command a 15-micron-by-15-micron mirror on a DMD (the Rice one-pixel camera) is much easier than controlling people's behavior :-)


Anonymous said...

not sure too many people would agree on Genomics being primarily social science, but as I say, I am
Oh, I think I misunderstood your original blog post then.

In the statistics and social science world
I took this to mean statistics and social science separately, but now I understand that you meant statistics in the social science world.