Wednesday, October 19, 2011

A Q&A with Marco Duarte and Dror Baron

In yesterday's entry (Can Netflix give you Superman's X-ray vision?), Emmanuel Candes, Thomas Strohmer and Vladislav Voroninski showed how to reconstruct a signal from few measurements without the need to resort to sparsity. In a similar but different direction, two weeks ago I mentioned the work of Dror Baron and Marco Duarte entitled Universal MAP Estimation in Compressed Sensing and was intrigued by how one could also perform compressed sensing without a sparsity argument. I went ahead and struck up a conversation with Dror and Marco, who were good sports about it since I usually ask a lot of dumb questions. Here is the summary of our back-and-forth exchange.

While reading your paper, I cannot get it out of my head that Compressive Sensing somehow already deals with non-sparse but "low complexity" signals through the use of judicious dictionaries at the reconstruction stage. Isn't the dictionary some sort of collection of low-complexity vectors? What do you guys think?


If you mean that signals that are not necessarily sparse themselves may be sparse once one picks the right basis/dictionary (like DCT/wavelets for images, for example), then I think the clear contrast between that baseline setup and ours is that in the baseline setup you would need to know the basis/dictionary a priori, i.e., before you attempt to recover the signal.

Our approach is fundamentally different because it does not use sparsity as a measure of complexity. Rather, it is assumed that the signal is drawn from an ergodic source, and the source statistics and signal (realization of the source) are jointly estimated iteratively: a given estimate of the signal allows us to build an estimate of the source statistics, and then the statistics allow us to further refine the signal estimate. Such a process can be repeated to satisfaction.
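To make that alternating loop concrete, here is a toy numpy sketch; the three-symbol alphabet, the problem sizes, and the crude coordinate-wise update are all illustrative assumptions on my part, not the authors' actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy source: i.i.d. draws from a small discrete alphabet
# with unknown probabilities (a stand-in for an unknown ergodic source).
alphabet = np.array([0.0, 1.0, 5.0])
n, m = 200, 120
x = rng.choice(alphabet, size=n, p=[0.8, 0.1, 0.1])
A = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian measurement matrix
y = A @ x

# Initialize from the least-squares solution, quantized to the alphabet.
x_ls = np.linalg.pinv(A) @ y
x_hat = alphabet[np.argmin(np.abs(x_ls - alphabet[:, None]), axis=0)]

for _ in range(20):
    # Step 1: estimate the source statistics from the current signal estimate.
    counts = np.array([(x_hat == a).sum() for a in alphabet]) + 1.0  # smoothing
    p_hat = counts / counts.sum()
    # Step 2: refine the signal estimate under those statistics, one
    # coordinate at a time (residual fit minus log-prior).
    for i in range(n):
        r = y - A @ x_hat + A[:, i] * x_hat[i]   # residual excluding coordinate i
        costs = [np.sum((r - A[:, i] * a) ** 2) - np.log(p_hat[k])
                 for k, a in enumerate(alphabet)]
        x_hat[i] = alphabet[int(np.argmin(costs))]
```

Each pass re-estimates p_hat from the current x_hat and then re-estimates x_hat under the updated prior, mirroring the "repeated to satisfaction" loop described above.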
The equivalent in sparsity-based CS would be to estimate the signal and the basis/dictionary that yields sparsity simultaneously; this approach has been considered and addressed (so far) only when you have a very large and sufficiently diverse set of observed signals that are sparse in the basis/dictionary. These approaches then integrate dictionary learning with sparse recovery.

Then isn't this just dictionary learning, i.e., A = DX where one finds a dictionary D and a sparse X? How does your approach change that problem?

Marco and Dror:

We're different because we don't use sparsity as a measure of complexity; we use the notion of Kolmogorov complexity instead, which is approximated in our algorithm by empirical entropy. Standard CS aims for the sparsest signal representation that agrees with the measurements; we aim for the signal with the simplest source statistics (lowest empirical entropy) that agrees with the measurements. As a byproduct of optimizing for minimum complexity, we estimate the source statistics. Therefore, there is no need to assume knowledge of the signal prior as in Bayesian methods; at the same time, our reconstruction performance is as good as (or superior to) Bayesian reconstruction for any signal, despite not knowing its statistics. Keeping in mind that any type of sparsity examined so far in the literature relies on an independent and identically distributed (i.i.d.) signal assumption, our algorithm should reconstruct as well as any existing sparsity-based solvers.
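For intuition about empirical entropy as a stand-in complexity measure, here is a minimal plug-in (zeroth-order) estimator; this is only the simplest version for illustration, not the richer statistics the paper actually works with:

```python
import numpy as np
from collections import Counter

def empirical_entropy(symbols):
    """Zeroth-order plug-in entropy estimate, in bits per symbol."""
    counts = np.array(list(Counter(symbols).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A "simple" signal (mostly one symbol) scores far lower than a uniform one,
# so minimizing empirical entropy favors low-complexity explanations.
simple = [0] * 90 + [1] * 10      # about 0.47 bits/symbol
uniform = list(range(10)) * 10    # log2(10), about 3.32 bits/symbol
```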
In contrast, you could think of dictionary learning as estimating a sparse representation for each signal observed and getting the dictionary as a byproduct.
The way I understand dictionary learning, one uses training data to find D such that x = D*c with c as "sparse" as possible over the ensemble of training data, where sparsity is measured by energy compaction or some other metric.
The *limitation* of current dictionary learning methods that we're aware of is that the metric used to induce sparsity implies that the entries of c are i.i.d. An i.i.d. assumption might be plausible in some scenarios, but less so in others. For example, Figure 1 in our paper shows strong statistical dependencies between samples of the unknown input x, yet to the best of our knowledge x cannot be represented as sparse in any fixed dictionary D. Here we have a type of signal whose statistical structure is not covered by the prior art.
Keep also in mind that dictionaries could be used within our setting, and we could try to estimate the Kolmogorov complexity of the vector c. It would require some changes to the algorithm, but in principle this can be done.
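For contrast, the dictionary-learning formulation Marco and Dror describe (find D such that each training signal is D*c with c sparse) can be sketched as a crude MOD-style alternation; the synthetic data and the simple thresholding coder below are illustrative stand-ins for real sparse-coding steps such as OMP:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data: signals exactly sparse in an unknown dictionary.
dim, n_atoms, n_signals, sparsity = 16, 8, 500, 2
D_true = rng.standard_normal((dim, n_atoms))
D_true /= np.linalg.norm(D_true, axis=0)
C_true = np.zeros((n_atoms, n_signals))
for j in range(n_signals):
    idx = rng.choice(n_atoms, size=sparsity, replace=False)
    C_true[idx, j] = rng.standard_normal(sparsity)
X = D_true @ C_true   # each column is one training signal

# Alternate: sparse-code against the current D, then refit D by least squares.
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)
for _ in range(30):
    corr = D.T @ X
    # Keep only the top-'sparsity' correlations per signal (crude sparse coding).
    thresh = np.sort(np.abs(corr), axis=0)[-sparsity]
    C = np.where(np.abs(corr) >= thresh, corr, 0.0)
    # MOD-style dictionary update: D = X C^+, then renormalize the atoms.
    D = X @ np.linalg.pinv(C)
    D /= np.linalg.norm(D, axis=0) + 1e-12
```

The dictionary falls out as a byproduct of repeatedly sparse-coding the whole training ensemble, which is exactly the contrast with the universal approach drawn above.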
OK, thanks. Now, I am also trying to draw a mental picture of how successive improvements in compressive sensing have led to better empirical bounds on the number of measurements required for reconstruction over time.
Initially, with solvers like least squares and no assumption on the measurement ensemble (measurement matrices), our world was basically divided into underdetermined and overdetermined linear systems of equations, with an overemphasis on exactly square systems (at least in many engineering fields such as mechanical, fluid, nuclear, ...).
With compressive sensing, by making an assumption on the solution (sparsity) and some assumption on the measurement ensemble (Gaussian, Bernoulli, ...), one can find a solution in polynomial time in underdetermined territory.
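As a concrete instance of that classical setting, here is a small numpy sketch recovering a sparse vector from an underdetermined Gaussian system with ISTA (iterative soft thresholding), one standard polynomial-time l1 solver; the sizes and the regularization weight are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Underdetermined system y = A x with a k-sparse x and a Gaussian A.
n, m, k = 100, 40, 5
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1.
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
x = np.zeros(n)
for _ in range(3000):
    g = x - step * (A.T @ (A @ x - y))                        # gradient step
    x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft threshold
```

With m well below n, least squares alone cannot single out x_true; the l1 penalty is what makes the underdetermined problem tractable.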
Specialists, however, have better domain knowledge of their respective fields and know that their signals are not just sparse. Structured sparsity, model-based compressive sensing, or low-information solutions (in your case) seem to be newer assumptions to be used in underdetermined systems. Given all this, do we have a clear view of the respective measurement ensembles that allow polynomial-time solutions of these underdetermined systems with these new constraints and solvers?
In our case, virtually any ensemble of random matrices (except those with low entropy) should work with high probability in the limit of large matrices. Our proofs are for Gaussian matrices. Our algorithm doesn't really make any assumptions.

This is a very interesting common thread between the examples you give. The use of least squares is not really assuming anything about the matrix, but rather only about the signal (that it has least energy among all possible inputs that match the output). The compressed sensing efforts that you describe after that have posed conditions on both the matrix and the signal model, and most (if not all) of these conditions have been reduced to random matrix designs, even though the way we arrive at randomness might be very different in each case. It is certainly very different for our universal approach and for the RIP-based approach, for example.

Thanks Dror and Marco, I think I am less dumb as a result of this discussion.

Image Credit: NASA/JPL/Space Science Institute. N00177178.jpg was taken on October 01, 2011 and received on Earth October 02, 2011. The camera was pointing toward ENCELADUS at approximately 107,084 kilometers away, and the image was taken using the BL1 and CL2 filters. This image has not been validated or calibrated. A validated/calibrated image will be archived with the NASA Planetary Data System in 2012.
