Tuesday, September 27, 2011

Build it and people will come.


In A stunning development in breaking the Donoho-Tanner phase transition?, an anonymous commenter made the following statement:

There seems to be a little bit of confusion here...
When the signals are exactly sparse and there is no noise, the Donoho-Tanner phase transition characterizes the limits of L1 minimization. Since L1 minimization happens to outperform other reconstruction algorithms (such as SP, CoSaMP, IHT), this phase transition also seems to apply to those algorithms. However, it will not apply to some other algorithms that outperform L1 minimization, like iteratively reweighted L1.
We also know what the limit of compressive sensing is in this noise-free setting. It is 2*s measurements, where s is the number of non-zero coefficients in the signal (note this is independent of the signal dimension). Furthermore, there are polynomial-time algorithms coming from coding theory that achieve this limit (although they are not stable in the presence of noise). This limit can be loosened to s+1 measurements if reconstruction with probability 1 is acceptable (hence the red line in their phase-transition figures).
From what I can tell after skimming through this paper, the authors also follow a coding-theory approach: they design their own measurement matrices and reconstruction algorithms for these matrices, and outperform L1 minimization. It seems all their results are on exactly sparse signals with no measurement noise, and they mention that the stability of the algorithm will be studied in further work.
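
As a reminder of what that noiseless setting looks like in practice, here is a minimal Python sketch of exact recovery of a sparse signal by L1 minimization (basis pursuit written as a linear program); the dimensions and sparsity level below are purely illustrative and not tied to any of the papers discussed here.

# Noiseless compressed sensing toy experiment: exactly sparse signal, random
# Gaussian measurements, recovery by L1 minimization (basis pursuit) cast as a
# linear program with the split x = u - v, u >= 0, v >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 200, 80, 10                          # signal length, measurements, nonzeros (illustrative)

x = np.zeros(n)                                # exactly sparse signal
support = rng.choice(n, size=s, replace=False)
x[support] = rng.standard_normal(s)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian measurement matrix
y = A @ x                                      # noiseless measurements

# min sum(u) + sum(v)  subject to  A(u - v) = y,  u >= 0,  v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=y,
              bounds=[(0, None)] * (2 * n), method="highs")
assert res.success
x_hat = res.x[:n] - res.x[n:]
print("exact recovery:", np.allclose(x_hat, x, atol=1e-6))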
Y'all are nitpicking. Indeed, as Bob Sturm recently showed us in his comparison of different reconstruction solvers (AMP, SL0, BP, etc.) in Phase transitions on Hawaii Time (by the way, I am patiently waiting with a dry martini on the beach for some computations showing IRL1 performing better than SL0), Bob produced the following graph:

[Figure: Bob Sturm's empirical phase transitions for several solvers; the L1 curve is labeled "B".]

The actual Donoho-Tanner phase transition (for L1 solvers) is designated as "B" on this graph. In other words, we already know that SL0 and AMP do better than DT, but not by a significant margin. In particular, this is the reason I did not use the word 'stunning' last year when I featured Shrinivas Kudekar and Henry Pfister's work (The Effect of Spatial Coupling on Compressive Sensing).
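
For those curious about how such a figure is produced, the recipe is simple: fix the undersampling ratio delta = m/n, sweep the sparsity ratio rho = s/m, and record how often a given solver recovers the signal exactly. Below is a rough sketch of that experiment using plain orthogonal matching pursuit as the solver; none of the parameters come from Bob's experiment, they are just illustrative.

# Empirical phase-transition sweep: for a fixed undersampling ratio delta = m/n,
# sweep the sparsity ratio rho = s/m and record how often orthogonal matching
# pursuit (OMP) recovers an exactly sparse signal from noiseless measurements.
import numpy as np

def omp(A, y, s):
    """Greedy OMP: pick s columns one at a time, refitting by least squares."""
    n = A.shape[1]
    residual, support = y.copy(), []
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(1)
n, delta, trials = 200, 0.5, 20                # illustrative sizes
m = int(delta * n)

for rho in (0.1, 0.2, 0.3, 0.4, 0.5):
    s = max(1, int(rho * m))
    successes = 0
    for _ in range(trials):
        x = np.zeros(n)
        x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
        A = rng.standard_normal((m, n)) / np.sqrt(m)
        y = A @ x
        x_hat = omp(A, y, s)
        successes += np.linalg.norm(x_hat - x) <= 1e-6 * max(1.0, np.linalg.norm(x))
    print(f"delta = {delta:.2f}, rho = {rho:.1f}: success rate {successes / trials:.2f}")

A full phase-transition plot is the same loop over a grid of delta values, with more trials per point and whatever solvers one wants to compare.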



I realize there are issues with the stability of coding-theory algorithms in the presence of noise, but these issues don't seem that hard to overcome either. Take, for instance, the recent poster of Jeremy Vila and Phil Schniter at SAHD (An Empirical-Bayes Approach to Compressive Sensing via Approximate Message Passing), where they also use an AMP algorithm and try to delimit the noisy phase transition.

[Figure: noisy phase-transition plot from the Vila-Schniter poster.]

Here they seem to be doing better than DT with noise. Hence, I would not be overly surprised if the algorithm of Florent Krzakala, Marc Mézard, François Sausset, Yifan Sun and Lenka Zdeborová turned out to be robust to noise; I would evidently expect some degradation.
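
To give a flavor of what a noisy experiment involves, here is a small sketch that adds Gaussian noise to the measurements and recovers the signal with a basic ISTA solver for the LASSO. This is not the AMP algorithm used in the poster, and the noise levels and regularization choice below are only illustrative.

# Sparse recovery from noisy measurements, y = A x + noise, using iterative
# soft-thresholding (ISTA) for the LASSO: min 0.5*||A x - y||^2 + lam*||x||_1.
import numpy as np

def ista(A, y, lam, iters=2000):
    step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        z = x - step * (A.T @ (A @ x - y))         # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(2)
n, m, s = 200, 100, 10                             # illustrative sizes
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)

for sigma in (0.0, 0.01, 0.05, 0.1):               # measurement noise levels
    y = A @ x + sigma * rng.standard_normal(m)
    x_hat = ista(A, y, lam=max(sigma, 1e-3) * np.sqrt(2 * np.log(n)))
    print(f"sigma = {sigma:.2f}: relative error {np.linalg.norm(x_hat - x) / np.linalg.norm(x):.3f}")

The interesting question, for this solver as much as for theirs, is how gracefully the error grows with the noise level.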

The other argument, about the specific measurement matrix, is weak in my view. We have already seen instances of compressive sensing with specially designed measurement matrices and matching reconstruction algorithms. Once the noise issue is addressed, the specific measurement matrices will be an issue for the hardware designers, but we have seen this before; two examples come to my mind,


and those hardware designers will manage, provided the phase transition is larger than the one given by DT.

While writing this entry, I also asked Henry Pfister for his point of view on this new algorithm. Henry kindly responded with:

It is well known that the MAP decoder requires only p*N measurements for a random Gaussian measurement matrix under a "single-Gaussian" signal model, where each signal component is drawn i.i.d. according to

f(x) = (1-p)*\delta_0 (x) + p * \phi_1 (x).

Here, we denote the pdf of a zero-mean Gaussian with standard deviation s by

\phi_s(x) = exp(-x^2 / (2*s^2)) / sqrt(2*pi*s^2).


The trick is finding a low-complexity decoder which achieves this. In that respect, the work in [KMSSZ] is very interesting.

One portion of our paper analyzed verification-based (a.k.a. SBB) decoding with a signal model equivalent to the single-Gaussian model and clearly showed gains are possible with spatially-coupled measurement matrices. We are now considering the question of why we did not observe larger gains in our CS-BP experiment. Other than the possibility of erroneous results, three important differences are clear:

(1) In our spatial-coupling paper, the CS-BP test used signals whose components were drawn i.i.d. from the distribution

f(x) = (1-p)*\phi_{0.5} (x) + p*\phi_{10} (x).


For this two-Gaussian model, the MAP threshold is strictly worse than the \alpha = \rho line. In fact, one can show  that the slope at \rho=0 must be infinite for this case (like the L1 and BP curves in [KMSSZ]).

(2) Our measurement ensemble was very sparse (e.g., each signal component is involved in 5-9 measurements). In their paper, the measurement ensemble is dense as N->Infinity for fixed L.  In their simulations for M=N/3, each signal component is involved in 250-1000 measurements because they use 3 measurement blocks and L ranges between 10 and 40. They also optimize the variance of entries in their measurement matrix.

(3) The last obvious difference is the BP decoder. We used a function-passing BP decoder based on quantized functions. They used a mean-variance passing BP decoder based on the Donoho-Montanari Gaussian approximation of BP.

It would be very interesting to determine the contribution that each of these three changes made to the impressive results they show.

Finally, the name "seeded BP" for their decoder is somewhat of a misnomer, because their main innovation is a particularly effective flavor of spatial coupling of measurement matrices for AMP-like decoding.
---
Best Regards,
Henry
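
To make the two signal models in Henry's note concrete, here is a small sketch that draws samples from the single-Gaussian prior f(x) = (1-p)*\delta_0(x) + p*\phi_1(x) and from the two-Gaussian prior f(x) = (1-p)*\phi_{0.5}(x) + p*\phi_{10}(x); the values of p and N are illustrative. It also checks that about p*N components carry the "signal" part, which is what makes the p*N measurement count of the MAP decoder plausible.

# Sampling the two signal models mentioned above (parameters are illustrative).
import numpy as np

rng = np.random.default_rng(3)
N, p = 10_000, 0.1

# single-Gaussian (exactly sparse) model: (1-p)*delta_0 + p*phi_1
mask = rng.random(N) < p
x_single = np.where(mask, rng.normal(0.0, 1.0, N), 0.0)

# two-Gaussian (approximately sparse) model: (1-p)*phi_{0.5} + p*phi_{10}
mask2 = rng.random(N) < p
x_two = np.where(mask2, rng.normal(0.0, 10.0, N), rng.normal(0.0, 0.5, N))

print("single-Gaussian: nonzero fraction =", np.count_nonzero(x_single) / N,
      "; about p*N =", int(p * N), "nonzero components")
print("two-Gaussian: fraction drawn from the wide component =", mask2.mean())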
So, what happened on Friday?




The sudden realization that the line in the sand separating P and NP had shifted. There is now a polynomial algorithm that can decode the sparsest signals of an underdetermined linear system of equations in a region that was previously thought to require combinatorial search.
For that matter, I seem to recall that Emmanuel Candes initially reconstructed a perfectly sparse Shepp-Logan phantom with an L1 solver. The line in the sand moved when he did that back in 2003; it moved again on Friday.

A final insight was provided by another anonymous reader as to why this algorithm works:
It seems that their main contribution is a particular sensing matrix that allows a few sparse elements to be detected, and then propagates the knowledge of these elements forward to detect the other ones. While this is certainly interesting, I wonder if the physics of many sensing applications will actually allow this.
This initial detection of a few elements, which then provides the capability for detecting the others, is intriguing and reminds me of structured sparsity. However, as I stated earlier, I completely disagree with anonymous on her/his last point (would the hardware designers come if the measurement matrix is so special?). Actually, I have only six words, heard many times in Aggieland:

Build it and people will come



Thanks to Henry, Bob, and the two anonymous commenters for providing these insights.


Liked this entry? Subscribe to the Nuit Blanche feed, there's more where that came from.

2 comments:

Shrinivas Kudekar said...

Dear Igor,

I am not sure which comment from anonymous you disagree with. But let me make the point that the entire innovation, as Henry pointed out, is in the creation of this special measurement matrix. This special construction, called a coupled measurement matrix, uses more measurements at the boundary, which allows the signal at the boundary to be recovered. Then the special coupling structure takes over and the whole effect propagates like a domino effect.

The coupling can be understood by taking multiple copies of the system and then joining them together (almost any random connections work!) within some local window (this window is typically smaller than the number of copies you take). However, since we have a finite number of copies, there is a termination at the boundary. Surprisingly, it is this termination (more measurements at the boundary in CS, or a low-rate code in coding theory) whose effect propagates through the entire structure. It is like providing a small amount of information at the boundary (a genie tells you the exact values there), and this effect moves inward.
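
Concretely, a seeded, spatially coupled measurement matrix of the kind described above could be sketched as follows; the number of blocks, the coupling window, and the measurement rates are only illustrative and not the values used in the paper.

# Sketch of a seeded, coupled measurement matrix: the signal is cut into L blocks,
# each measurement block only sees signal blocks within a small coupling window,
# and the first (seed) block gets extra measurements at the boundary.
import numpy as np

def coupled_matrix(n, L=10, window=2, base_rate=0.3, seed_rate=0.8, rng=None):
    if rng is None:
        rng = np.random.default_rng(4)
    block = n // L                                          # signal block size
    rows = []
    for b in range(L):
        rate = seed_rate if b == 0 else base_rate           # oversample the boundary block
        m_b = int(rate * block)
        A_b = np.zeros((m_b, n))
        for c in range(max(0, b - window), min(L, b + window + 1)):   # local coupling
            A_b[:, c * block:(c + 1) * block] = rng.standard_normal((m_b, block)) / np.sqrt(m_b)
        rows.append(A_b)
    return np.vstack(rows)

A = coupled_matrix(n=1000)
print("shape:", A.shape, "| overall undersampling m/n =", A.shape[0] / A.shape[1])

Recovery then proceeds from the well-measured boundary block inward, which is the domino effect described above.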

In the context of coding theory, this special structure elevates the BP threshold (of the low-complexity algorithm) for the coupled codes to the MAP threshold (the best possible threshold) of the "underlying" uncoupled system. And to complete the story, the MAP threshold can be made arbitrarily close to the ultimate Shannon threshold by increasing the degrees of the graphical code, thus achieving capacity (so far the proof exists only for the erasure channel, but the phenomenon is observed to hold for any general binary-input channel).

In fact, this coupling phenomenon is now observed in many problems in communication science, computer science, and statistical physics. One paper where some of these results are compiled is:

http://arxiv.org/abs/1105.0785

Also, a nice compilation of recent results on Spatial Coupling, Coding, and Communications can be found on Prof. Urbanke's webpage:

http://ipg.epfl.ch/doku.php?id=en:publications:scc

best

Igor said...

Howdy Shrinivas,

My point with anonymous is that the folks designing these revolutionary algorithms should not care so much about whether hardware designers will come and build an instance of these matrices. It should not be their concern; it's really the hardware folks' concern. Simply put: if the algo is awesome, people will come.

Igor.
