Sunday, November 20, 2011

A Post Peer Review of SL0

In The Long Tail of Post-Peer-Review Publishing: Reproducible Research as a Side Effect, some of you felt I was overemphasizing the SL0 solver. Maybe so, but your input on the matter are far more insightful than what we currently have available to make a judgement on: What to use and not to use for sparse recovery:

Bob Sturm makes the following comment:

Hi Igor. Note that SL0 does not perform very well for sparse signals distributed Rademacher. In that case, all approaches are beat hands down by l1 minimization and AMP ... which is curious since SL0 is taking a majorization of the l0 norm. I wonder what would happen if we did "SL1"?

An Anonymous reader tells us that:

SL0 is too sensitive to noise. This limits its applications to many fields, like gene expression, network mining, sparse representation of natural signals (biosignals, etc), DOA estimation, ....

We should not over-praise algorithms behaving better in noiseless scenarios. After all, sparsity based signal processing/machine learning covers many fields. And over-praising such an algorithm may mislead your readers with various background/applications.

Finally, Arian Maleki sent a me a thoughtful note, here it is (after permission as usual):
Hi Igor,

I read some of the posts on your very nice weblog today and I enjoyed them. Since you have posted some notes regarding the phase transitions, I just wanted to let you know some of the misunderstandings I have seen in a few papers regarding outperforming the Donoho-Tanner phase transition.

1) The main feature of the phase transition of $\ell_1$ is that it does not depend on the distribution of the coefficients. In other words, if we use the model $y = Ax_o$, then the phase transition of $\ell_1$ is independent of the distribution of $x_{o,i}$. This property does not hold for most of the other algorithms in the literature (except the ones that are related to ell_1 such as AMP) and that's why we introduced the concept of least favorable distribution in the "optimally tuned iterative algorithms" paper. "least favorable distribution" in that paper is the distribution that results in the worst performance of the algorithm. In that paper we did many simulations to characterize the least favorable distributions for several algorithms introduced there.

2) It is easy to beat the phase transition of ell_1 if we consider specific forms of distributions. For instance, if the coefficient distribution is Gaussian, IHT or re-weighted \ell_1 will outperform ell_1. In fact, if such information is available we may use it more efficiently using the Bayesian approaches. Therefore in my opinion the challenge is to come up with an algorithm that either theoretically beats the phase transition of $\ell_1$ over all the input distributions or at least empirically beats $ell_1$ over a range of distributions and not just for two or three "similar" distributions. In my opinion again, this is "beating Donoho-Tanner phase transition".

3) Using extra structures present in the signal we can again improve the phase transition of $\ell_1$ minimization. For instance, if the signal is group-sparse such as complex signals, the phase transition improves. You may for example see this phenomena in Fig.1 of But again, this can not be considered as beating the Donoho-Tanner phase transition. It is rather a generalization of that result.

On your last post regarding the review process, I understand your point. However, the figure shown there and the claims made about that figure seem a little bit unfair. Since, the distribution used there is something that favors greedy approaches and I believe if we change the distribution we will see that ell_1 still outperforms most of the proposals.

hope this helps,
So in a matter of a few communications that I am making public, we have a better sense of what SL0 can and cannot do. In summary, we have:

  1. SL0 does not perform very well for sparse signals distributed Rademacher.
  2. In absolute term beating the Donoho-Tanner would mean to include most signal distribution (see 1)
  3. Structured sparsity should not be considered when looking at DT
  4. SL0 is sensitive to noise.

Many people could make a decision based on these four criteria. Let me make the revolutionary claim that a large segment of the readership would be OK with using SL0 in the future as a result of these very insightful comments. Let me further make a second claim that it is likely that the SL0 paper did not get rejected based on any of the deficiencies highlighted above. For one thing, the great equalizer known as the Donoho-Tanner phase transition did not exist. Welcome to the post peer review world! Stay tuned for the next entry.

1 comment:

Anonymous said...

Regarding sensitivity of SL0 to noise, what about Robust SL0 (, it is less sensitive to noise compared to the original SL0?