Monday, October 07, 2013

About t'em breakthroughs and a comment on a comment on "The Mathematical Shape of Things to Come"

Last week, I mentioned three papers that you had to read because in my mind, they were qualitatively changing something. I received two pointed comments in the comment section: The first commeter stated:

Wasn't the equivalence of AMP and LASSO proved in this 2010 paper:
Bayati and Montanari, "LASSO Risk of Gaussian Matrices," IEEE Transactions on Information Theory
and the second commenter:
I agree with the previous anonymous comment. In fact, I believe that any careful reader of the 2009 PNAS paper by Donoho et al. expected this sort of result.

Indeed both commenters are absolutely right in that these two papers [1,2] are indeed previous breakthroughs. What I said last week was not that this paper was a breakthrough at the expense of these two (or other) papers rather:that if you look at the history of the different sparse signal recovery solvers then there is a myriad of algorithms and each implementation has, most of the time, several parameters to choose from. I believe it is important to say when AMP, one of the least complex and most promising algorithm in town, is found to have a theoretically derived rule of thumb for what I would call parameter policy. When that happens, AMP becomes far more important to a relatively larger community of people using LASSO as it has clear, theoretically sound, implementation details. I'd love to hear if that narrative is not accurate. Again I am not pitting breakthroughs against breakthroughs, I thought it was important to point those three findings that day. Again, do let me know by email or anonymous comments if I am missing something.

Now onto a totally different subject. I am officially pissed off at a comment made on the internet.

This past week-end, Suresh's Google+ feed pointed me to an article on The Mathematical Shape of Things to Come by Jennifer Ouellette which, not doubt, is in support of the extraordinary meetings currently taking place this semester at the Simons Institute ( check Videos: Succinct Data Representations and Applications, Videos: Big Data Boot Camp Day 2,3 and 4, Simons Institute, Berkeley, Videos: Big Data Boot Camp Day 1, Simons Institute, Berkeley ). Why am I pissed off ? Mostly because of the second comment of this article. In summary it says that science writer cannot do a good job if they are illiterate. I am of a different opinion.  Here is why: If you take for example one of the paper [3] that is directly linked from inside the article you will see this figure (which is familiar for the casual reader of this blog)

What does this represent ? some will tell you it's a phase transition showing the limit between what is computable or what is not computable. I see something else, I see two authors charting the waters by drawing a map of something that is currently unknown. I see applied mathematicians draining waters between two islands of knowledge [4,5,6,7]. The article is quite simply a reflection of the first tales recounted by these explorers. Some explorers will oversell the findings, others will draw maps. In both cases, the scribes are prisoners of the tales and/or their own preconceived notions. There is nothing wrong with that. But much like Christopher Columbus, while the map is exciting,  your and the scribe's excitement has probably very little to do with the eventual use of that map. We also expect that other explorers will draw new maps [8,9].

Only the passage of time will bring the perspective that gives the 20/20 hindsight that is the territory of historians of science. It is unfair and misguided to expect a piece written as we discover these new territories, to provide an adequate insight as to why they are important. At the very least, the article is there to get people's attention so that they can decide what these new maps mean to them. For instance, I tried to provide a personal insight in the comment section. You'll notice that I personally believe it will have an impact in an area that is far from applied mathematics and while it was a quick write up, I am not sure we can say more than this. Remember, I am supposed to have read a large part of that literature and more importantly I know some calculus (though I will concede major weaknesses in checking the proofs given in some of the bounds uncovered by some concentration of measure results).

Dear Jennifer and Erica, 
I write a small blog on compressive sensing [1]. I think one of the insight that I have not seen much in your piece is really that this is probably the first time in the history of science that non trivial mathematics (concentration of measure is **not** on most graduate engineering courses) and applied math have such a direct bearing on the design of sensors. It’s David Donoho [5] reportedly exclaiming a panel of NSF folks “You’ve got Terry Tao (a Fields medalist [6]) talking to geoscientists, what do you want?” [2], it’s folks like Anna Gilbert [3] and collaborators potentially changing the way we do “holistic measurements” as you call them in microarray experiments and the list goes on and is long.
Current sensors may or not benefit from compressive sensing but we now have a better understanding of why. You have to recall that the world of sensors has always viewed math as a back end process. The role reversal comes from deep applied mathematical results such as what I personally call the Donoho-Tanner phase transition [4] that Emmanuel Candes and Ben Recht are exploring for larger dimensional objects such as matrices. More interestingly, the approach has led a number of research groups all over the world in trying new concepts of sensors. A good majority of these concepts will die, some will remain within a niche market and a few will quite simply change our world.

(Submitted on 16 Aug 2010 (v1), last revised 9 Aug 2012 (this version, v2))
We consider the problem of learning a coefficient vector x_0\in R^N from noisy linear observation y=Ax_0+w \in R^n. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator x'. In this case, a popular approach consists in solving an L1-penalized least squares problem known as the LASSO or Basis Pursuit DeNoising (BPDN). For sequences of matrices A of increasing dimensions, with independent gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic mean square error of the LASSO for random instances. The proof technique is based on the analysis of AMP, a recently developed efficient algorithm, that is inspired from graphical models ideas. Simulations on real data matrices suggest that our results can be relevant in a broad array of practical applications.

[2] Message-passing algorithms for compressed sensing, David L. Donoho, Arian Maleki, and Andrea Montanari,
Compressed sensing aims to undersample certain high-dimensional signals yet accurately reconstruct them by exploiting signal characteristics. Accurate reconstruction is possible when the objectto be recovered is sufficiently sparse in a known basis. Currently, the best known sparsity–undersampling tradeoff is achievedwhen reconstructing by convex optimization, which is expensivein important large-scale applications. Fast iterative thresholdingalgorithms have been intensively studied as alternatives to convex optimization for large-scale problems. Unfortunately knownfast algorithms offer substantially worse sparsity–undersampling tradeoffs than convex optimization. We introduce a simple costless modification to iterative thresholding making the sparsity–undersampling tradeoff of the new algorithms equivalent to thatof the corresponding convex optimization procedures. The newiterative-thresholding algorithms are inspired by belief propagation in graphical models. Our empirical measurements of thesparsity–undersampling tradeoff for the new algorithms agreewith theoretical calculations. We show that a state evolution formalism correctly derives the true sparsity–undersampling tradeoff. There is a surprising agreement between earlier calculations based on random convex polytopes and this apparently very different theoretical formalism.

[3] Exact Matrix Completion via Convex Optimization, Emmanuel J. Cand es and Benjamin Recht]
[4] How to find real-world applications for compressive sensing
[5] A Mathematician Becomes a Pirate.
[6] What Island Is Next ?
[7] Islands of Knowledge

No comments: