**If you've read this entry before, please check for ~~an~~ two updates at the end of it.**

Today we have two insights for the price of one. You probably recall last Sunday Morning Insight [1] about the possibility of strengthening belief propagation solvers through the inclusion of long-range potentials. I really don't know if this idea is any good, but Lenka Zdeborova sent this additional insight:

...About your Sunday morning post. I enjoyed reading it. I would like to point out that the TAP equations work for infinite-range spin glasses; more precisely, they were first written for the Sherrington-Kirkpatrick (SK) model, where there are two-body random-sign interactions between every pair of spins. A lot is known about this model. However, the physics of the real 3D spin glass with short-range interactions is crucially different. There has been some work in physics over the past 30 years showing how different and how similar the short-range spin glasses are to the SK model, and the question is far from settled.

So my answer to your first question would be: if there are only long-range interactions, then the answer is yes; the TAP, cavity and replica methods were created precisely for infinite-range systems. If there is a mix of the two, or only short-range interactions, then the answer is not fully known, I think. But there are a lot of methods out there in theoretical physics to connect the behaviour of infinite-range systems to the real 3D ones, and I have always found it an amazingly interesting question whether these methods will find some nice algorithmic applications.

Best!

We had a small discussion afterwards, at the end of which I stated:

*Maybe using a long range potential might make the attendant solver robust, at least more robust than it currently is.*

Little did I know that Dror Baron was about to send me the following:

Hey Igor, I'm sure you have already seen our new paper, which we posted on arXiv. The point is that there are different performance regions in compressed sensing: the mean square error does not behave in a smooth way; rather, it has discontinuities. Additionally, belief propagation is suboptimal. Therefore, there is room for a new generation of better compressed sensing solvers in the future.

In this paper, compressed sensing with noisy measurements is addressed. The theoretically optimal reconstruction error is studied by evaluating Tanaka's equation. The main contribution is to show that in several regions, which have different measurement rates and noise levels, the reconstruction error behaves differently. This paper also evaluates the performance of the belief propagation (BP) signal reconstruction method in the regions discovered. When the measurement rate and the noise level lie in a certain region, BP is suboptimal with respect to Tanaka's equation, and it may be possible to develop reconstruction algorithms with lower error in that region.

So we now have some good reasons to believe that AMP/belief propagation solvers ought to be improved. One of the reasons we have had a clear perspective on better algorithms in compressive sensing has mainly been phase transition graphs [7], not RIP or similar admissibility conditions. Hence, Junan and Dror provide us with a clear roadmap for improvement. Maybe one way to do so is by looking at different (long-range) potentials; and if that happens, what sorts of seeded matrices [3] does that imply? etc....

Which components of the singular value decomposition of a signal-plus-noise data matrix are most informative for the inferential task of detecting or estimating an embedded low-rank signal matrix? Principal component analysis ascribes greater importance to the components that capture the greatest variation, i.e., the singular vectors associated with the largest singular values. This choice is often justified by invoking the Eckart-Young theorem even though that work addresses the problem of how to best *represent* a signal-plus-noise matrix using a low-rank approximation and not how to best *infer* the underlying low-rank signal component.

Here we take a first-principles approach in which we start with a signal-plus-noise data matrix and show how the spectrum of the noise-only component governs whether the principal or the middle components of the singular value decomposition of the data matrix will be the informative components for inference. Simply put, if the noise spectrum is supported on a connected interval, in a sense we make precise, then the use of the principal components is justified. When the noise spectrum is supported on multiple intervals, then the middle components might be more informative than the principal components.

The end result is a proper justification of the use of principal components in the setting where the noise matrix is i.i.d. Gaussian and the identification of scenarios, generically involving heterogeneous noise models such as mixtures of Gaussians, where the middle components might be more informative than the principal components so that they may be exploited to extract additional processing gain. Our results show how the blind use of principal components can lead to suboptimal or even faulty inference because of phase transitions that separate a regime where the principal components are informative from a regime where they are uninformative.

Raj makes the point that by looking at gaps in the spectrum of the data matrix, one can spot whether the noise spectrum is supported on a single connected interval or on multiple ones.
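A minimal numpy sketch of that idea (all names, sizes and thresholds below are my own illustrative assumptions, not from Raj's paper): simulate a rank-1 signal under i.i.d. Gaussian noise versus a two-component Gaussian mixture noise, then flag unusually large gaps between consecutive singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a rank-1 signal buried in two different noise models.
n, p, snr = 200, 200, 5.0
u = rng.standard_normal((n, 1)) / np.sqrt(n)
v = rng.standard_normal((p, 1)) / np.sqrt(p)
signal = snr * u @ v.T

# (a) i.i.d. Gaussian noise: the noise spectrum sits on one connected interval.
X_gauss = signal + rng.standard_normal((n, p)) / np.sqrt(n)

# (b) a two-scale Gaussian mixture across columns: the spectrum can split.
scales = np.where(rng.random(p) < 0.5, 0.2, 2.0)
X_mix = signal + (rng.standard_normal((n, p)) * scales) / np.sqrt(n)

def spectral_gaps(X, rel=3.0):
    """Indices where the gap between consecutive singular values is
    much larger than the typical (median) gap."""
    s = np.linalg.svd(X, compute_uv=False)  # sorted in descending order
    gaps = -np.diff(s)
    return [int(i) for i, g in enumerate(gaps) if g > rel * np.median(gaps)]

print(spectral_gaps(X_gauss))
print(spectral_gaps(X_mix))
```

In the Gaussian case the only large gap tends to sit right below the signal singular value; a gap appearing well inside the bulk would hint at noise supported on multiple intervals.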

How is this relevant?

Quite simply, it answers at least one question: when should one employ Robust PCA [6] as opposed to simple PCA? When you have images or movies, the task is kind of simple [8]; if you are not that lucky, you probably have to resort to some domain-specific expertise.

Raj seems to suggest that first computing the SVD of the data matrix and finding those gaps in the spectrum is an indication that you ought to be looking for a Robust PCA solution. In a new paper, Tianyi Zhou seems to suggest that Robust PCA might only take about 20 times as long as a single PCA, but sometimes knowing how to switch from one to the other automatically might save time. I also wonder how this result changes when using a randomized version of PCA in the context of very large problems.
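On the randomized-PCA question, here is a minimal sketch of a randomized SVD in the spirit of Halko, Martinsson and Tropp (function name and parameters are my own illustrative choices): project the matrix onto a small random subspace, orthonormalize, and take the SVD of the reduced matrix.

```python
import numpy as np

def randomized_svd(X, rank, oversample=10, seed=0):
    """Approximate truncated SVD via a random range finder."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Sketch: multiply by a random Gaussian matrix slightly wider than the target rank.
    Omega = rng.standard_normal((p, rank + oversample))
    Q, _ = np.linalg.qr(X @ Omega)          # orthonormal basis for (approx.) range of X
    # SVD of the small (rank+oversample) x p matrix recovers the leading components.
    U_small, s, Vt = np.linalg.svd(Q.T @ X, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Usage: on an exactly rank-5 matrix, the sketch recovers the spectrum.
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 300))
U, s, Vt = randomized_svd(A, rank=5)
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)[:5], rtol=1e-6))
```

The point of the sketch is cost: the full SVD touches a 500 x 300 matrix, while the expensive factorizations here involve only 15 columns, which is where the savings for very large problems would come from.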

Looked at from a totally different standpoint, it might also be a way of detecting the changing behavior of a target through a multiscattering medium, etc...

- Update 1: I received the following from Lenka:

*Dear Igor,*

*At this late hour I made it to catch up with Nuit Blanche. I would like to point out that the results of the paper by Junan Zhu and Dror Baron were already included in our paper Probabilistic Reconstruction in Compressed Sensing: Algorithms, Phase Diagrams, and Threshold Achieving Matrices, Section V.C, and we indeed showed that in that case, too, the seeded matrices serve to get close to optimality, Section VI.C. The noisy case is included in the Donoho, Javanmard, Montanari proof of optimality with seeding/spatial coupling. I actually just wrote about this to Junan and Dror before reading today's Nuit Blanche.*

*Best!*
