
General Matching Pursuit (GMP): Is Matching Pursuit Solving Convex Problems? - implementation -




Is Matching Pursuit Solving Convex Problems? by Mingkui Tan, Ivor W. Tsang, Li Wang. The abstract reads:
Sparse recovery (SR) has emerged as a very powerful tool for signal processing, data mining and pattern recognition. To solve SR, many efficient matching pursuit (MP) algorithms have been proposed. However, it is still not clear whether SR can be formulated as a convex problem that is solvable using MP algorithms. To answer this, in this paper, a novel convex relaxation model is presented, which is solved by a general matching pursuit (GMP) algorithm under the convex programming framework. GMP has several advantages over existing methods. First, it solves a convex problem and is guaranteed to converge to an optimum. In addition, with $\ell_1$-regularization, it can recover any $k$-sparse signal if the restricted isometry constant $\sigma_k \leq 0.307-\nu$, where $\nu$ can be arbitrarily close to 0. Finally, when dealing with a batch of signals, the computational burden can be much reduced using a batch-mode GMP. Comprehensive numerical experiments show that GMP achieves better performance than other methods in terms of sparse recovery ability and efficiency. We also apply GMP to face recognition tasks on two well-known face databases, namely Extended Yale B and AR. Experimental results demonstrate that GMP can achieve better recognition performance than the considered state-of-the-art methods within acceptable time. In particular, the batch-mode GMP can be up to 500 times faster than the considered $\ell_1$ methods.
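
For readers who want to play with the greedy baseline that GMP is measured against, here is a minimal orthogonal matching pursuit (OMP) sketch in plain NumPy. This is the textbook algorithm, not the authors' GMP; the dictionary, sparsity and tolerance below are illustrative assumptions.

    import numpy as np

    def omp(A, y, k, tol=1e-10):
        # Textbook orthogonal matching pursuit: greedily build the support of a
        # k-sparse x with y ~ A @ x, refitting by least squares at each round.
        m, n = A.shape
        residual = y.copy()
        support = []
        x = np.zeros(n)
        for _ in range(k):
            j = int(np.argmax(np.abs(A.T @ residual)))  # most correlated atom
            if j not in support:
                support.append(j)
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coef
            if np.linalg.norm(residual) < tol:
                break
        x[support] = coef
        return x

    # Toy check: recover a 5-sparse signal from 80 Gaussian measurements.
    rng = np.random.default_rng(0)
    n, m, k = 256, 80, 5
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    x_hat = omp(A, A @ x_true, k)
    print(np.linalg.norm(x_hat - x_true))  # should be near machine precision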

Here is what Ivor states on his webpage:

  • A General Matching Pursuit (GMP) is proposed for very large-scale sparse recovery problems (e.g. 100k dictionary entries).
  • It can recover any k-sparse signal if the restricted isometry constant $\sigma_k < 0.307-\nu$, where $\nu$ can be arbitrarily close to 0 (the standard definition of this constant is recalled below).
  • When dealing with a batch of signals, the computational burden can be much reduced using a batch-mode matching pursuit, which can be up to 500 times faster than $\ell_1$ methods.
  • Its decoding speed is much faster than many recently developed methods including PGH, OMPRA, FISTA, ADM, SP, CoSaMP, AIHT, OMP, Shotgun (Parallel).
  • Its decoding speed on face recognition is even as fast as $\ell_2$ methods, yet it achieves more robust and better recognition performance than $\ell_2$ methods.
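
For reference, the restricted isometry constant invoked above (written $\sigma_k$ here, more commonly $\delta_k$) is the smallest constant such that

    $$(1-\sigma_k)\|x\|_2^2 \;\leq\; \|Ax\|_2^2 \;\leq\; (1+\sigma_k)\|x\|_2^2 \quad \text{for all } k\text{-sparse } x,$$

so the claim is that $\ell_1$-regularized GMP succeeds whenever the sensing matrix acts as a near-isometry on all $k$-sparse vectors.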

An implementation of GMP is available here.


Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Rambus



Many things seem to be happening at Rambus, a company that is mostly known for its memory chip technology. First we heard of Patrick Gill joining (and seeking an intern this summer). Then a reader pointed me to an earlier announcement (Rambus to Present at the Computational Sensing and Imaging Conference) that seems to be in line with their earlier hiring of David Stork and the new announcement of their Binary Pixel Imager Technology. Those signals have been picked up by some investors. It's always about the CMOS, baby!



Reviews and abstracts of FFT2013, CAP10, ISSC2013, BASP2013, ITA2013

Here are a few reviews, videos and abstract lists from several meetings of interest to the community; enjoy:



And then there is this: ITA 2013 - The Tonight Show with Sergio Verdú








Wednesday, February 27, 2013

Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization - implementation -


Nicolas Gillis just sent me the following:

Cher Igor, 
I wanted to let you know of our new LP model for near-separable NMF, see http://arxiv.org/abs/1302.4385
The near-separable NMF problem can be stated as follows. We are given a nonnegative data matrix M = WH+N, where W,H >= 0 is an NMF and N is some noise, and we assume that each column of the matrix W is equal to some column of the matrix M. The goal is to identify the columns of W among the columns of M (up to the noise level). It is possible to solve this problem in polynomial time given that the noise level is sufficiently small; see Section 5 of Arora, Ge, Kannan, and Moitra, "Computing a nonnegative matrix factorization -- provably", STOC '12, pp.145--162, 2012. Note that the assumption on M is rather strong but is reasonable for some applications, e.g., hyperspectral unmixing and document classification. 

Our new LP model is an extension of Hottopixx by Bittorf, Recht, Ré and Tropp (NIPS 2011), cf. your post http://nuit-blanche.blogspot.com/2012/12/hott-topixx-factoring-nonnegative.html. This new LP model does not require normalization of the input matrix and detects the factorization rank automatically. Moreover, it is more flexible, significantly more tolerant to noise, and can easily be adapted to handle outliers and other noise models. A CPLEX code (along with the code of other near-separable NMF algorithms, including a CPLEX implementation of Hottopixx) is available on https://sites.google.com/site/nicolasgillis/code

Best regards, Nicolas. 
PS. Some updates for "The Matrix Factorization Jungle": 
- The final version of "Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing" is freely available from the journal website: http://jmlr.csail.mit.edu/papers/v13/gillis12a.html
- There is a new version of "Robustness Analysis of HottTopixx, a Linear Programming Model for Factoring Nonnegative Matrices" at http://arxiv.org/abs/1211.6687 (Version 3). There was an error in Lemma 2, but this only changes some constants here and there.
Thanks Nicolas! The Advanced Matrix Factorization Jungle Page has been updated.

NMF is so central to much of what we do in engineering and science that it is nice to see some reasons why it works so well. Remember, it took us twelve years before getting comfortable with it mathematically speaking. This reminds me of the typical island-of-knowledge situation, where an ecosystem thrives yet has very little connection to the outside body of knowledge. Works like this one are dredging the water between this island and the continental plate. The interesting thing here is that these separability and near-separability conditions may have a direct impact on subjects like calibration issues and more.


Nonnegative matrix factorization (NMF) has been shown recently to be tractable under the separability assumption, under which all the columns of the input data matrix belong to the convex cone generated by only a few of these columns. Bittorf, Recht, Ré and Tropp (`Factoring nonnegative matrices with linear programs', NIPS 2012) proposed a linear programming (LP) model, referred to as Hottopixx, which is robust under any small perturbation of the input matrix. However, Hottopixx has two important drawbacks: (i) the input matrix has to be normalized, and (ii) the factorization rank has to be known in advance. In this paper, we generalize Hottopixx in order to resolve these two drawbacks, that is, we propose a new LP model which does not require normalization and detects the factorization rank automatically. Moreover, the new LP model is more flexible, significantly more tolerant to noise, and can easily be adapted to handle outliers and other noise models. Finally, we show on several synthetic datasets that it outperforms Hottopixx while competing favorably with two state-of-the-art methods.
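
To make the separable setting concrete, here is a minimal sketch of the successive projection algorithm (SPA), a standard greedy baseline for near-separable NMF. This is not the LP model of the paper, and the implicit assumption that the columns of M are suitably normalized is mine.

    import numpy as np

    def spa(M, r):
        # Successive projection algorithm: under separability, M ~ M[:, K] @ H,
        # so greedily pick the column with the largest residual norm, then
        # project the data onto the orthogonal complement of that direction.
        R = M.astype(float).copy()
        cols = []
        for _ in range(r):
            j = int(np.argmax((R**2).sum(axis=0)))
            cols.append(j)
            u = R[:, j] / np.linalg.norm(R[:, j])
            R -= np.outer(u, u @ R)
        return cols  # indices of the extracted columns of W

    # Usage: W = M[:, spa(M, r)]; H then follows by nonnegative least squares.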



6 PostDocs at Edinburgh

Mike Davies just sent me the following:

Hi Igor
Could you publicise the following advert for RA positions at Edinburgh.
Many thanks
Mike Davies
Here it is:
6 Positions in "Signal Processing for the Networked Battle Space" at Edinburgh
The Joint Research Institute in Signal and Image Processing at Edinburgh (University of Edinburgh and Heriot-Watt University) is recruiting for a new big research initiative in signal processing for defence. We currently have 6 postdoctoral Research Fellow positions available for 3 years.

The project aims to develop new technologies to tackle the major challenges of persistent wide area surveillance.
The research will encompass aspects of: compressed sensing, distributed sensor management, distributed signal processing, multi-target tracking using finite set statistics, advanced detection, localization and classification, anomaly detection and behaviour monitoring, and efficient computation and the mapping of algorithms to hardware. The project will work in close partnership with industry and the UK defence sector.
Interested researchers can apply through:
or
The deadline for applications is 22nd March 2013.




The Stanford 256x256 CMOS Image Sensor with ΔΣ-Based Single-Shot Compressed Sensing

Like everybody else, I love a good PR when I see one. It first showed up on my radar screen via Vladimir's blog, then in my Google alerts. IEEE Spectrum is featuring a blurb on Camera Chip Makes Already-Compressed Images, which I had mentioned before, and I found the conference paper. Here it is: A 256x256 CMOS Image Sensor with ΔΣ-Based Single-Shot Compressed Sensing by Y. Oike and A. El Gamal. The abstract reads:

A CMOS image sensor architecture with built-in single-shot compressed sensing is described. The image sensor employs a conventional 4-T pixel and per-column ΣΔ ADCs. The compressed sensing measurements are obtained via a column multiplexer that sequentially applies randomly selected pixel values to the input of each ΣΔ modulator. At the end of readout, each ADC outputs a quantized value of the average of the pixel values applied to its input. The image is recovered from the random linear measurements off-chip using numerical optimization algorithms. To demonstrate this architecture, a 256x256 pixel CMOS image sensor is fabricated in 0.15 μm CIS process. The sensor can operate in compressed sensing mode with compression ratio 1/4, 1/8, or 1/16 at 480, 960, or 1920 fps, respectively, or in normal capture mode with no compressed sensing at a maximum frame rate of 120 fps. Measurement results demonstrate capture in compressed sensing mode at roughly the same readout noise of 351 μVrms and power consumption of 96.2 mW as normal capture at 120 fps. This performance is achieved with only 1.8% die area overhead. Image reconstruction shows modest quality loss relative to normal capture and significantly higher image quality than downsampling.
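
Here is a toy numerical sketch of the measurement model the abstract describes (each column-level ADC outputs the average of a randomly selected subset of pixel values). The array size and the particular random-grouping scheme below are illustrative assumptions, not the chip's actual multiplexing.

    import numpy as np

    rng = np.random.default_rng(1)
    img = rng.random((256, 256))   # stand-in for the pixel array
    ratio = 4                      # compression ratio 1/4
    rows, cols = img.shape

    # For each column, partition the rows at random into groups of `ratio`
    # pixels; each ADC output is modeled as the average of one group, i.e.
    # one random linear measurement per group.
    y = np.empty((rows // ratio, cols))
    for c in range(cols):
        groups = rng.permutation(rows).reshape(rows // ratio, ratio)
        y[:, c] = img[groups, c].mean(axis=1)

    print(y.shape)  # (64, 256): a quarter as many measurements as pixels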

Since I hadn't read the actual paper, I wondered how this architecture was different from either the EPFL CMOS architecture or simply the Georgia Tech Transform Imager; an anonymous commenter responded with:

From the paper:
"In [18], [19] (the EPFL implementation), CS is implemented by shifting a random digital pattern representing a measurement matrix via a shift register distributed over the pixel array (see Fig. 1(d)). The currents from the pixels with the pattern in the same column are summed over one line while the currents from the pixels with the pattern are summed over a second column line. The total weighted sum is then performed again in analog at the chip level. This implementation requires multiple shot image capture, is not scalable due to the large pixel size, and suffers from low SNR due to pixel design and analog summation."


Tuesday, February 26, 2013

Efficient High-Dimensional Entanglement Imaging with a Compressive-Sensing Double-Pixel Camera

Here is a new preprint using compressive sensing to reduce the scanning time for entanglement imaging. It is just a question of time before they reduce their scanning even further through the use of the seeded matrices that have recently been fitted to the group testing framework (isn't it funny how that entry makes the connection with computational ghost imaging given today's paper?) or by using any of the AMP solvers like TurboGAMP / GAMP or even SBSL. They went from 310 days down to 8 hours; what's next, 1 minute? So quantum-mechanics-type imaging might actually be useful, that's a whopper :-) Recall many Nobel Prizes were for imaging; let's hope for something absolutely fantastic to come out of this. Here is the paper:



We implement a double-pixel compressive-sensing camera to efficiently characterize, at high resolution, the spatially entangled fields that are produced by spontaneous parametric down-conversion. This technique leverages sparsity in spatial correlations between entangled photons to improve acquisition times over raster scanning by a scaling factor up to $n^2/\log(n)$ for $n$-dimensional images. We image at resolutions up to 1024 dimensions per detector and demonstrate a channel capacity of 8.4 bits per photon. By comparing the entangled photons' classical mutual information in conjugate bases, we violate an entropic Einstein-Podolsky-Rosen separability criterion for all measured resolutions. More broadly, our result indicates that compressive sensing can be especially effective for higher-order measurements on correlated systems.

Popular Summary: Suppose that Alice and Bob share some 1000-sided dice having special properties: When Alice rolls her “fair” die, it is equally likely to land on any of the 1000 sides. However, when Bob rolls his equally “fair” die, he always rolls the same side as Alice, even if they are far apart and roll at the same time. Even though Alice and Bob’s individual outcomes are random, when considered together, they are correlated and always agree. This rather odd behavior is the essence of quantum entanglement and can be replicated by photons when trying to measure their direction of propagation.
Quantum mechanics tells us that a photon is equally likely to be traveling in many different directions, but, upon actually measuring this direction (rolling the dice), only one direction is found. Strangely, measurements of a second photon entangled with the first, also equally likely to propagate in many directions, will always give the same outcome as the first photon. Unfortunately, this behavior is difficult to observe with state-of-the-art detectors. In our paper, we report that, by combining a technique called compressive sensing with standard imaging arrays, we are able to efficiently observe these complex quantum correlations of pairs of photons produced in a nonlinear optical device.
Characterizing high-dimensional entanglement is challenging because the amount of data required grows exponentially with the number of dimensions. Interesting signals within this parameter space also tend to be sparse, so measuring one is like hunting for needles in a haystack with an unknown number of needles. Furthermore, most entanglement measures require a tomographic procedure to recover a wave function or density matrix with complex coefficients that cannot be directly measured. This procedure requires a complicated series of measurements and postprocessing.
To overcome these limitations, we apply ideas from two fields, information theory and compressive sensing. Information theory provides a way for us to characterize an entangled system directly from measurements by means of a quantity called the classical mutual information. A comparison of classical mutual information in the position and momentum bases allows us to show that the correlations are nonclassical. To handle the high dimensionality, we adapt compressive-sensing single-pixel cameras to change sparsity from a problem into a resource. Compressive sensing is a measurement technique that effectively compresses a signal during measurement, greatly reducing the requisite amount of data. In our system, measurement time is reduced from a year to a few hours.


Coded aperture compressive temporal imaging

Here is some hardware today, and I can't wait to see this CACTI imager merge with the recently mentioned CASSI. For some odd reason, I did not see a reference to either the Reinterpretable Imager of Amit Agrawal, Ashok Veeraraghavan and Ramesh Raskar, or Coded Strobing Photography: Compressive Sensing of High-speed Periodic Events by Ashok Veeraraghavan, Dikpal Reddy, and Ramesh Raskar in the reference section (yes, there is the flutter shutter reference though). It would have been nice to see a sentence or two on how the systems are different and the type of phase space they are exploring. Anyway, here is the paper: 



Abstract: We use mechanical translation of a coded aperture for code division multiple access compression of video. We present experimental results for reconstruction at 148 frames per coded snapshot.
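
A toy sketch of the forward model the abstract implies: one coded snapshot is the time integral of the video frames, each modulated by the same binary aperture code shifted mechanically by one position per frame. The sizes, shift direction and binary code below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    T, H, W = 8, 64, 64                              # frames per snapshot, frame size
    video = rng.random((T, H, W))
    code = (rng.random((H, W)) > 0.5).astype(float)  # binary coded aperture

    # Mechanical translation: shift the code by one row per frame, then
    # integrate over time into a single coded snapshot y = sum_t mask_t * x_t.
    snapshot = np.zeros((H, W))
    for t in range(T):
        snapshot += np.roll(code, shift=t, axis=0) * video[t]

    # Reconstruction then inverts this multiplexing, e.g. with a sparsity prior.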






Monday, February 25, 2013

BSSC: Robust Classification using Structured Sparse Representation - implementation -

Ehsan Elhamifar presents the "Structured-Sparse Subspace Classification is an algorithm based on block-sparse representation techniques for classifying multi-subspace data, where the training data in each class lie in a union of subspaces." The attendant paper is: Robust Classification using Structured Sparse Representation by Ehsan ElhamifarRene Vidal. The abstract reads:
In many problems in computer vision, data in multiple classes lie in multiple low-dimensional subspaces of a high-dimensional ambient space. However, most of the existing classification methods do not explicitly take this structure into account. In this paper, we consider the problem of classification in the multi-subspace setting using sparse representation techniques. We exploit the fact that the dictionary of all the training data has a block structure where the training data in each class form few blocks of the dictionary. We cast the classification as a structured sparse recovery problem where our goal is to find a representation of a test example that uses the minimum number of blocks from the dictionary. We formulate this problem using two different classes of non-convex optimization programs. We propose convex relaxations for these two non-convex programs and study conditions under which the relaxations are equivalent to the original problems. In addition, we show that the proposed optimization programs can be modified properly to also deal with corrupted data. To evaluate the proposed algorithms, we consider the problem of automatic face recognition. We show that casting the face recognition problem as a structured sparse recovery problem can improve the results of the state-of-the-art face recognition algorithms, especially when we have a relatively small number of training data for each class. In particular, we show that the new class of convex programs can improve the state-of-the-art recognition results by 10% with only 25% of the training data. In addition, we show that the algorithms are robust to occlusion, corruption, and disguise.
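
For concreteness, the standard $\ell_1/\ell_2$ convex relaxation behind this kind of block-sparse classifier (the paper's exact programs may differ) is

    $$\min_{c} \; \sum_{i=1}^{K} \|c[i]\|_2 \quad \text{subject to} \quad y = B c,$$

where $B = [B_1, \dots, B_K]$ stacks the training data of the $K$ classes, $c[i]$ is the coefficient block for class $i$, and a test sample $y$ is assigned to the class whose block alone gives the smallest residual $\|y - B_i c[i]\|_2$.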


see also Block-Sparse Recovery via Convex Optimization by Ehsan Elhamifar and Rene Vidal.

The code is here.



Nuit Blanche Reader's Reviews



In the spirit of Reddit, Google+, LinkedIn as Peer Review Systems, here are various recent interactions of note (see update at the end of this entry). After publishing Sunday Morning Insight: Phase Transitions and Eigen-gaps as Roadmaps, I received the following from Lenka Zdeborova:

Dear Igor,

At this late hour I made it to catch up with Nuit Blanche. I would like to point out that the result of the paper by Junan Zhu and Dror Baron were already included in our paper Probabilistic Reconstruction in Compressed Sensing: Algorithms, Phase Diagrams, and Threshold Achieving Matrices, Section V.C, and we indeed showed that also in that case the seeded matrices serve to go close to optimality, Section VI.C. The noisy case is included in the Donoho, Javanmard, Montanari proof of optimality with seeding/spatial coupling. I actually just wrote about this to Junan and Dror before reading today's Nuit Blanche.
Best!


Paul Shearer also provided some additional insight on Raj Rao's preprint on the Google+ community forum.

Following up on last week's Sunday Morning Insight: Stripe Physics, Wavelets and Compressive Sensing Solvers, Phil Schniter wrote the following email:
Igor,
I want to clarify that there are some problems with the paragraph from [4] that you quoted below.  In particular, the line 
The AMP was generalized for general signal models in [5], [6] and called G-AMP. The algorithm used in [3], [7] is equivalent to G-AMP.
is incorrect.  The "G-AMP" algorithm, as introduced by Rangan in arXiv:1010.5141, is a considerable generalization of the "Bayesian AMP" algorithm proposed earlier by Donoho Maleki and Montanari at ITW 2010 (or [5] above).  G-AMP is distinct from Bayesian AMP because G-AMP handles nonlinear observations such as those in logistic regression, phase-retrieval, and reconstruction from quantized samples.  Also, G-AMP assumes less strict conditions on the homogeneity of the measurement matrix than Donoho Maleki and Montanari's algorithms.  Donoho Maleki and Montanari deserve credit for the Bayesian AMP algorithm, while Rangan deserves credit for the G-AMP algorithm, and the two should not be confused.
Thanks,
Phil


In a different direction, you may have noticed that the paper mentioned in That Netflix RMSE is way too low or is it? (Clustering-Based Matrix Factorization - implementation -) has now been removed from arXiv by the author. We all learned something in the process. For more info check This Week's Guardians of Science: Zeno Gantner and Peyman Milanfar.

In the comment section of some recent blog entries, here are some items of note, you may be interested in continuing those discussions:

It might be time to combine the literature of adaptive neighborhoods with compressive sensing. See: Gordon, R. & R.M. Rangayyan (1984). Feature enhancement of film mammograms using fixed and adaptive neighborhoods. Applied Optics 23(4), 560-564, which kicked it off. There are now about 200 papers, but none on compressive sensing.
In A Randomized Parallel Algorithm with Run Time $O(n^2)$ for Solving an $n \times n$ System of Linear Equations, Antti Lipponen commented that:

Hi all. I implemented Fliege's algorithm for real-valued linear systems in C++ for Matlab (based on Fliege's paper; I also took some ideas from Riccardo's and Benjamin's code). However, I cannot get the performance even close to Matlab's backslash operator (probably I am using too small matrices, or my code is not optimal, or...). If someone is interested in my code, it can be found at: http://pastebin.com/XpSaTqPQ
All suggestions and comments are welcome!

In Linear Bandits in High Dimension and Recommendation Systems, Yash Deshpande commented:
Hi Igor,
Just to clarify things a bit: the presentation deals with two essentially independent challenges in recommendation systems: privacy and interactivity. The paper you link deals strictly with the latter. As for the former perhaps the following would be more useful:

The paper detailing the work on the 2011 CAMRa challenge is available here:
Earlier work on matrix completion is detailed in papers with Raghu Keshavan. His thesis contains all the necessary references.

In Sudoku, Compressive Sensing and Thermodynamics, an anonymous commenter said:

There's an earlier paper proposing this idea:

In Tolerance to Ambiguity: Sparsity and the Bayesian Perspective, Mario Figueiredo commented:

Hi,
This is an interesting paper. However, I think that there is something important missing in this discussion: the loss function. Any (Bayesian) point estimate is the minimizer of the posterior expectation of some loss; in particular, the MAP is the minimizer of the posterior expectation of the 0/1 loss (in fact, it is the limit of a family of estimates, but that can be ignored in this discussion). Accordingly, there is no reason whatsoever why the distribution of MAP estimates has to be similar to the prior; why would it? Furthermore, for a MAP estimate to yield "correct results" (whatever "correct" means), there is no reason why typical samples from the prior should look like those "correct results". In fact, the compressed sensing (CS) example in Section 3.3 of the paper illustrates this quite clearly: (a) CS theory guarantees that solving (4) or (5) yields the "correct" solution; (b) as explained in section 3.1, (5) is the MAP estimate of x under a Laplacian prior (and the linear-Gaussian likelihood therein explained); (c) thus, the solution of (5) is the minimizer of the posterior expectation of the 0/1 loss under the likelihood and prior just mentioned; (e) in conclusion, if the underlying vectors are exactly sparse enough (obviously not typical samples of a Laplacian) they can be recovered by computing the MAP estimate under a Laplacian prior, that is, by computing the minimizer of the posterior expectation of the 0/1 loss. This is simply a fact. There is nothing surprising here: the message is that the prior is only half of the story and it doesn't make sense to look at a prior without looking also at the loss function. In (Bayesian) point estimation, a prior is "good", not if it describes well the underlying objects to be estimated, but if used (in combination with the likelihood function and observations) to obtain a minimizer of the posterior expectation of some loss it leads to "good" estimates.
Regards,
Mario Figueiredo.
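
For concreteness, here is Mario's point in symbols: under the linear-Gaussian likelihood $y = Ax + w$, $w \sim \mathcal{N}(0, \sigma^2 I)$, and an i.i.d. Laplacian prior $p(x) \propto \exp(-\lambda \|x\|_1)$, the MAP estimate (the limiting minimizer of the posterior expectation of the 0/1 loss) is

    $$\hat{x}_{\mathrm{MAP}} = \arg\max_x \; p(x \mid y) = \arg\min_x \; \frac{1}{2\sigma^2}\|y - Ax\|_2^2 + \lambda \|x\|_1,$$

which is exactly the $\ell_1$ program that recovers exactly sparse vectors, even though such vectors are not typical samples of the Laplacian prior.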

I think the feedback system in place is already working pretty well....


Update 1:

Thomas Arildsen also mentioned the following in the comments of Reddit, Google+, LinkedIn as Peer Review Systems

Hi Igor, 
Thanks for your many suggestions and initiatives, which I think can all help us a bit further in the right direction. Still, we really need some centralised place to organize this. Ideally, an open review and access platform with the potential to replace a traditional journal. 
It would be great to have a journal like PeerJ for signal processing & friends. I am keeping a close eye on the episciences project in the hope that it answers my prayers (episciences.org). 
An overlay to e.g. ArXiv would be a useful solution as well. For example Pierre Vandergheynst has gathered some constructive thoughts on this approach on Google+ recently. 
Maybe the recently appeared PubPeer site could fill this role (pubpeer.com)?
I also noticed PubPeer and it looks like a lightweight but potentially powerful solution.




Image Credit: NASA/JPL-Caltech 
This image was taken by Navcam: Right A (NAV_RIGHT_A) onboard NASA's Mars rover Curiosity on Sol 196 (2013-02-23 12:30:02 UTC).
Full Resolution



Saturday, February 23, 2013

Sunday Morning Insight: Phase Transitions and Eigen-gaps as Roadmaps.

** If you've read this entry before, please check for two updates at the end of it **

Today we have two insights for the price of one. You probably recall last Sunday Morning Insight [1] about the ability to potentially strengthen belief propagation solvers through the inclusion of long-range potentials. I really don't know if this idea is any good, but Lenka Zdeborova sent this additional insight:

....About your Sunday morning post. I enjoyed reading it. I would like to point out that the TAP equations work for infinite-range spin glasses; more precisely, they were first written for the Sherrington-Kirkpatrick (SK) model, where there are two-body random sign interactions between every pair of spins. A lot is known about this model. However, the physics of the real 3D spin glass with short-range interactions is crucially different. There has been some work in physics over the past 30 years showing how different and how similar the short-range spin glasses are to the SK model, and the question is far from being settled.

So my answer to your first question would be: If there are only long range interactions then the answer is yes, the TAP, cavity and replica methods were created precisely for infinite-range systems. If there is a mix of the two or only short range interactions then the answer is not fully known I think. But there are a lot of methods out there in theoretical physics to connect the behaviour of the infinite-range systems to the real 3D ones and I always found an amazingly interesting question if these methods will find some nice algorithmic applications.
Best!

We had a small discussion afterwards, in which I eventually stated:
Maybe using a long range potential might make the attendant solver robust, at least more robust than it currently is.
Little did I know that Dror Baron was about to send me the following:
Hey Igor, I'm sure that you have already seen our new paper, which we posted on arxiv. The point is that there are different performance regions in compressed sensing, the mean square error does not behave in a smooth way, rather it has discontinuities. Additionally, belief propagation is suboptimal. Therefore, there is room for a new generation of better compressed sensing solvers in the future.
The paper is [2] Performance Regions in Compressed Sensing from Noisy Measurements by Junan Zhu and Dror Baron and the abstract reads:
In this paper, compressed sensing with noisy measurements is addressed. The theoretically optimal reconstruction error is studied by evaluating Tanaka's equation. The main contribution is to show that in several regions, which have different measurement rates and noise levels, the reconstruction error behaves differently. This paper also evaluates the performance of the belief propagation (BP) signal reconstruction method in the regions discovered. When the measurement rate and the noise level lie in a certain region, BP is suboptimal with respect to Tanaka's equation, and it may be possible to develop reconstruction algorithms with lower error in that region.
So we now have some good reasons to believe that AMP/belief propagation solvers ought to be improved. One of the reasons we have had a clear perspective on better algorithms in compressive sensing has mainly been phase transition graphs [7], not RIP or similar admissibility conditions. Hence, Junan and Dror provide us with a clear roadmap for improvement. Maybe one way to do so is by looking at different (long-range) potentials, and then, if that happens, what sorts of seeded matrices [3] does that imply? etc....


The second insight came from an arXiv preprint by Raj Rao Nadakuditi entitled When are the most informative components for inference also the principal components?. The abstract reads:
Which components of the singular value decomposition of a signal-plus-noise data matrix are most informative for the inferential task of detecting or estimating an embedded low-rank signal matrix? Principal component analysis ascribes greater importance to the components that capture the greatest variation, i.e., the singular vectors associated with the largest singular values. This choice is often justified by invoking the Eckart-Young theorem even though that work addresses the problem of how to best represent a signal-plus-noise matrix using a low-rank approximation and not how to best _infer_ the underlying low-rank signal component.
Here we take a first-principles approach in which we start with a signal-plus-noise data matrix and show how the spectrum of the noise-only component governs whether the principal or the middle components of the singular value decomposition of the data matrix will be the informative components for inference. Simply put, if the noise spectrum is supported on a connected interval, in a sense we make precise, then the use of the principal components is justified. When the noise spectrum is supported on multiple intervals, then the middle components might be more informative than the principal components.
The end result is a proper justification of the use of principal components in the setting where the noise matrix is i.i.d. Gaussian and the identification of scenarios, generically involving heterogeneous noise models such as mixtures of Gaussians, where the middle components might be more informative than the principal components so that they may be exploited to extract additional processing gain. Our results show how the blind use of principal components can lead to suboptimal or even faulty inference because of phase transitions that separate a regime where the principal components are informative from a regime where they are uninformative.
Raj makes the point that by looking at the gaps in the spectrum of data matrices, one can spot when the noise is supported on connected intervals.
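
A toy sketch of that diagnostic: compute the singular value spectrum of the data matrix and look for gaps. The sizes and the two-level noise mixture below are illustrative assumptions of mine, not Raj's estimator.

    import numpy as np

    rng = np.random.default_rng(3)
    m = 500
    # Rank-1 signal buried in heterogeneous noise (two noise levels by row).
    signal = 10 * np.outer(rng.standard_normal(m), rng.standard_normal(m)) / m
    noise = rng.standard_normal((m, m)) * rng.choice([0.5, 2.0], size=(m, 1))
    s = np.linalg.svd(signal + noise, compute_uv=False)

    gaps = s[:-1] - s[1:]                  # consecutive singular value gaps
    print(np.argsort(gaps)[::-1][:5])      # large interior gaps hint that the
                                           # noise spectrum is not one interval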



How is this relevant ?

Quite simply, it answers at least one question: when should one employ Robust PCA [6] as opposed to simple PCA? When you have images or movies, the task is kind of simple [8]; if you are not that lucky, you probably have to resort to some domain-specific expertise. Raj seems to suggest that first computing the SVD of the data matrix and finding gaps in the spectrum is an indication that you ought to be looking for a Robust PCA solution. In a new paper, Tianyi Zhou seems to suggest that Robust PCA might only take about 20 times as long as a single PCA, but knowing how to switch from one to the other automatically might still save time. I also wonder how this result changes when using a randomized version of PCA in the context of very large problems.

Another way to look at it, from a totally different standpoint: it might be a way of detecting the changing behavior of a target through a multiscattering medium, etc...


- Update 1: I received the following from Lenka


Dear Igor,

At this late hour I made it to catch up with Nuit Blanche. I would like to point out that the result of the paper by Junan Zhu and Dror Baron were already included in our paper Probabilistic Reconstruction in Compressed Sensing: Algorithms, Phase Diagrams, and Threshold Achieving Matrices, Section V.C, and we indeed showed that also in that case the seeded matrices serve to go close to optimality, Section VI.C. The noisy case is included in the Donoho, Javanmard, Montanari proof of optimality with seeding/spatial coupling. I actually just wrote about this to Junan and Dror before reading today's Nuit Blanche.
Best!
end of update 1

Update 2: Paul Shearer provides some additional insight on Raj Rao's preprint on the Google+ community forum.


Friday, February 22, 2013

Reddit, Google+, LinkedIn as Peer Review Systems

Science is only as robust as its peer review system. But the number of papers is steadily increasing, and the current pre-publication peer review system is crumbling under that weight while becoming more of a hindrance than anything else, so we have to go beyond what is currently proposed. 

Daniel Lemire proposes today to publicly peer review your paper. As you know, this capability already exists here on Nuit Blanche and its associated extensions. Yesterday, I mentioned several ways for your work to be disseminated to a wider audience in A New Discovery Engine. Equally, you will probably have noticed that the following platforms also provide a way for people to review your work. 

At the most basic level, people can already "like" your paper and send a signal such as a "Google +1".
But sometimes you want to go deeper than a superficial signal and write something that you'd normally only write as a review report when probed by some editor. You now have the means to do so on Nuit Blanche and its associated extensions. It's been there for a while, but let's spell it out loud.

If you want to review a paper anonymously, you can:
  • Comment under "anonymous" in the comment section of the blog entry dedicated to that paper. If there are several papers, please mention the paper you are reviewing. 
  • Write a review in the comment section of the CompressiveSensing subreddit (85 people) under a throwaway account. If the paper is not listed there, you can add it to the subreddit and then provide the review in the comment section. It would be great if you'd add Peer Review in the title and a link to the paper you are reviewing (and the version you are reviewing) in the body of the review.
In both cases, as an "owner" of these groups, I will act with ruthlessness on reviews that are plain mean. In short, don't be an a***hole.

If you want to provide feedback to the author of a paper without being anonymous, you can comment on social circles that enforce some sort of real-name identity scheme. Here are the ones we have so far:
In the LinkedIn groups, the RSS feature will soon be disabled by LinkedIn, so what you want to do is reference an arXiv paper or a Nuit Blanche entry and then start a discussion where you can insert your review as a commentary. The title of the conversation should include "Peer Review" followed by the name of the paper you are reviewing. 

For some people the act of reviewing may be daunting because you are insecure about how to go about it. Here are some easy ways to gain strength:
  • Find the typos and list them in a review.
  • Try running the implementation attendant to the paper and tell the world whether it produces what the paper is stating.
Doing just that will get you some credence and respect in the community. 


Related Posts:

See All by Looking at A Few: Sparse Modeling for Finding Representative Objects - implementation -

Ehsan Elhamifar presents the "Sparse Modeling Representative Selection (SMRS) is an algorithm based on sparse multiple-measurement-vector recovery theory for selecting a subset of data points as the representatives." The attendant paper is: See All by Looking at A Few: Sparse Modeling for Finding Representative Objects by Ehsan ElhamifarRene Vidal. The abstract reads:

We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds.

The code is here.




Thursday, February 21, 2013

nimfa - A Python Library for Nonnegative Matrix Factorization Techniques - implementation -


nimfa - A Python Library for Nonnegative Matrix Factorization Techniques

From the page:

Nimfa is an open-source Python library that provides a unified interface to nonnegative matrix factorization algorithms. It includes implementations of state-of-the-art factorization methods, initialization approaches, and quality scoring. Both dense and sparse matrix representations are supported.
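
For readers who have never run an NMF, here is a minimal sketch of the classical Lee-Seung multiplicative updates that libraries of this kind implement. This is plain NumPy, not nimfa's API.

    import numpy as np

    def nmf(V, rank, n_iter=200, eps=1e-9):
        # Factor V ~= W @ H with W, H >= 0 via Lee-Seung multiplicative updates,
        # which decrease the Frobenius error while preserving nonnegativity.
        rng = np.random.default_rng(0)
        m, n = V.shape
        W = rng.random((m, rank))
        H = rng.random((rank, n))
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    V = np.abs(np.random.default_rng(1).standard_normal((100, 40)))
    W, H = nmf(V, rank=5)
    print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative residual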

