
Friday, October 30, 2009

CS: Recovering low-rank matrices from few coefficients in any basis, Probability of Unique Integer Solution to a System of Linear Equations

Two items first. In yesterday's entry on the Sudoku paper (Linear Systems, Sparse Solutions, and Sudoku by Prabhu Babu, Kristiaan Pelckmans, Petre Stoica, and Jian Li), it is interesting to find out that a re-weighted scheme seems to be doing better than a game-specific greedy algorithm. Furthermore, at the end of the paper, we can read the following:

The presently known sufficient conditions on A for checking the equivalence of P0 and P1, like the restricted isometry property (RIP) [5] and Kashin, Garnaev, Gluskin (KGG) inequality [6], do not hold for Sudoku. So analyzing the equivalence of l0 and l1 norm solutions in the context of Sudoku requires novel conditions for sparse recovery.
I am not sure that the RIP of [5] is the condition to check here (that is RIP-2); more likely, the sparsity of the measurement matrix implies some type of RIP-1 condition. I also thought KGG was a necessary and sufficient condition, albeit an NP-hard one to check. Other conditions can be found on the Compressive Sensing Big Picture page.

In the super-resolution paper mentioned earlier this week, entitled Super-resolution ghost imaging via compressive sampling reconstruction by Wenlin Gong and Shensheng Han, Wenlin confirms to me that only l1-magic was used, not GPSR as the reference would tend to indicate. I am sure we are going to see some improvements in the near future. Other reconstruction solvers can be found on the Compressive Sensing Big Picture page too.


David Gross just sent me the following:
...I'm a physicist and new to the field. Originally, some colleagues and I got interested in compressed sensing and matrix completion in the context of quantum state tomography (meaning the experimental characterization of a quantum system). Our paper arXiv:0909.3304 was mentioned on your blog.

The methods we needed to develop in order to translate known results on matrix completion by Candes, Recht, Tao and others to quantum mechanics proved far more general than anticipated. We can now show that a low-rank (n x n)-matrix may be recovered from basically O(n r log^2 n) randomly selected expansion coefficients with respect to any matrix basis. The matrix completion problem is just a special case.

Most importantly, though, the complexity of the proofs was reduced substantially. The spectacular argument by Candes and Tao in arXiv:0903.1476 covered about 40 pages. The new paper implies these results, but is much simpler to digest.

The first version of the paper went onto the arXiv two weeks ago: arXiv:0910.1879. However, it has significantly evolved since then. In some cases the bounds are now down to O(n r log n), which -- I'm happy to say -- is tight up to multiplicative factors.

There is an obvious challenge to be met. The O(n r log n) bounds do not currently extend to the original matrix completion problem. They come tantalizingly close, though, with only a single lemma obstructing a more general result. The precise problem is pointed out at the end of Section III B.

Before submitting the paper, I plan to add a final note. The statements all continue to be true if instead of a matrix basis a "tight frame" (a.k.a. "overcomplete basis") is used.

I should also point out Benjamin Recht's recent and strongly related pre-print arXiv:0910.0651 to you. He independently translated some of the results in our earlier paper on quantum tomography to the matrix completion problem (apparently overlooking our announcement of exactly the same result in the physics paper).

Let us note that Benjamin Recht's preprint was recently updated. But first, here is the new revision, entitled Recovering low-rank matrices from few coefficients in any basis by David Gross:

We establish novel techniques for analyzing the problem of low-rank matrix recovery. The methods are both considerably simpler and more general than previous approaches. It is shown that an unknown n × n matrix of rank r can be efficiently reconstructed given knowledge of only O(nr ν log^2 n) randomly sampled expansion coefficients with respect to any given matrix basis. The number ν quantifies the “degree of incoherence” between the unknown matrix and the basis. Existing work concentrated mostly on the problem of “matrix completion”, where one aims to recover a low-rank matrix from randomly selected matrix elements. Our result covers this situation as a special case. The proof consists of a series of relatively elementary steps, which stands in contrast to the highly involved methods previously employed to obtain comparable results. In cases where bounds had been known before, our estimates are slightly tighter. We discuss operator bases which are incoherent to all low-rank matrices simultaneously. For these bases, we show that O(nr ν log n) randomly sampled expansion coefficients suffice to recover any low-rank matrix with high probability. The latter bound is tight up to multiplicative factors. This seems to be the first time the optimal order of the log-factor has been proved to be achievable for a matrix reconstruction problem where neither the basis nor the unknown matrix is randomized. This work is an expanded presentation of the recent pre-print [D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, arXiv:0909.3304] which was aimed at researchers working in quantum information theory.
Thanks David !
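
As a concrete, if generic, illustration of the matrix completion special case, here is a minimal numpy sketch of singular value thresholding, one standard proxy for the nuclear-norm program analyzed in these papers. This is not David Gross's method or code, and the threshold and step size below are ad hoc choices:

    import numpy as np

    def svt_complete(M_obs, mask, tau=None, delta=1.2, iters=300):
        # Singular Value Thresholding (Cai-Candes-Shen style) for matrix
        # completion: M_obs holds the sampled entries (zeros elsewhere),
        # mask is True where an entry was observed.
        n1, n2 = M_obs.shape
        tau = tau if tau is not None else 5 * np.sqrt(n1 * n2)
        Y = np.zeros_like(M_obs)
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            X = (U * np.maximum(s - tau, 0)) @ Vt   # soft-threshold the spectrum
            Y += delta * mask * (M_obs - X)         # gradient step on observed entries
        return X

    # toy example: a rank-2 matrix with 40% of its entries observed
    rng = np.random.default_rng(0)
    n, r = 60, 2
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    mask = rng.random((n, n)) < 0.4
    X = svt_complete(M * mask, mask)
    print("relative error:", np.linalg.norm(X - M) / np.linalg.norm(M))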

I also found the following: Probability of Unique Integer Solution to a System of Linear Equations by Olvi Mangasarian and Benjamin Recht. The abstract reads:
We consider a system of m linear equations in n variables Ax = d and give necessary and sufficient conditions for the existence of a unique solution to the system that is integer: x element of {−1,1}^n. We achieve this by reformulating the problem as a linear program and deriving necessary and sufficient conditions for the integer solution to be the unique primal optimal solution. We show that as long as m is larger than n/2, then the linear programming heuristic succeeds for most instances, but if m is less than n/2, the heuristic fails on most instances. We also demonstrate that these predictions match the empirical performance of the linear programming formulation to very high accuracy.
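
The paper's exact linear program is best read from the paper itself; a natural stand-in consistent with the box structure of the problem is to minimize the l_infinity (maximum magnitude) norm subject to Ax = d, which returns the ±1 vector whenever it is the unique box-feasible point. A hedged scipy sketch of the m versus n/2 phase transition, under that assumption:

    import numpy as np
    from scipy.optimize import linprog

    def linf_recover(A, d):
        # LP: min t  s.t.  Ax = d,  -t <= x_i <= t, with variables z = [x, t]
        m, n = A.shape
        c = np.zeros(n + 1); c[-1] = 1.0
        A_eq = np.hstack([A, np.zeros((m, 1))])
        A_ub = np.block([[np.eye(n), -np.ones((n, 1))],
                         [-np.eye(n), -np.ones((n, 1))]])
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=d,
                      bounds=[(None, None)] * (n + 1))
        return res.x[:n]

    rng = np.random.default_rng(1)
    n = 40
    for m in (15, 25, 35):        # n/2 = 20 is the predicted threshold
        hits = 0
        for _ in range(20):
            A = rng.standard_normal((m, n))
            x_true = rng.choice([-1.0, 1.0], size=n)
            hits += np.allclose(linf_recover(A, A @ x_true), x_true, atol=1e-4)
        print(f"m={m}: recovered {hits}/20")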


Enjoy!

Credit: NASA / GSFC / ASU, Via The Planetary Society Blog: Apollo 17 lander and flag! An image of the Apollo 17 lander, Challenger, captured by the Lunar Reconnaissance Orbiter Camera on October 1, 2009, includes not only the lander and astronaut tracks but also a fuzzy dark pixel at the location of the American flag erected there by astronauts Jack Schmitt and Gene Cernan.

Thursday, October 29, 2009

CS: Linear Systems, Sparse Solutions, and Sudoku, Compressed sensing performance bounds under Poisson noise


Angshul Majumdar just pointed out to me the following paper entitled: Linear Systems, Sparse Solutions, and Sudoku by Prabhu Babu, Kristiaan Pelckmans, Petre Stoica, and Jian Li. The abstract reads:
In this paper, we show that Sudoku puzzles can be formulated and solved as a sparse linear system of equations. We begin by showing that the Sudoku ruleset can be expressed as an underdetermined linear system Ax = b, where A is of size m × n and n > m. We then prove that the Sudoku solution is the sparsest solution of Ax = b, which can be obtained by l_0 norm minimization, i.e. min_x ||x||_0 s.t. Ax = b. Instead of this minimization problem, inspired by the sparse representation literature, we solve the much simpler linear programming problem of minimizing the l_1 norm of x, i.e. min_x ||x||_1 s.t. Ax = b, and show numerically that this approach solves representative Sudoku puzzles.
We have heard about Sudoku before in compressive sensing. The code implementing this can be found here.
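
The authors' code is linked above; for readers who want to see the l_1 step in isolation, here is a generic basis pursuit linear program in Python, a sketch using the standard x = u - v splitting and a random toy system rather than an actual Sudoku encoding:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        # min ||x||_1  s.t.  Ax = b, written as an LP with x = u - v, u, v >= 0
        m, n = A.shape
        c = np.ones(2 * n)                   # sum(u) + sum(v) equals ||x||_1
        A_eq = np.hstack([A, -A])            # A(u - v) = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
        return res.x[:n] - res.x[n:]

    rng = np.random.default_rng(2)
    n, m, k = 100, 45, 8
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = 1.0   # sparse 0/1 vector, Sudoku-like
    x_hat = basis_pursuit(A, A @ x_true)
    print("max abs error:", np.abs(x_hat - x_true).max())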




Compressed sensing performance bounds under Poisson noise by Maxim Raginsky, Rebecca Willett, Zachary Harmany and Roummel Marcia. The abstract reads:
This paper describes performance bounds for compressed sensing (CS) where the underlying sparse or compressible (sparsely approximable) signal is a vector of nonnegative intensities whose measurements are corrupted by Poisson noise. In this setting, standard CS techniques cannot be applied directly for several reasons. First, the usual signal-independent and/or bounded noise models do not apply to Poisson noise, which is non-additive and signal-dependent. Second, the CS matrices typically considered are not feasible in real optical systems because they do not adhere to important constraints, such as nonnegativity and photon flux preservation. Third, the typical ℓ2-ℓ1 minimization leads to overfitting in the high-intensity regions and oversmoothing in the low-intensity areas. In this paper, we describe how a feasible positivity- and flux-preserving sensing matrix can be constructed, and then analyze the performance of a CS reconstruction approach for Poisson data that minimizes an objective function consisting of a negative Poisson log likelihood term and a penalty term which measures signal sparsity. We show that, as the overall intensity of the underlying signal increases, an upper bound on the reconstruction error decays at an appropriate rate (depending on the compressibility of the signal), but that for a fixed signal intensity, the signal-dependent part of the error bound actually grows with the number of measurements or sensors. This surprising fact is both proved theoretically and justified based on physical intuition.
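
To make the objective concrete, here is a hedged sketch of penalized Poisson likelihood minimization. The multiplicative EM/Richardson-Lucy update and the nonnegative, column-normalized random matrix below are stand-ins of my own choosing; the paper's estimator and its flux-preserving construction are more refined:

    import numpy as np

    def poisson_em_l1(y, A, tau=1e-3, iters=500):
        # Multiplicative updates for  f(x) = 1'Ax - y' log(Ax) + tau ||x||_1, x >= 0;
        # fixed points satisfy the first-order conditions of the penalized Poisson NLL.
        x = np.full(A.shape[1], y.sum() / A.sum())
        colsum = A.sum(axis=0)
        for _ in range(iters):
            Ax = np.maximum(A @ x, 1e-12)
            x *= (A.T @ (y / Ax)) / (colsum + tau)
        return x

    rng = np.random.default_rng(3)
    n, m, k = 200, 80, 5
    # stand-in sensing matrix: sparse, nonnegative, columns normalized to unit flux
    A = rng.random((m, n)) * (rng.random((m, n)) < 0.2)
    A /= np.maximum(A.sum(axis=0), 1e-12)
    x_true = np.zeros(n)
    x_true[rng.choice(n, k, replace=False)] = 1e4    # a few bright intensities
    y = rng.poisson(A @ x_true).astype(float)
    x_hat = poisson_em_l1(y, A)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))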

On his webpage, Roummel Marcia has the following announcement:
Opportunities: Positions for graduate and undergraduate students are available in the areas of optimization, linear algebra, and compressed sensing. These positions are in conjunction with the National Science Foundation grant, DMS-0811062: Second-order methods for large-scale optimization in compressed sensing. Send me an email if you are interested in any of these positions.

Wednesday, October 28, 2009

CS: Super-resolution ghost imaging via compressive sampling reconstruction


Bada bing, bada boom: when was the last time you had to increase the pixel size on a CCD die in order to obtain a better resolution? Looks like this is a result coming out of a CS reconstruction in another ghost imaging experiment. Enjoy!

Super-resolution ghost imaging via compressive sampling reconstruction by Wenlin Gong and Shensheng Han. The abstract reads:
For ghost imaging, pursuing high resolution images and short acquisition times for reconstructing them are always the two main goals. We report an image reconstruction algorithm called compressive sampling (CS) reconstruction to recover ghost images. By CS reconstruction, ghost imaging with both super-resolution and a good signal-to-noise ratio can be obtained via short acquisition times. Both the effects influencing the resolution of ghost images obtained via CS reconstruction and approaches to further improving it, as well as the relationship between ghost imaging and CS theory, are also discussed.

Tuesday, October 27, 2009

CS: A job with Exxon.

As much as it looks like a nice opportunity, this one is not located in Paris so I will pass on it. Here is a job opportunity that showed up on my radar screen and has been added to the Compressive Sensing Jobs page:

Employer: ExxonMobil
Location: Clinton, NJ United States
Last Updated: 10/26/2009
Job Code: 7382BR

AutoReqId 7382BR
Job or Campus Folder Research Scientists (2) -Wave Propagation and Numerical Methods


Job Description ExxonMobil's Corporate Strategic Research Laboratory is seeking applications from talented individuals in physics, applied mathematics, or engineering with a strong record of achievements in fields related to wave propagation in heterogeneous media, and its associated mathematical and numerical methods.


Research Scientist - Wave propagation and transport in heterogeneous and porous media.

Experience with the physical, mathematical and numerical analyses of one of: wave propagation in heterogeneous media, multi-scale transport phenomena, and/or fluid-structure interactions at the rock pore scale and their impact on the macro scale.

Research Scientist-Compressive sensing.

Experience with the mathematics of signal/image processing, denoising, sub-Nyquist signal reconstruction, and sparse representation of data.

Applicants should have a PhD in applied mathematics, physics, engineering, geophysics, or a related field, with a strong ability in their field of expertise. Proficiency with scientific programming languages and experience with large-scale, parallel, numerical simulations are definite advantages. The ability to communicate and interact with internal and external groups will be an important selection criterion. Candidates should have a strong publication record, excellent oral presentation and writing skills, and show a desire and ability to grow into new science areas.

The successful candidate will join a dynamic, multi-disciplinary group of world-class scientists who focus on performing breakthrough research and creating new approaches to solve our most challenging problems. Technical staff members in this position implement and report on independent research, participate in program development, as well as collaborate internationally with leading engineers and scientists from industry, universities and other technical institutions.

ExxonMobil's Corporate Strategic Research (CSR) laboratory is a powerhouse in energy research focusing on fundamental science that can lead to technologies having a direct impact on solving our biggest energy challenges. Our facilities are centrally located in scenic Annandale, New Jersey, approximately one hour from both New York City and Philadelphia.
ExxonMobil is an Equal Opportunity Employer
Job Location Clinton, NJ

Monday, October 26, 2009

CS: Making CS Mainstream, CS Data Recovery in Wireless Sensor Networks, Routing and Signal for CS in WSNs, Very High Resolution SAR, Postdoc


One of the ways to make compressive sensing more mainstream is to enable tinkerers to play with something that implements some sort of controlled multiplexing. Case in point: what could be done with the newly released tiny DMD-based projectors or these? And what about this DMD kit with no driver? Then there are always the more expensive DLP kits from TI. By the way, if you are wondering how much an SLM costs, the one used in the work of Ivo Vellekoop and Allard Mosk (Phase control algorithms for focusing light through turbid media) runs about US$3,000 and up.

Today, we have papers and jobs mostly from Europe, enjoy!

A Bayesian Analysis of Compressive Sensing Data Recovery in Wireless Sensor Networks by Riccardo Masiero, Giorgio Quer, Michele Rossi and Michele Zorzi. The abstract reads:
In this paper we address the task of accurately reconstructing a distributed signal through the collection of a small number of samples at a data gathering point using Compressive Sensing (CS) in conjunction with Principal Component Analysis (PCA). Our scheme compresses in a distributed way real world non-stationary signals, recovering them at the data collection point through the online estimation of their spatial/temporal correlation structures. The proposed technique is hereby characterized under the framework of Bayesian estimation, showing under which assumptions it is equivalent to optimal maximum a posteriori (MAP) recovery. As the main contribution of this paper, we proceed with the analysis of data collected by our indoor wireless sensor network (WSN) testbed, proving that these assumptions hold with good accuracy in the considered real world scenarios. This provides empirical evidence of the effectiveness of our approach and proves that CS is a legitimate tool for the recovery of real-world signals in WSNs.
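
For intuition on the MAP claim: when the signal is modeled as jointly Gaussian, the MAP (equivalently MMSE) estimate from linear measurements has a closed form. A small sketch under that Gaussian assumption; the covariance and sampling pattern below are illustrative, not the paper's testbed:

    import numpy as np

    def gaussian_map(y, A, mu, Sigma, noise_var=1e-6):
        # MAP estimate of x ~ N(mu, Sigma) from y = A x + noise
        S = A @ Sigma @ A.T + noise_var * np.eye(A.shape[0])
        return mu + Sigma @ A.T @ np.linalg.solve(S, y - A @ mu)

    rng = np.random.default_rng(4)
    n, m = 100, 25
    grid = np.arange(n)
    Sigma = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 10.0)  # correlated field
    x = np.linalg.cholesky(Sigma + 1e-9 * np.eye(n)) @ rng.standard_normal(n)
    A = np.eye(n)[rng.choice(n, m, replace=False)]   # gather m of the n sensor readings
    x_hat = gaussian_map(A @ x, A, np.zeros(n), Sigma)
    print("relative error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))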

On the Interplay Between Routing and Signal Representation for Compressive Sensing in Wireless Sensor Networks by Giorgio Quer, Riccardo Masiero, Daniele Munaretto, Michele Rossi, Joerg Widmer and Michele Zorzi. The abstract reads:
Compressive Sensing (CS) shows high promise for fully distributed compression in wireless sensor networks (WSNs). In theory, CS allows the approximation of the readings from a sensor field with excellent accuracy, while collecting only a small fraction of them at a data gathering point. However, the conditions under which CS performs well are not necessarily met in practice. CS requires a suitable transformation that makes the signal sparse in its domain. Also, the transformation of the data given by the routing protocol and network topology and the sparse representation of the signal have to be incoherent, which is not straightforward to achieve in real networks. In this work we address the data gathering problem in WSNs, where routing is used in conjunction with CS to transport random projections of the data. We analyze synthetic and real data sets and compare the results against those of random sampling. In doing so, we consider a number of popular transformations and we find that, with real data sets, none of them are able to sparsify the data while being at the same time incoherent with respect to the routing matrix. The obtained performance is thus not as good as expected and finding a suitable transformation with good sparsification and incoherence properties remains an open problem for data gathering in static WSNs.

At Fringe 2009, a meeting organized by ESA in Italy, the following paper will be presented (only the abstract is available): Very High Resolution SAR Tomography via Compressive Sensing by Xiaoxiang Zhu and Richard Bamler. The abstract reads:
SAR tomography (TomoSAR) extends the synthetic aperture principle into the elevation direction for 3-D imaging. It uses stacks of several acquisitions from slightly different viewing angles (the elevation aperture) to reconstruct the reflectivity function along the elevation direction by means of spectral analysis for every azimuth-range pixel.

The new class of meter-resolution spaceborne SAR systems (TerraSAR-X and COSMO-Skymed) offers a tremendous improvement in tomographic reconstruction of urban areas and man-made infrastructure. The high resolution fits well to the inherent scale of buildings (floor height, distance of windows, etc.). In order to fully exploit the potential of this class of meter-resolution data there is demand for new and improved TomoSAR inversion algorithms.

Compressive sensing (CS) is a new and attractive technique for TomoSAR. It aims at minimizing the number of measurements to be taken from signals while still retaining the information necessary to approximate them well. It provides a good compromise between classical parametric and non-parametric spectral analysis methods for TomoSAR. Compared to parametric spectral analysis, CS is more robust to phase noise, has lower computational effort, and does not require model selection to provide the prior knowledge about the number of scatterers in a resolution cell. Compared to non-parametric spectral estimation CS overcomes the limitation of elevation resolution caused by the length of elevation aperture, i.e. CS has super-resolution properties.

In this paper the CS approach to TomoSAR is outlined. Its extension to differential (4-D) TomoSAR is introduced. Numerical simulations for realistic acquisition and noise scenarios will be presented to evaluate the potential and limits of the new technique. The first CS TomoSAR results with TerraSAR-X spotlight data (1 m resolution) over urban areas will be presented.

Finally, here is a postdoc job offer that will shortly be put on the Compressive Sensing Jobs page:
Postdoctoral Position in Wireless Communications at University of Vigo, Spain,
Description
The Signal Processing in Communications Group (GPSC, www.gts.tsc.uvigo.es/gpsc ), affiliated with the Department of Signal Theory and Communications at the University of Vigo, Spain, invites applications for one postdoctoral position in the field of wireless communications. The selected candidate will join GPSC to investigate fundamentals and algorithm design/evaluation for communication and sensor networks. Areas of particular interest include:
  • Sensor networks
  • Cognitive Radio
  • Compressed Sensing
GPSC is formed by 6 faculty members from the University of Vigo as well as several MSc and PhD students, and participates in several research projects funded by the European Commission and the Spanish Government. Among these, the recently launched COMONSENS project (www.comonsens.org) integrates investigators from 10 different top research institutions in Spain. GPSC members also actively collaborate with the Galician R&D Center in Advanced Telecommunications (GRADIANT, www.gradiant.org ) in diverse contracts with ICT companies. Thus, the selected candidates will enjoy unique opportunities to participate in exciting research projects with both industry and academia.

Desirable background
  • A Ph.D. degree in Electrical Engineering is required; PhD obtained before January 1, 2007
  • Knowledge and experience in sensor networks, cognitive radio and/or compressed sensing
  • Good verbal and written skills in English are required
  • Strong publications in international conferences and journals in the area of communications
  • Postdoctoral experience in a recognized group with expertise in the field is a plus
  • Experience in the organization, management and training of technical staff/students is a plus
  • Communication, computing and interpersonal skills are important
  • Capacity to work both independently and within a team
Contract conditions
The initial appointment will be for one year, with annual renewal dependent on funding and performance. Expected start date is January 2010. Successful applicants will be offered a 36,000 € yearly gross salary depending on qualifications, as well as health benefits.
Applications
Interested candidates may apply to Prof. Nuria González Prelcic (nuria@gts.tsc.uvigo.es).
Applications should include electronic copies of the following:
  • A detailed Curriculum Vitae. (*Please include your e-mail address and a recent picture.)
  • A cover letter addressing the specified job qualifications.
  • A letter of recommendation by a senior Professor/Researcher.
  • A copy of the publication deemed as best representative of the candidate’s creative research.
Priority consideration will be given to applications received by November 1, 2009. Applications will be accepted until position is filled.
Contact: Nuria González Prelcic
Departamento de Teoría de la Señal y Comunicaciones
ETSET. University of Vigo
36310 Vigo. Spain
Phone: +34 986 818656, e-mail: nuria@gts.tsc.uvigo.es


Credit: Palomar Observatory, This image of the moon was taken by the Palomar Observatory at the time of the LCROSS impact.

Saturday, October 24, 2009

CS: Compressed Sensing for Fusion Frames, Combinatorial Compressed Sensing, The Geometry of Generalized Binary Search, Ranging Imager, TCS and finance.

Here are a few papers of related interest for the week-end.

Compressed Sensing for Fusion Frames by Petros Boufounos, Gitta Kutyniok, and Holger Rauhut. The abstract reads:
Compressed Sensing (CS) is a new signal acquisition technique that allows sampling of sparse signals using significantly fewer measurements than previously thought possible. On the other hand, a fusion frame is a new signal representation method that uses collections of subspaces instead of vectors to represent signals. This work combines these exciting new fields to introduce a new sparsity model for fusion frames. Signals that are sparse under the new model can be compressively sampled and uniquely reconstructed in ways similar to sparse signals using standard CS. The combination provides a promising new set of mathematical tools and signal models useful in a variety of applications. With the new model, a sparse signal has energy in very few of the subspaces of the fusion frame, although it needs not be sparse within each of the subspaces it occupies. We define a mixed ℓ1/ℓ2 norm for fusion frames. A signal sparse in the subspaces of the fusion frame can thus be sampled using very few random projections and exactly reconstructed using a convex optimization that minimizes this mixed ℓ1/ℓ2 norm. The sampling conditions we derive are very similar to the coherence and RIP conditions used in standard CS theory.
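
To fix ideas, here is a small sketch of the mixed ℓ1/ℓ2 norm and its proximal operator (block soft-thresholding), with plain index blocks standing in for fusion-frame subspaces; the function names are mine, not the paper's:

    import numpy as np

    def mixed_l1_l2(x, blocks):
        # sum over blocks of the l2 norm of each block
        return sum(np.linalg.norm(x[b]) for b in blocks)

    def block_soft_threshold(x, blocks, t):
        # prox of t * mixed_l1_l2: shrink each block's l2 norm by t
        out = x.copy()
        for b in blocks:
            nrm = np.linalg.norm(x[b])
            out[b] = 0.0 if nrm <= t else (1 - t / nrm) * x[b]
        return out

    x = np.array([3.0, 4.0, 0.1, -0.1, 0.0, 2.0])
    blocks = [slice(0, 2), slice(2, 4), slice(4, 6)]
    print(mixed_l1_l2(x, blocks))                  # 5.0 + ~0.14 + 2.0
    print(block_soft_threshold(x, blocks, 0.5))    # the small middle block collapses to 0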


Then there is a presentation by Mark Iwen on Combinatorial Compressed Sensing: Fast algorithms with Recovery Guarantees. Of note is the application section at the very end, where one considers learning a function with multiple parameters:


A situation that is also the subject of much interest in calibration, or machine learning in general. Let us note that another way of performing some type of function learning is through the Experimental Probabilistic Hypersurface (more details can be found here). Not really related to compressive sensing, but of interest nonetheless since it deals with learning functions:

The Geometry of Generalized Binary Search by Robert Nowak. The abstract reads:
This paper investigates the problem of determining a binary-valued function through a sequence of strategically selected queries. The focus is an algorithm called Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued function through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search and Shannon-Fano coding. GBS is used in many applications including channel coding, experimental design, fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and machine learning. This paper develops novel incoherence and geometric conditions under which GBS achieves the information-theoretically optimal query complexity; i.e., given a collection of N hypotheses, GBS terminates with the correct function in O(log N) queries. Furthermore, a noise tolerant version of GBS is developed that also achieves the optimal query complexity. These results are applied to learning multidimensional threshold functions, a problem arising routinely in image processing and machine learning.
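
The selection rule is easy to state in code: query the point whose answers most evenly split the surviving hypotheses. A toy sketch for 1-D threshold functions, one of the applications mentioned in the abstract (noiseless case only):

    import numpy as np

    def gbs(hypotheses, queries, oracle):
        # hypotheses: (H, Q) array of predicted labels in {-1, +1};
        # repeatedly query the most balanced point, discard inconsistent hypotheses
        alive = np.ones(len(hypotheses), dtype=bool)
        while alive.sum() > 1:
            preds = hypotheses[alive]
            q = int(np.argmin(np.abs(preds.sum(axis=0))))   # most even split
            alive[alive] = preds[:, q] == oracle(queries[q])
        return int(np.flatnonzero(alive)[0])

    # threshold functions h_t(x) = sign(x - t) with thresholds on the query grid
    queries = np.linspace(0, 1, 64)
    thresholds = queries
    H = np.where(queries[None, :] >= thresholds[:, None], 1, -1)
    found = gbs(H, queries, lambda x: 1 if x >= 0.37 else -1)
    print("recovered threshold near:", thresholds[found])   # ~log2(64) queries used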


And here are two architectures for cameras to determine ranges:
They are interesting in that they don't have your usual Point Spread Function (or transfer function).

Finally, I have always been impressed by some results in TCS (Theoretical Computer Science) and always wonder what type of impact they will have on civil society; here are two in particular:


At some point, I am sure we'll also see some applications of compressive sensing in the area of low-latency trading, or high or ultra-high frequency trading as it is called these days.

Friday, October 23, 2009

CS: Learning with Compressible Priors, Spatially-Localized Compressed Sensing and Routing in Multi-Hop Sensor Networks?, Learning low dim manifolds

If you recall, one of the reasons compressive sensing is bound to touch on many subjects and fields of engineering (see the sparsity in everything series of posts) lies in the fact that most occurrences in Nature follow some type of power law. Volkan Cevher provides some insight on the subject of signal sparsity and the probability distributions from which signals are sampled in his upcoming NIPS paper entitled Learning with Compressible Priors. The abstract reads:

We describe a set of probability distributions, dubbed compressible priors, whose independent and identically distributed (iid) realizations result in p-compressible signals. A signal x ∈ R^N is called p-compressible with magnitude R if its sorted coefficients exhibit a power-law decay as |x|_(i) ≲ R i^(-d), where the decay rate d is equal to 1/p. p-compressible signals live close to K-sparse signals (K << N) in the l_r-norm (r > p) since their best K-sparse approximation error decreases with O(R K^(1/r - 1/p)). We show that the membership of generalized Pareto, Student’s t, log-normal, Frechet, and log-logistic distributions to the set of compressible priors depends only on the distribution parameters and is independent of N. In contrast, we demonstrate that the membership of the generalized Gaussian distribution (GGD) depends both on the signal dimension and the GGD parameters: the expected decay rate of N-sample iid realizations from the GGD with the shape parameter q is given by 1/[q log(N/q)]. As stylized examples, we show via experiments that the wavelet coefficients of natural images are 1.67-compressible whereas their pixel gradients are 0.95 log(N/0.95)-compressible, on the average. We also leverage the connections between compressible priors and sparse signals to develop new iterative re-weighted sparse signal recovery algorithms that outperform the standard l_1-norm minimization. Finally, we describe how to learn the hyperparameters of compressible priors in underdetermined regression problems.

Volkan Cevher also makes available RANDSC, a small code generating compressible signals from a specified distribution. If we could now make a connection between these distributions and the l_q (q less than 1) minimization techniques used to recover signals, it would be great [Oops, let me take that back: Volkan points to section 5.2, entitled "Iterative l_s-decoding for iid scale mixtures of GGD", duh].
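
RANDSC itself is available at the link above; as a stand-alone illustration (my own sketch, not Volkan's code), one can draw iid Student's t realizations and check the power-law decay of the sorted magnitudes:

    import numpy as np

    # iid draws from a heavy-tailed ("compressible") prior
    rng = np.random.default_rng(5)
    n, df = 10_000, 1.5
    x = rng.standard_t(df, size=n)

    # fit |x|_(i) ~ R * i^(-d) on a log-log scale over the head of the sorted sequence
    mags = np.sort(np.abs(x))[::-1]
    i = np.arange(1, n + 1)
    slope, logR = np.polyfit(np.log(i[10:1000]), np.log(mags[10:1000]), 1)
    print("estimated decay rate d =", -slope)   # roughly 1/df for Student's t tails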

Also found via another blog: Spatially-Localized Compressed Sensing and Routing in Multi-Hop Sensor Networks? by Sungwon Lee, Sundeep Pattem, Maheswaran Sathiamoorthy, Bhaskar Krishnamachari, and Antonio Ortega. The abstract reads:

We propose energy-efficient compressed sensing for wireless sensor networks using spatially-localized sparse projections. To keep the transmission cost for each measurement low, we obtain measurements from clusters of adjacent sensors. With localized projection, we show that joint reconstruction provides significantly better reconstruction than independent reconstruction. We also propose a metric of energy overlap between clusters and basis functions that allows us to characterize the gains of joint reconstruction for different basis functions. Compared with state-of-the-art compressed sensing techniques for sensor networks, our experimental results demonstrate significant gains in reconstruction accuracy and transmission cost.
Finally, rebounding on yesterday's statement on machine learning and compressive sensing, here is a view of some of these subjects in ML and manifolds featured in the recent talk by Yoav Freund entitled Learning low dimensional manifolds and presented at Google (by the way, what's up with Google engineers who don't know their mics are on? uh?)




What is interesting is his use of manifolds for system calibration at 52 minutes into the video, where he describes a 3-dimensional manifold living in a 23-dimensional dataset. Yoav's project described in the video is the Automatic Cameraman.

Thursday, October 22, 2009

CS: NIPS Workshop on Manifolds, sparsity, and structured models, Active Learning, Random Coding, FRI,


Ever since that blog entry featuring work on manifold signal processing and CS, I have had some expectation of some type of integration of compressive sensing into machine learning topics beyond mere publications. It looks like 2009 is the year when this is happening, in the form of workshops within ML conferences. First, Mike Wakin is co-organizing a NIPS workshop on low-dimensional geometric models along with Richard Baraniuk, Piotr Indyk, Bruno Olshausen, Volkan Cevher, and Mark Davenport. The call for contributions for that workshop follows:


CALL FOR CONTRIBUTIONS
NIPS Workshop on Manifolds, sparsity, and structured models: When can low-dimensional geometry really help?

Whistler, BC, Canada, December 12, 2009

http://dsp.rice.edu/nips-2009


Important Dates:
----------------
  • Submission of extended abstracts: October 30, 2009 (later submission might not be considered for review)
  • Notification of acceptance: November 5, 2009
  • Workshop date: December 12, 2009

Overview:
---------
Manifolds, sparsity, and other low-dimensional geometric models have long been studied and exploited in machine learning, signal processing and computer science. For instance, manifold models lie at the heart of a variety of nonlinear dimensionality reduction techniques. Similarly, sparsity has made an impact in the problems of compression, linear regression, subset selection, graphical model learning, and compressive sensing. Moreover, often motivated by evidence that various neural systems are performing sparse coding, sparse representations have been exploited as an efficient and robust method for encoding a variety of natural signals. In all of these cases the key idea is to exploit low-dimensional models to obtain compact representations of the data.

The goal of this workshop is to find commonalities and forge connections between these different fields and to examine how we can exploit low-dimensional geometric models to help solve common problems in machine learning and beyond.

Submission instructions:
------------------------

We invite the submission of extended abstracts to be considered for a poster presentation at this workshop. Extended abstracts should be 1-2 pages, and the submission does not need to be blind. Extended abstracts should be sent to md@rice.edu in PDF or PS file format.

Accepted extended abstracts will be made available online at the workshop website.

Organizers:
-----------
* Richard Baraniuk, Volkan Cevher, Mark Davenport, Rice University.
* Piotr Indyk, MIT.
* Bruno Olshausen, UC Berkeley.
* Michael Wakin, Colorado School of Mines.

Second, Rui Castro is one of the organizer of a NIPS workshop on Adaptive Sensing, Active Learning and Experimental Design. From the NIPS workshop webpage:
  • Submission of extended abstracts: October 27, 2009
    (later submission might not be considered for review)
  • Notification of acceptance: November 5, 2009
  • Workshop date: December 11, 2009

Also found on the interwebs:

Channel protection: Random coding meets sparse channels, by M. Salman Asif, William Mantzel and Justin Romberg. The abstract reads:
Multipath interference is a ubiquitous phenomenon in modern communication systems. The conventional way to compensate for this effect is to equalize the channel by estimating its impulse response by transmitting a set of training symbols. The primary drawback to this type of approach is that it can be unreliable if the channel is changing rapidly. In this paper, we show that randomly encoding the signal can protect it against channel uncertainty when the channel is sparse. Before transmission, the signal is mapped into a slightly longer codeword using a random matrix. From the received signal, we are able to simultaneously estimate the channel and recover the transmitted signal. We discuss two schemes for the recovery. Both of them exploit the sparsity of the underlying channel. We show that if the channel impulse response is sufficiently sparse, the transmitted signal can be recovered reliably.

and Sparse Sampling of Structured Information and its Application to Compression, by Pier Luigi Dragotti. The abstract reads:
It has been shown recently that it is possible to sample classes of non-bandlimited signals which we call signals with Finite Rate of Innovation (FRI). Perfect reconstruction is possible based on a set of suitable measurements and this provides a sharp result on the sampling and reconstruction of sparse continuous-time signals. In this paper, we first review the basic theory and results on sampling signals with finite rate of innovation. We then discuss variations of the above framework to handle noise and model mismatch. Finally, we present some results on compression of piecewise smooth signals based on the FRI framework.
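
At the heart of the FRI machinery is the annihilating filter (Prony) step. Below is a self-contained numpy sketch recovering K Diracs from 2K+1 noiseless Fourier coefficients; the indexing conventions are mine, and practical FRI pipelines add a denoising stage (e.g., Cadzow iterations) on top:

    import numpy as np

    def fri_diracs(Xm, K, tau):
        # Xm[m + K] = sum_k a_k exp(-2j*pi*m*t_k/tau) for m = -K..K (length 2K+1)
        # annihilation equations: sum_i h[i] * X[m - i] = 0 for m = 0..K
        T = np.array([[Xm[(r - i) + K] for i in range(K + 1)] for r in range(K + 1)])
        h = np.linalg.svd(T)[2][-1].conj()         # null vector = filter coefficients
        u = np.roots(h)                            # roots u_k = exp(-2j*pi*t_k/tau)
        t = np.mod(-np.angle(u) * tau / (2 * np.pi), tau)
        ms = np.arange(-K, K + 1)                  # amplitudes via Vandermonde lstsq
        V = np.exp(-2j * np.pi * np.outer(ms, t) / tau)
        a = np.linalg.lstsq(V, Xm, rcond=None)[0]
        order = np.argsort(t)
        return t[order], a[order].real

    tau, K = 1.0, 3
    t_true, a_true = np.array([0.12, 0.45, 0.83]), np.array([1.0, -2.0, 0.5])
    ms = np.arange(-K, K + 1)
    Xm = np.exp(-2j * np.pi * np.outer(ms, t_true) / tau) @ a_true
    print(fri_diracs(Xm, K, tau))   # recovers the locations and amplitudes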





Image Credit: NASA/JPL/Space Science Institute, W00060562.jpg was taken on October 19, 2009 and received on Earth October 20, 2009. The camera was pointing toward SATURN at approximately 2,174,289 kilometers away, and the image was taken using the CB2 and CL2 filters.

Wednesday, October 21, 2009

CS: Workshop on Sparsity and Modern Mathematical Methods for High Dimensional Data, Learning Deep Architectures for AI

One use of the focusing of light in turbid tissues could easily be in a system like this one. In a different area, I have always wondered why, when one delivers a certain radiation dose to a particular area of the body, additional sensors are not arranged around it to perform an imaging task at the same time. Food for thought. Anyway, there will be an Interdisciplinary Workshop on Sparsity and Modern Mathematical Methods for High Dimensional Data on April 6-10, 2010 in Brussels, Belgium.

While reading Learning Deep Architectures for AI by Yoshua Bengio. The abstract reads:

Theoretical results suggest that in order to learn the kind of complicated functions that can represent high level abstractions (e.g. in vision, language, and other AI-level tasks), one may need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers or in complicated propositional formulae re-using many sub-formulae. Searching the parameter space of deep architectures is a difficult task, but learning algorithms such as those for Deep Belief Networks have recently been proposed to tackle this problem with notable success, beating the state-of-the-art in certain areas. This paper discusses the motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer models such as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks.
I noted a nice discussion about compressive sensing in section 7.1, entitled "Sparse Representations in Auto-Encoders and RBMs".


Credit: NASA / JPL / SSI / colorization by Gordan Ugarkovic, Tethys and Titan, Cassini captured a grayscale animation of Tethys crossing in front of Titan on October 17, 2009. In this version, Gordan Ugarkovic has colored in Titan based on its color as seen in previous Cassini photos.

Tuesday, October 20, 2009

CS: Lower Bounds for Sparse Recovery, Super-resolution, Sparse Multipath Channels, RIP for Structurally-Subsampled Unitary Matrices

The photo on the side shows the resulting plume as seen from the LCROSS cameras.

In a different direction, here is a thing I did not know about SSDs: as opposed to hard disk drives, SSDs allow parallel access to different jobs running on the CPU. SSDs also allow faster access to the data in memory (that part I knew). I wonder how this could help some of the reconstruction processes in compressive sensing?

Jean-Luc Starck will give a talk at Saclay today; let us hope that the audience will get to hear whether the initial tests of the compressive sensing encoding of the PACS camera on Herschel are good.

Now let's go back to papers. I mentioned the abstract earlier, but the paper is now available: Lower Bounds for Sparse Recovery by Khanh Do Ba, Piotr Indyk, Eric Price and David Woodruff. The abstract reads:
We consider the following k-sparse recovery problem: design an m × n matrix A, such that for any signal x, given Ax we can efficiently recover x̂ satisfying ||x − x̂||_1 ≤ C min_{k-sparse x'} ||x − x'||_1. It is known that there exist matrices A with this property that have only O(k log(n/k)) rows. In this paper we show that this bound is tight. Our bound holds even for the more general randomized version of the problem, where A is a random variable, and the recovery algorithm is required to work for any fixed x with constant probability (over A).
I mentioned this approach before as it was featured in a video online; here is the associated paper: Super-Resolution with Sparse Mixing Estimators by Stephane Mallat and Guoshen Yu. The abstract reads:
We introduce a class of inverse problem estimators computed by mixing adaptively a family of linear estimators corresponding to different priors. Sparse mixing weights are calculated over blocks of coefficients in a frame providing a sparse signal representation. They minimize an l1 norm taking into account the signal regularity in each block. Adaptive directional image interpolations are computed over a wavelet frame with an O(N log N) algorithm.
According to the paper, the SME algorithm is at http://www.cmap.polytechnique.fr/~mallat/SME.html, but it is not there for the moment.

Also found on the interwebs:

In this paper, we revisit the problem of signal detection in multipath environments. Existing results implicitly assume a rich multipath environment. Our work is motivated by physical arguments and recent experimental results that suggest physical channels encountered in practice exhibit a sparse structure, especially at high signal space dimension (i.e., large time-bandwidth product). We first present a model for sparse channels that quantifies the independent degrees of freedom (DoF) in the channel as a function of the physical propagation environment and signal space dimension. The number of DoF represents the delay-Doppler diversity afforded by the channel and, thus, critically impacts detection performance. Our focus is on two types of non-coherent detectors: the energy detector (ED) and the optimal non-coherent detector (ONCD) that assumes full knowledge of channel statistics. Results show, for a uniform distribution of paths in delay and Doppler, the channel exhibits a rich structure at low signal space dimension and then gets progressively sparser as this dimension is increased. Consequently, the performance of the detectors is identical in the rich regime. As the signal space dimension is increased and the channel becomes sparser, the ED suffers significant degradation in performance relative to the ONCD. Finally, our results show the existence of an optimal signal space dimension - one that yields the best detection performance - as a function of the physical channel characteristics and the operating signal to noise ratio (SNR).
After the result on Toeplitz matrices, which eventually could be used for coded aperture, here is a new paper on measurement matrices having other properties: A Restricted Isometry Property for Structurally-Subsampled Unitary Matrices by Waheed Bajwa, Akbar Sayeed and Robert Nowak. The abstract reads:
Subsampled (or partial) Fourier matrices were originally introduced in the compressive sensing literature by Candes et al. Later, in papers by Candes and Tao and Rudelson and Vershynin, it was shown that (random) subsampling of the rows of many other classes of unitary matrices also yield effective sensing matrices. The key requirement is that the rows of U, the unitary matrix, must be highly incoherent with the basis in which the signal is sparse. In this paper, we consider acquisition systems that—despite sensing sparse signals in an incoherent domain—cannot randomly subsample rows from U. We consider a general class of systems in which the sensing matrix corresponds to subsampling of the rows of matrices of the form Φ = RU (instead of U), where R is typically a low-rank matrix whose structure reflects the physical/technological constraints of the acquisition system. We use the term “structurally-subsampled unitary matrices” to describe such sensing matrices. We investigate the restricted isometry property of a particular class of structurally-subsampled unitary matrices that arise naturally in application areas such as multiple-antenna channel estimation and sub-Nyquist sampling. In addition, we discuss an immediate application of this work in the area of wireless channel estimation, where the main results of this paper can be applied to the estimation of multiple-antenna orthogonal frequency division multiplexing channels that have sparse impulse responses.
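
To see the flavor of such matrices, here is a hedged numpy sketch with U the unitary DFT and R a random-sign block-aggregation reminiscent of the random demodulator; this specific R is an illustrative choice of mine, not the paper's construction:

    import numpy as np

    def structurally_subsampled_dft(n, block, seed=6):
        # A = R U: U is the n x n unitary DFT; each row of R sums one length-'block'
        # run of rows of U with random signs, modeling hardware that cannot
        # select individual rows of U on its own
        m = n // block
        U = np.fft.fft(np.eye(n), norm="ortho")
        signs = np.random.default_rng(seed).choice([-1.0, 1.0], size=n)
        R = np.zeros((m, n))
        for r in range(m):
            R[r, r * block:(r + 1) * block] = signs[r * block:(r + 1) * block]
        return (R @ U) / np.sqrt(block)            # rows come out orthonormal

    A = structurally_subsampled_dft(n=256, block=8)     # a 32 x 256 sensing matrix
    print(A.shape, np.linalg.norm(A @ A.conj().T - np.eye(32)))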

Finally, here is a recent presentation by Toh Kim Chuan on An accelerated proximal gradient method for nuclear norm minimization.

LCROSS Zoomed in image of the impact plume. The extent of the plume at 15 sec is approximately 6-8 km in diameter. Credit: NASA Click for full resolution.

Monday, October 19, 2009

CS: KSVDS-Box and OMPS-Box, Simultaneous Sparse Approximation, Inferring Ranking using Constrained Sensing, Justice Pursuit

Another blogger besides David Brady went to the OSA meeting and blogged about it and his encounter with compressive sensing and ghost imaging :-). On top of the previous answers, Angshul Majumdar seems to think that a "very neat answer to his question" is the Sparse Signal Restoration course by Ivan Selesnick at: http://cnx.org/content/m32168/latest/

Rebounding on Lianlin Li's blog entry (English translation here) of this week-end, which features the second reference of this entry, namely Phase control algorithms for focusing light through turbid media by Ivo Vellekoop and Allard Mosk, and discusses a sparse Bayesian algorithm, I also want to show two figures from that paper:

There is a discussion of three algorithms for focusing light through some random medium by using different sets of configurations for the Spatial Light Modulator (SLM). The first set is equivalent to the usual raster mode, while the third one has a random permutation component to it, which is not unlike what we have in compressive sensing. The convergence of these algorithms is shown in the following figures:

One final note on this work is that it is more or less equivalent to how one used to solve coded aperture problems, and this is why I think CS reconstruction algorithms might provide a faster way of focusing light.
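
For the curious, the stepwise idea takes a few lines to simulate: a random complex transmission vector stands in for the turbid medium, and each SLM segment's phase is set, here in randomly permuted order, to maximize the intensity at the focus. A toy sketch of my own, not the paper's code:

    import numpy as np

    rng = np.random.default_rng(9)
    n_seg = 256
    # random complex transmission coefficients of the scattering medium
    t = (rng.standard_normal(n_seg) + 1j * rng.standard_normal(n_seg)) / np.sqrt(2 * n_seg)

    def focus_intensity(phases):
        # field at the focus is the sum over segments of t_n * exp(i*phi_n)
        return abs(np.sum(t * np.exp(1j * phases))) ** 2

    phases = np.zeros(n_seg)
    trial = np.linspace(0, 2 * np.pi, 8, endpoint=False)   # coarse phase levels
    for n in rng.permutation(n_seg):                       # permuted visiting order
        candidates = [np.where(np.arange(n_seg) == n, p, phases) for p in trial]
        phases = max(candidates, key=focus_intensity)

    print("enhancement over flat phases:",
          focus_intensity(phases) / focus_intensity(np.zeros(n_seg)))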

Ron Rubinstein has released the KSVDS-Box and OMPS-Box packages: "These packages implement the OMP and K-SVD algorithms for sparse dictionaries, as introduced in their paper Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation (see below)." He also published the packages KSVD-Box v13 and OMP-Box v10: "These new versions reduce memory consumption, accelerate computation, and resolve a few minor bugs."

  • OMP-Box v10 Implementation of the Batch-OMP and OMP-Cholesky algorithms for quick sparse-coding of large sets of signals.
  • OMPS-Box v1 Implementation of the Batch-OMP and OMP-Cholesky algorithms for sparse dictionaries.
  • KSVD-Box v13 Implementation of the K-SVD and Approximate K-SVD dictionary training algorithms, and the K-SVD Denoising algorithm. Requires OMP-Box v10.
  • KSVDS-Box v11 Implementation of the sparse K-SVD dictionary training algorithm and the sparse K-SVD Denoising algorithm. Requires OMPS-Box v1. The package is also available without demo volumes (less recommended) at KSVDS-Box v11-min.
The paper is : Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation by Ron Rubinstein, Michael Zibulevsky and Michael Elad. The abstract reads:
An efficient and flexible dictionary structure is proposed for sparse and redundant signal representation. The proposed sparse dictionary is based on a sparsity model of the dictionary atoms over a base dictionary, and takes the form D = \Phi A where \Phi is a fixed base dictionary and A is sparse. The sparse dictionary provides efficient forward and adjoint operators, has a compact representation, and can be effectively trained from given example data. In this, the sparse structure bridges the gap between implicit dictionaries, which have efficient implementations yet lack adaptability, and explicit dictionaries, which are fully adaptable but non-efficient and costly to deploy. In this paper we discuss the advantages of sparse dictionaries, and present an efficient algorithm for training them. We demonstrate the advantages of the proposed structure for 3-D image denoising.
Here is a set of papers I found on the interwebs: Simultaneous Sparse Approximation: insights and algorithms by Alain Rakotomamonjy. The abstract reads:
This paper addresses the problem of simultaneous sparse approximation of signals, given an overcomplete dictionary of elementary functions, with a joint sparsity profile induced by an ℓp − ℓq mixed norm. Our contributions are essentially two-fold: i) making connections between such an approach and other methods available in the literature and ii) providing algorithms for solving the problem with different values of p and q. At first, we introduce a simple algorithm for solving the multiple signals extension of the Basis Pursuit Denoising problem (p = 1 and q = 2). Then, we show that for a general sparsity-inducing ℓp − ℓq mixed-norm penalty, this optimization problem is actually equivalent to an automatic relevance determination problem. From this insight, we derive a simple EM-like algorithm for problems with an ℓ1 − ℓq penalty. For addressing approximation problems with a non-convex penalty (p < 1), we propose an iterative reweighted Multiple-Basis Pursuit; we trade the non-convexity of the problem against several resolutions of the convex multiple-basis pursuit problem. Relations between such a reweighted algorithm and Multiple-Sparse Bayesian Learning are also highlighted. Experimental results show how our algorithms behave and how they compare to related approaches (such as CoSaMP) for solving the simultaneous sparse approximation problem.
and Inferring Ranking using Constrained Sensing by Srikanth Jagabathula and Devavrat Shah. The abstract reads:
We consider the problem of recovering a function over the space of permutations (or, the symmetric group) over n elements from a given partial set of its Fourier series coefficients. This problem naturally arises in several applications such as ranked elections, multi-object tracking, ranking systems and recommendation systems. This problem has been widely studied in the context of discrete-time functions in the recently popular compressive sensing literature; however, the existing approaches don’t naturally extend to our problem setup. Inspired by the work of Donoho and Stark [?] in the context of discrete-time functions, we focus on non-negative functions with a sparse support (support size << domain size). Our recovery method is based on finding the sparsest solution (through ℓ0 optimization) that is consistent with the available information. As the main result, we derive sufficient conditions for the functions that can be recovered exactly from a partial set of Fourier coefficients through ℓ0 optimization. Under a natural random model for the generation of functions, we quantify the recoverability conditions by deriving bounds on the sparsity (support size) for which the function satisfies the sufficient conditions with a high probability as n → ∞. Since ℓ0 optimization is computationally hard, its convex relaxation, ℓ1 optimization, is typically considered; however, we show that ℓ1 optimization fails to recover a function generated using the random model with a high probability as n → ∞. In order to overcome this problem, we propose a novel iterative algorithm for the recovery of functions that satisfy the sufficient conditions. Finally, using an Information Theoretic framework, we study the necessity conditions for exact recovery to be possible.
The related shorter NIPS08 paper is here. From the paper:
...Assuming that the function is sparse, our approach to performing exact recovery is to find the function with the sparsest support that is consistent with the given partial information, henceforth referred to as ℓ0 optimization. This approach is often justified by the philosophy of Occam’s razor. However, as illustrated in 2.3.1, the sparsest solution is not always the correct solution. Thus, we derive sufficient conditions in terms of sparsity (support size) for functions that can be recovered through ℓ0 optimization. Furthermore, finding a function with the sparsest support through ℓ0 minimization is in general computationally hard. This problem is typically overcome by considering the convex relaxation of the ℓ0 optimization problem. However, as we show in Theorem 3.1, such a convex relaxation does not yield exact recovery in our case. Thus, we propose a simple iterative algorithm and prove that the algorithm performs exact recovery of functions that satisfy the sufficient conditions.
Our work can be considered as an extension of the compressive sensing literature to non-commutative groups. To the best of our knowledge, our work is the first to consider exact recovery of functions over non-commutative groups from a partial set of Fourier coefficients.

There are lots of interesting things; however, further down, one can read the following:
The sparsest recovery approach of this paper is similar (in flavor) to the above stated work. However, the methods or approaches of the prior work do not apply. In a nutshell, in our setup too, we are interested in recovering a certain sparse vector x from data y = Ax. However, the corresponding matrix A is given rather than a design choice. Moreover, the matrix A is dependent on the structure of the space of permutations. An important development of the above stated work is the characterization of a class of sufficient conditions (on the structure of A) for recovery, collectively known as the Restricted Isometry Property (RIP) (see [CRT06b, BGI+08]) of matrix A. However, such sufficient conditions trivially fail in our setup (see [JS08]). Therefore, a new approach is required.

Does the null space property also fail in their set-up? I guess that takes us back to the discussion we had last week.

And finally from the Rice Compressive Sensing repository, we have:

Exact signal recovery from sparsely corrupted measurements through the pursuit of justice by Jason Laska, Mark Davenport, and Richard Baraniuk. The abstract reads:
Compressive sensing provides a framework for recovering sparse signals of length N from M << N measurements. If the measurements contain noise bounded by ε, then standard algorithms recover sparse signals with error at most Cε. However, these algorithms perform suboptimally when the measurement noise is also sparse. This can occur in practice due to shot noise, malfunctioning hardware, transmission errors, or narrowband interference. We demonstrate that a simple algorithm, which we dub Justice Pursuit (JP), can achieve exact recovery from measurements corrupted with sparse noise. The algorithm handles unbounded errors, has no input parameters, and is easily implemented via standard recovery techniques.
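
Since JP is simply basis pursuit on the augmented matrix [A I], it is easy to sketch with a generic LP solver (the toy parameters are my own choosing):

    import numpy as np
    from scipy.optimize import linprog

    def justice_pursuit(A, y):
        # min ||[x; e]||_1  s.t.  A x + e = y, i.e. basis pursuit on B = [A, I],
        # using the standard u - v splitting to obtain an LP
        m, n = A.shape
        B = np.hstack([A, np.eye(m)])
        res = linprog(np.ones(2 * (n + m)), A_eq=np.hstack([B, -B]), b_eq=y,
                      bounds=[(0, None)] * (2 * (n + m)))
        z = res.x[:n + m] - res.x[n + m:]
        return z[:n], z[n:]              # signal estimate, corruption estimate

    rng = np.random.default_rng(7)
    n, m, k, kc = 100, 60, 5, 3
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    e = np.zeros(m); e[rng.choice(m, kc, replace=False)] = 10 * rng.standard_normal(kc)
    x_hat, e_hat = justice_pursuit(A, A @ x + e)
    print("signal error:", np.linalg.norm(x_hat - x))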

Friday, October 16, 2009

CS: Food for thought for the week-end.

Before all of you go home for the week-end, let me add one more thing to the entry of yesterday and then some. Even though I suck, I don't think I am the only one who is going to save that entry, print it and parse it over the week-end. Looks like Giuseppe Paleologo is going to do the same but I am sure we won't be the only ones. Giuseppe added the following in the comment section of yesterday's entry:

I have to say that it's quite amazing that a tweet I shot to Igor cascaded in two detailed replies from none other than Jared Tanner and Remi Gribonval. Thanks to Jared for the nice geometric intuition provided for the NSP. I still have to digest his comment, and his recent paper with Blanchard and Cartis.

Briefly, I should mention that the question of RIP vs NSP is interesting to me because in statistical applications where A is observational, RIP is not obviously satisfied. However, even when it is not satisfied, l1 minimization or lasso perform well, so possibly there is a different explanation for their success. In this respect, I am especially interested in stable recovery under the two assumptions.

I have been reading some papers and presentations by Remi, and have been thinking about his statement: "however the RIP seems necessary to guarantee robustness to noise" (which appears in the recent IEEE paper). In most papers this seems to be the case, because stability depends obviously on the metric properties of the encoder, which RIP captures. The (lone?) exception is a recent paper of Yin Zhang (http://www.caam.rice.edu/~zhang/reports/tr0811_revised.pdf), which characterizes stability not using RIP or the canonical NSP, but still in terms of projections on the null space. The quantity of interest in that analysis is the value of (||x||_1/||x||_2) for x ∈ N(A), which is different from the NSP...
thanks Giuseppe !
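
Yin Zhang's quantity can at least be probed numerically. Here is a crude Monte Carlo sketch of my own; random directions rarely hit the worst case, so this only gives an optimistic (upper) estimate of the true minimum of ||x||_1/||x||_2 over the null space:

    import numpy as np
    from scipy.linalg import null_space

    def min_l1_over_l2(A, trials=20000, seed=8):
        # sample random directions in N(A) and record the smallest l1/l2 ratio seen
        rng = np.random.default_rng(seed)
        N = null_space(A)                       # orthonormal basis of the null space
        Z = N @ rng.standard_normal((N.shape[1], trials))
        return (np.abs(Z).sum(axis=0) / np.linalg.norm(Z, axis=0)).min()

    A = np.random.default_rng(8).standard_normal((50, 100))
    print(min_l1_over_l2(A))   # compare with sqrt(100) = 10, the largest possible ratio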

By the way, don't miss Giuseppe's very interesting entry (and the comments) on being an intern at Enron. As a person who was near Houston at that time, it really looked like some of the people who worked there either did not have a clue about the real business of Enron or were fooling themselves into believing that some part of their business was actually making money. I do not know if I was the only one, but I clearly remember something was terribly amiss when Jeffrey Skilling gave an interview in the Houston Chronicle saying he wanted to retire to spend more time with his family; that was August 15th, 2001.

Returning to RIP vs NSP, I'll add on top of Giuseppe's comment that one of the most important steps for engineers to convince themselves that compressive sensing might be a good thing for them revolves around trying to fit their known "measurement" device into the CS setting. An example of that was recently featured in TomoPIV meets Compressed Sensing by Stefania Petra and Christoph Schnörr, or in the approach I am suggesting when we 'only' have integro-differential equations for which we have some knowledge about computing eigenfunctions of said operators. Let us also note that the NSP can be specialized to a more specific class of signals: positive signals (see the sketch below).
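On the positive-signals point, here is a minimal sketch of why positivity is such a useful restriction in practice: for x >= 0 we have ||x||_1 = sum(x), so basis pursuit collapses to a plain linear program with no variable splitting (my own toy example, hypothetical sizes):

# Nonnegative sparse recovery: min sum(x) subject to A x = y, x >= 0.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
M, N, k = 30, 80, 5
A = rng.standard_normal((M, N))
x = np.zeros(N); x[rng.choice(N, k, replace=False)] = rng.random(k) + 0.5
y = A @ x

res = linprog(np.ones(N), A_eq=A, b_eq=y, bounds=(0, None))
print("recovery error:", np.linalg.norm(res.x - x))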

I think, at some point, I am also going to have to write down what some of you have been telling me about the spectral gap of sparse measurement matrices in the RIP-1 setting and its potential relationship to the eigenfunctions of that operator. In particular, my interest is in finding out whether there is a relationship between the non-normality of a measurement operator and its potential to be a good/bad expander, and how this influences its goodness or badness for doing CS. All this is highly speculative.

Unrelated to this, two entries by David Brady got my attention recently:
Finally, let us recall that focusing through random materials using different combinations of an SLM takes about 30 minutes to perform, according to Ivo Vellekoop's PhD thesis. Could any of the solvers used in CS reconstruction enable faster focusing? I bet they would.



While we are on the subject of random lens cameras, I wanted to talk about this a while back but did not have the time, so I am leaving it as food for thought for the week-end. The good folks at MIT (the Photonics Bandgap Fibers and Devices Group under the leadership of Yoel Fink) have devised a camera based on optical fibers. The technology places pieces of metal inside the fiber that let light from outside objects couple directly into the fiber's light path, so that it is eventually channeled to some common point down the fiber. In effect, this is a multiplexing operation, and one could have these fiber-optic pipes woven into apparel to provide some "random" measurement of the surroundings. Since the cloth follows the movement of the body, one can only imagine how the work on signal manifolds would fit in. Some more information on the MIT technology can be found here and in the reference below.
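To make the multiplexing remark concrete, here is a toy version of the measurement model (entirely hypothetical numbers on my part): each fiber reading is a random binary combination of scene pixels, and since light intensities are nonnegative, the same nonnegative LP as above applies. Whether this particular 0/1 ensemble has good RIP/NSP properties is exactly the kind of question discussed elsewhere in this post.

# Toy model of the fiber weave as a multiplexer: each fiber output sums a
# random subset of scene pixels, i.e. y = Phi x with a 0/1 matrix Phi.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
N, M, k = 256, 96, 6                      # scene pixels, fibers, scene sparsity
Phi = (rng.random((M, N)) < 0.5).astype(float)

x = np.zeros(N)                           # a sparse, nonnegative scene
x[rng.choice(N, k, replace=False)] = rng.random(k) + 0.5
y = Phi @ x                               # multiplexed fiber readings

# Intensities are nonnegative, so ||x||_1 = sum(x) and a plain LP suffices.
res = linprog(np.ones(N), A_eq=Phi, b_eq=y, bounds=(0, None))
print("recovery error:", np.linalg.norm(res.x - x))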

As you all know, I am very much interested in developing a dirt-cheap compressive sensing application for the purpose of making CS more "obvious" to the average tinkerer/maker. Hence, I wonder how expensive or inexpensive it would be to fabricate something equivalent to these fibers. Anybody know? Maybe randomly scratching normal fibers to let light come in; that doesn't sound like too bad an idea. The problem is that one wants to avoid breaking the fibers.

Finally, the stunningly detailed first photo of this entry was taken from the International Space Station by one of the astronauts. Could we fit a different lens mount and perform the equivalent of a random lens imaging experiment with it? If you are interested, we need to talk about an opportunity like this with Nanoracks (you won't see a mention of it on the web yet).


Reference: Abouraddy, A.F., Shapira, O., Bayindir, M., Arnold, J., Sorin, F., Saygin-Hinczewski, D., Joannopoulos, J.D., Fink, Y., "Large-scale optical-field measurements with geometric fibre constructs", Nature Materials 5, 532-536 (2006). [Full Text (pdf)]

Credit:
1st/2nd Photo: NASA/Bill Ingalls, photo taken from the International Space Station, 300 km above the scene of interest (the landing of the Soyuz spacecraft in Kazakhstan). Photo taken with one of these instruments on board.
3rd Photo: NASA/JPL/University of Arizona, USGS Dune Database Entry (ESP_014426_2070), photo taken by the HiRISE camera of some of Mars' dunes (via the Bad Astronomy blog).

Thursday, October 15, 2009

CS: RIP vs NSP, Clarifications by Jared Tanner and Remi Gribonval

Yesterday's post got some nice feedback. Two things:
  • Thank you to Jared Tanner and Remi Gribonval for being kind enough to take some time off to answer yesterday's question.
  • I suck, as I have covered some of these issues/papers before but still cannot seem to get a good grasp on them.
Here are these answers:

Jared Tanner was the first one to respond to yesterday's post:

Dear Igor,

I just read your Oct. 14th posting on Nuit Blanche where the following question was raised by Giuseppe Paleologo:

"do you know how much weaker is the nullspace property vs. restricted isometry? Are there studies on this?"

and your comment that "I am sure a specialist could clear that up for all of us by sending me a short E-mail." Here are a few take-away "bullets", followed by a longer discussion.

a) The RIP is a generic tool that can be used for many algorithms, whenever sparsity is the simplicity being exploited.

b) The NSP is only applicable to sparse approximation questions where there is an "objective", such as l1 minimization.

c) For l1 minimization, the NSP is a weaker assumption than the RIP, see "Compressed Sensing: How sharp is the RIP?" by Blanchard, Cartis, and Tanner.

d) l1 minimization is the only setting where NSP and the RIP have been used to try to answer the same question, and is the only setting where it is fair to compare them. NSP is equivalent to l1 minimization recovery, and for this reason RIP implies the NSP, but not the other way around.

e) Both methods can be used to imply stability.

f) Many matrix ensembles are proven to have bounded RIP (Gaussian, Fourier, etc...). Many matrix ensembles are proven to have the NSP, see " Counting the faces of randomly-projected hypercubes and orthants, with applications" by Donoho and Tanner, to appear in Discrete and Computational Geometry.

Now for the longer discussion:

First of all, what is the NSP? Let's focus on the case of \min \|x\|_1 subject to Ax=b, where there is an x_0 that is k-sparse with Ax_0=b and A is of size n by N. (Note the ordering k < n < N.) min l1 recovers x_0 if and only if there is no vector \nu in the null space of A such that x_0+\nu is interior to (or intersects the boundary of) the l1 ball defined by \|x_0\|_1. (See image above depicting this, with the blue object being the l1 ball, the yellow circle being x_0 and the red line being the null space of A.) The NSP asks if this occurs for any x_0 that is k-sparse, effectively moving x_0 to each of the 2^k (N \choose k) k-faces of the l1 ball. That the null space of A shifted to the k-faces of the l1 ball never goes interior probably seems a very strict requirement, but it often holds. In fact, this notion isn't new; it is referred to as "neighborliness" in the convex polytope community. David L. Donoho analyzed this question in detail in 2005, precisely characterizing the values of (k,n,N) for which this does and does not hold, see "Neighborly Polytopes and Sparse Solutions of Underdetermined Linear Equations."
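To make the geometry concrete (this aside is mine, not Jared's): an equivalent statement of the NSP of order k is that every nonzero \nu in N(A) satisfies ||\nu_S||_1 < ||\nu_{S^c}||_1 for every support S of size k. One can probe it numerically, keeping in mind that sampling can expose violations but can never certify the property:

# Monte Carlo search for NSP violations: the NSP of order k requires
# ||v_S||_1 < ||v_{S^c}||_1 for every v in N(A) and every |S| = k.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
M, N, k = 32, 64, 4
A = rng.standard_normal((M, N))
Z = null_space(A)

violations = 0
for _ in range(10000):
    v = Z @ rng.standard_normal(Z.shape[1])
    S = np.argsort(-np.abs(v))[:k]        # worst support for this v: its k largest entries
    if np.abs(v[S]).sum() >= np.abs(np.delete(v, S)).sum():
        violations += 1
print("violations found:", violations)    # zero violations is evidence, not proof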

How does this compare with RIP? First of all, RIP has nothing to do with l1 minimization, but is clearly a natural property to desire in sparse approximation. This is both the strength and the weakness of the RIP. It has been used to prove convergence results for many algorithms, including many for which the null space property would give no information. However, because it is a general tool, the RIP based conditions are generally very restrictive. There is only one venue where it is appropriate to directly compare the null space property and the RIP: the recovery results for l1 minimization, where both can be used. Not surprisingly, the null space results derived by Donoho are much weaker than the associated RIP based results.

This raises a very related point: how does one read RIP based results? When is it true that RIP constants of order 2k are less than 0.45? (See Foucart and Lai, ACHA 2009, pages 395-407 for this l1 recovery condition.) How much more restrictive is it to require RIP constants of order 3k to be less than, say, 0.2? (See Subspace Pursuit by Dai and Milenkovic.) Some answer this question by saying that matrix class zzz has bounded RIP constants, so this is satisfied when n > O(k log(N/k)). Unfortunately, this type of answer does not allow one to distinguish between the above two conditions, and essentially says that all state-of-the-art CS algorithms behave at the optimal order. Although true, this might not be helpful to a practitioner who wants to know what is the best algorithm for them, and how large n must be. To shed some light on this, Jeffrey B. Blanchard, Coralia Cartis, and myself derived improved bounds on the RIP constants for the Gaussian ensemble, and showed how these bounds can be used to make more transparent when the above-mentioned conditions are satisfied, allowing one to take the values k,n,N and easily verify whether the bound would be satisfied. (Note these bounds require large problem sizes to be effective.) Andrew Thompson joined us to repeat this process for many of the greedy algorithms, see "Phase transitions for greedy sparse approximation algorithms" at: http://www.maths.ed.ac.uk/~tanner/publist.html
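As an aside of mine (not Jared's): part of the difficulty in reading RIP conditions is that delta_k is an extremum over all (N choose k) column submatrices, so it cannot be computed directly for any sizable matrix. Even a sampled estimate, which only lower-bounds delta_k, already hints at how restrictive conditions like delta_2k < 0.45 are at moderate sizes:

# Sampled lower bound on the RIP constant delta_k: over column subsets S,
# delta_k = max |sigma^2 - 1| for singular values sigma of the submatrix A_S.
# Exhausting all (N choose k) subsets is intractable, hence sampling.
import numpy as np

rng = np.random.default_rng(5)
M, N, k = 128, 256, 8
A = rng.standard_normal((M, N)) / np.sqrt(M)   # roughly unit-norm columns

delta = 0.0
for _ in range(2000):
    S = rng.choice(N, k, replace=False)
    s = np.linalg.svd(A[:, S], compute_uv=False)
    delta = max(delta, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
print("sampled lower bound on delta_k:", delta)   # often already above 0.45 here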

Lastly, the NSP and RIP prove a "strong equivalence": that the matrix can be used for compressed sensing of any k-sparse vector. There is a less restrictive version of the NSP, which proves what Donoho termed "weak equivalence". Weak equivalence ensures that the matrix can be used for compressed sensing of all but a vanishingly small fraction of k-sparse vectors. Note, this is what one observes in practice when selecting a k-sparse vector at random and testing an algorithm's performance. (For a discussion of weak equivalence see "Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing" by Donoho and Tanner.) Efforts are underway to derive weak versions of the RIP, see "Construction of a Large Class of Deterministic Sensing Matrices".

Hope that answers a few questions....

Then, Remi Gribonval sent the following:

Dear Igor,

Regarding the RIP vs NSP discussion, I think our recent IEEE Trans. Inf. Theory paper with Mike Davies pretty much shows how much sharper the NSP is with respect to the RIP. The paper can be found here, and our preprint was discussed in a previous post on Nuit Blanche. We also have a conference paper at SAMPTA09 which considers stable versions of the NSP, seemingly identical to the "truncated NSP" of Wotao Yin and Yilun Wang mentioned in a recent post. This paper is available here.

One way to summarize the results is the following:
  • Any matrix satisfying the RIP with delta_2m < 0.414 (slightly better constants are known) must satisfy the NSP of order m, and L1 minimization therefore recovers all m-sparse vectors.
  • There exist matrices with RIP constant delta_2m arbitrarily close to 1/sqrt(2) which fail to satisfy the NSP of order m, and for which L1 will therefore fail to recover some m-sparse vectors.
  • Yet, there are matrices with RIP constant arbitrarily close to 1 which satisfy the NSP of order m, and for which L1 is therefore successful.
In this respect,
  • The RIP is a sufficient condition to guarantee that the NSP (or, even better, its stable version) is satisfied. The RIP (implicitly, with a small RIP constant) is not necessary to guarantee the recovery of sparse vectors (and the stable approximate recovery of compressible vectors).
  • However, the RIP seems necessary to guarantee robustness to noise.
I guess that similar results would hold for the RIP-1 property and other sufficient conditions which differ from the NSP, which is a necessary and sufficient condition. Another line of thought is that of Jared Tanner and David Donoho, where they consider "weak recovery thresholds" for specific families of random matrices; that is to say, they allow an exponentially small fraction of m-sparse vectors that may not be recovered. Then the NSP no longer characterizes the threshold, but the RIP is even less accurate.

I hope that this contributes to clarifying these questions....

@article{Davies:2008ab,
Author = {Davies, M. E. and Gribonval, R{\'e}mi},
Journal = {IEEE Trans. Inform. Theory},
Month = may,
Number = {5},
Pages = {2203--2214},
Title = {Restricted Isometry Constants where $\ell^p$ sparse recovery can fail for $0 < p \leq 1$},
Volume = {55},
Year = {2009}}

@inproceedings{Davies:2009aa,
Address = {Marseille, France},
Author = {Davies, M. E. and Gribonval, R{\'e}mi},
Booktitle = {Proc. SAMPTA'09 (Sampling Theory and Applications)},
Month = {may},
Title = {On Lp minimisation, instance optimality, and restricted isometry constants for sparse approximation},
Year = {2009}}

Thank you Jared and Remi !