In the paper, the authors mention as potential uses the Netflix competition, triangulation from incomplete data, and the filling in of a covariance matrix known to be of "low rank because the variables only depend upon a comparably smaller number of factors" (i.e., it should apply to many instances of machine learning). This use of the nuclear norm is not new (see CS is not just Compressed Sampling nor Compressed Sensing), but it is different. How different?

We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys
m >= C n^1.2 r log n
for some positive numerical constant C, then with very high probability, most n x n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
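To make the convex program in the abstract concrete, here is a minimal sketch of nuclear norm minimization for matrix completion. It assumes the cvxpy library and uses illustrative sizes; it is not the code used in the paper (which relies on solvers such as SDPT3), just a small reproduction of the idea: minimize ||X||_* subject to agreement with the sampled entries.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 30, 2
# Rank-r target matrix M (illustrative; any low-rank matrix would do).
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# Observe m entries chosen uniformly at random, encoded by a 0/1 mask.
m = 600
mask = np.zeros((n, n))
mask.flat[rng.choice(n * n, size=m, replace=False)] = 1.0

# Convex program: minimum nuclear norm matrix that fits the observed entries.
X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == mask * M])
problem.solve()

print("relative recovery error:",
      np.linalg.norm(X.value - M) / np.linalg.norm(M))
```

With these sizes (n = 30, rank 2, 600 of 900 entries observed) the relative error should be tiny, illustrating the "perfect recovery from an incomplete set of entries" claim.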

Building on the concept of restricted isometry introduced in [12] in the context of sparse signal recovery, [27] establishes the first sufficient conditions for which the nuclear norm heuristic returns the minimum rank element in the constraint set. They prove that the heuristic succeeds with large probability whenever the number m of available measurements is greater than a constant times 2nr log n for n x n matrices. Although this is an interesting result, a serious impediment to this approach is that one needs to essentially measure random projections of the unknown data matrix, a situation which unfortunately does not commonly arise in practice. Further, the measurements in (1.15) give some information about all the entries of M whereas in our problem, information about most of the entries is simply not available. In particular, the results and techniques introduced in [27] do not begin to address the matrix completion problem of interest to us in this paper. As a consequence, our methods are completely different; for example, they do not rely on any notions of restricted isometry. Instead, as we discuss below, we prove the existence of a Lagrange multiplier for the optimization (1.5) that certifies the unique optimal solution is precisely the matrix that we wish to recover.
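The distinction between the two measurement models is easy to see in a few lines. The sketch below is illustrative (names and sizes are mine, not from either paper): random-projection measurements touch every entry of M, whereas in matrix completion each measurement is a single entry and the rest are simply never seen.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 20, 2, 150
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# (a) Random-projection measurements, as in the RIP-based setting of [27]:
# each y_i = <A_i, M> depends on all n^2 entries of M.
A = rng.standard_normal((m, n * n))
y_projections = A @ M.ravel()

# (b) Matrix completion measurements: each y_i is one entry M[i, j];
# the unobserved entries carry no information at all.
observed = rng.choice(n * n, size=m, replace=False)
y_entries = M.ravel()[observed]
```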

Offline, Emmanuel tells me he has not tried to reconstruct the actual matrix of the Netflix challenge, so if you read this, you may have a shot at becoming a millionaire over the weekend. Good luck.

References:

[1] K. C. Toh, R. H. Tütüncü, and M. J. Todd, SDPT3: a MATLAB software package for semidefinite-quadratic-linear programming.

[2] Benjamin Recht, Maryam Fazel, and Pablo Parrilo, Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization.

Credit: NASA / JPL / University of Arizona. Could it be ice just underneath the Phoenix Mars lander?

Regarding the estimation of the restricted isometry constant, I do not understand how using sparse PCA can give a useful bound on the RIP constant. Sparse PCA, if we could solve it (which we can't, because it is probably NP-hard), would evaluate a tight bound on the largest eigenvalue of submatrices of a matrix. Call this bound s, say.

We then have the bound (for x sparse)

(1-d)||x||_2^2 <= ||Px||_2^2 <= s||x||_2^2 <= (1+d)||x||_2^2.

Note that because the estimate of s is tight, i.e. by definition there exists a sparse x such that ||Px||_2^2 = s||x||_2^2, the definition of RIP only guarantees that 1+d >= s, but this inequality does not have to be tight. (In the definition of RIP, either the lower or the upper bound is tight, but not necessarily both!) Therefore, s does not seem to give any useful bound on d.
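A small brute-force check of the quantities in this argument (sizes and names are illustrative, not from the comment): s is the largest eigenvalue of any k-column submatrix P_S^T P_S, which is what sparse PCA would bound tightly, while the RIP constant d also accounts for the lower bound, so s <= 1+d can hold strictly.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
m, n, k = 15, 30, 3
P = rng.standard_normal((m, n)) / np.sqrt(m)   # columns roughly unit-norm

s = 0.0   # max eigenvalue of P_S^T P_S over k-column supports S
d = 0.0   # restricted isometry constant over the same supports
for S in combinations(range(n), k):
    eig = np.linalg.eigvalsh(P[:, list(S)].T @ P[:, list(S)])  # ascending
    s = max(s, eig[-1])
    d = max(d, eig[-1] - 1.0, 1.0 - eig[0])

# s <= 1 + d always, but when the lower eigenvalues are the binding ones,
# 1 + d is strictly larger than s, so s alone does not determine d.
print(f"s = {s:.3f},  1 + d = {1 + d:.3f}")
```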

Let me give a counterexample. In the discussion, we have not placed any restrictions on the matrix P. In fact, it is easy to generate a matrix (call it D) for which s <= 1. This clearly gives a useless bound s, which implies only that d > 0. To generate D, take any matrix P and set D = P/||P||_2. With this normalisation, we have a matrix that satisfies

||Dx||_2^2 <= ||x||_2^2 for all x, and therefore also for all sparse x. Furthermore, assume that P is such that it has RIP constant d for all sparse x; then for all sparse x,

((1-d)/||P||_2^2) ||x||_2^2 <= ||Dx||_2^2 <= ||x||_2^2.

So D has s = 1, but RIP constant 1 - (1-d)/||P||_2^2.
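Here is a numerical sketch of this counterexample (illustrative sizes, my own names): rescaling D = P/||P||_2 forces s <= 1 by construction, while the lower restricted-isometry bound for D, and hence its RIP constant, is governed by (1-d)/||P||_2^2 and can be far from 0.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
m, n, k = 15, 30, 3
P = rng.standard_normal((m, n)) / np.sqrt(m)
D = P / np.linalg.norm(P, 2)        # divide by the spectral norm of P

s_D = 0.0          # max eigenvalue of D_S^T D_S over k-column supports
lower_D = np.inf   # min eigenvalue over the same supports
for S in combinations(range(n), k):
    eig = np.linalg.eigvalsh(D[:, list(S)].T @ D[:, list(S)])
    s_D = max(s_D, eig[-1])
    lower_D = min(lower_D, eig[0])

# s_D <= 1 by construction, yet the RIP constant 1 - lower_D stays large,
# so knowing s says essentially nothing about d.
print(f"s_D = {s_D:.3f},  RIP constant of D = {1 - lower_D:.3f}")
```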

In addition to my previous post: looking through Alexandre d'Aspremont's paper "Optimal Solutions for Sparse Principal Component Analysis", the problem in the previous post seems to stem from a possible error on page 17 of arXiv:0707.0705v4, where the authors state, in the above notation, that (1+d) <= s instead of (1+d) >= s, as I think it should be. I could be wrong; however, I would need a strong argument to convince me that sparse PCA gives an upper and not a lower bound on (1+d).

I agree that sparse factors produce lower bounds. However, the dual of the sparse PCA relaxation produces upper bounds. The strong argument you are looking for here is duality.

Alex

Anonymous,

Alex tells me offline that we will see more on the subject by the end of June.

Igor.