Nuit Blanche: CS: Feedback from Last week's postings: RIP Constant, Geometric vs Combinatorial App, Structurally Random Matrices, Large Scale Open Quantum Systems.

Monday, May 26, 2008

CS: Feedback from Last week's postings: RIP Constant, Geometric vs Combinatorial App, Structurally Random Matrices, Large Scale Open Quantum Systems.

Woohoo. First of all, Congratulations to the Phoenix team ( and to Mark in particular) for landing a robotic mission on Mars yesterday. At first sight, the landscape is eerily similar to that on Earth at the same latitude. I am sure we'll have many more surprises....

Usually I don't do this but since the blogger system is awkward with regards to posting comments and since I feel these comments bring additional understanding on the subject at hand (against me sometimes), I would rather that they be more prominently displayed in one entry. So here we go.

After reading this entry CS: New CS Measurement Matrices, Some blog entries, Failing Einstein and Coded Apertures, Thong Do, the author mentioned here noted the following in the comment section:

It's very interesting that you mention 2 new CS measurement matrices. Actually, these two matrices, Random Convolution and Random Kronecker Products, are very similar to our recent work of Structurally Random Matrices(SRM). As far as I know, SRM is a conceptualization of both Scrambled FFT and Random Convolution. In particular, Scambled FFT is a special case of SRM with global randomizer and Random Convolution is a special case of SRM with local randomizer. SRM implies that we can change FFT by any fast (dense) transform without negatively influencing the quality of reconstruction. We pay some theoretical price if using a sparse transform such as a block diagonal transform although the degradation is very very small in simulation when global randomizer is used. In addition, when compressively sampling smooth signals like images, we can take advantage of the existence of a sparsifying matrix (even without knowing it at the encoder side) to eliminate the F operator (FFT) of Random Convolution. It means that we only need perform the operator F*D to the images and then randomly downsample the coefficients. Algorithmically, it is equivalent to do a random sign reversal of image pixels, applying some fast transform (e.g. FFT) and then random downsample the coefficients. If you look into the code of SRM (available at http://thanglong.ece.jhu.edu/CS/, you can see that we also used an approach of Random Kronecker product to speed up the software. It is also possible to show in theory that using that trick generates some degradation of performance, i.e. requiring a larger number (log(n)) of measurements than a completely random (full) matrix(e.g. full Gaussian) does.

I'm really excited to see these techniques also used at some other groups....

To what I responded:

Thong, you may have noticed I put your construction at the top of the measurement section of the big picture (http://igorcarron.googlepages.com/cs#measurement).

The entry CS: Restricted Isometry Property, What Is It Good For ? brought forth two different comments.

Initially one point was brought up by Anonymous about the statement of the RIP constant that could be found with the Sparse PCA technique. Here it is:

Anonymous said...: Regarding the estimation of the restricted isometry constant, I do not understand how using sparse PCA can find a useful bound on RIP. Sparse PCA, if we could solve it, which we can't because it is probably NP hard, would evaluate a tight bound on the largest eigenvalue of submatrixes of a matrix. Call this bound s say.

We then have the bound (for x sparse)

(1-d)||x||_2^2
<=||Px||_2^2 <=s||x||_2^2 <=(1+d)||x||_2^2 Note that because the estimate of s is tight, i.e. by definition, there exist a sparse x, such that ||Px||_2^2=s||x||_2^2, the definition of RIP only guarantees that 1+d>=s, but this inequality does not have to be tight. (In the definition of RIP, either the lower or the upper bound is tight, but not necessarily both!) Therefore, s does not seem to give any useful bound on d.

Let me give a counter example, in the discussion, we have not placed any restrictions on the matrix P. In fact, it is easy to generate a matrix (call it D) for which s<=1. This clearly gives a useless bound s, which implies only that d>0. To generate D, take any matrix P and set D=P/||P||_2. With this normalisation, we have a matrix that satisfies

||Dx||_2^2<=1, for all x and therefore, also for all sparse x. Furthermore, assume that P is such that for all sparse x, P has RIP constatn d, then for all sparse x, (1-d)/||x||_2^2 ||x||_2^2 <= ||Dx||_2^2<=1 D has s=1, but RIP constant 1-((1-d)/||x||_2^2).
Anonymous went on further by saying:: In addition to my previous post, looking through Alexandre d'Aspremont's paper "Optimal Solutions for Sparse Principal Component Analysis", the problem in the previous post seems to stem from a possible error where on page 17 in arXiv:0707.0705v4, the authors state that, in the above notation, (1+d)<=s, instead of (1+d)>=s, as I think it should be. I could be wrong, however, I would need a strong argument to convince me that sparse PCA gives an upper and not a lower bound on (1+d).
Alexandre D'Aspremont replied: I agree that sparse factors produce lower bounds. However, the dual of the sparse PCA relaxation produces upper bounds. The strong argument you are looking for here is duality.
Alex
I then eventually added:: Anonymous,

Alex tells me offline that we will see more on the subject by the end of June.

Igor.

The point of the entry ( CS: Restricted Isometry Property, What Is It Good For ? ) was really centered on the potential restrictiveness of the Restricted Isometry Property, and so while finding the RIP constant is an issue, the bigger picture tells you that you have to find some type of property that would be followed by all kinds of measurements matrices, not just some. And so the next point brought forth by Piotr Indyk is more potent as it pertains to my misrepresentation of the type of matrices that follow some type of properties.

Piotr Indyk noted the following:

: Thanks for surveying Anna's talk. One comment: the paper, especially the notion of RIP1, provides a connection between *combinatorial approaches* (using sparse matrices and greedy recovery) and *geometric approaches* (using dense matrices and L1 minimization). The deterministic vs non-deterministic/randomized dichotomy is not as relevant here, since most of the combinatorial algorithms known in the sketching/streaming literature are actually randomized (e.g., CCFC'02, CM'04, CM'06 in the summary slide).

Still, it is true that most the of deterministic (or explicit) constructions of measurement matrices yield sparse matrices. One plausible explanation for this phenomenon is that one can build measurement matrices using expander graphs (which are sparse), and explicit construction of expander graphs is a major research direction in Theoretical Computer Science. If you only got a hammer, everything looks like a nail :)

To what I responded:

Thanks for steering me in the right path. Yes it is an issue between geometric and combinatorial constructions and they so happen to be different because the TCS folks have tools dedicated to building sparse tools. I'll try to change the entry to reflect this. Finding an RIP(1) dense matrix or an RIP(2) sparse matrix would indeed be interesting.
Piotr Indyk responded with the following:: Finding an RIP(1) dense matrix or an RIP(2) sparse matrix would indeed be interesting.

Yes, it would. Personally though, I suspect that it might be impossible to get sparse AND efficient RIP(2) matrices. Specifically, I suspect that the results of [Chandar'08] can be extended to show that any sparse matrix satisfying RIP(2) must consist of many measurements (right now his proof holds only for 0-1 entries). If so, then one would not be able to get a matrix which is efficient, sparse and satisfying RIP(2).

Note that the construction of RIP(2) matrix by DeVore gives a sparse matrix, but the number of measurements in quadratic in the sparsity parameter.

Eventually the issue is that the only deterministic subsampling known to work is that of Meyer and Mattei and I need to make it obvious that the two larger groups that are known to date seem to fall on either side of the combinatorial or geometric constructions rather than on the deterministic and non-deterministic side.

In a different area, John Sidles one of the author of Practical recipes for the model order reduction, dynamical simulation, and compressive sampling of large-scale open quantum systems mentioned here says the following:

... Our UW QSE Group definitely would like to hear from researchers who are interested in extending CS methods to quantum state-spaces.

The passage that Igor quotes was originally intended to be just one or two paragraphs long, comprising a review of Candes and Tao's recent work, together with some handwaving arguments about how CS methods might be relevant to large-scale quantum simulations.

But then we couldn't resist the temptation to try some sparse quantum reconstructions ... which worked far better than we thought they would ... and so then we had to understand why they worked so well ... and the next thing we knew we were wading deep into the CS literature.

Although we presently have a great many CS-related questions, there is not CS topic about which we have more questions than the construction of ultra-large sampling matrices.

For quantum applications, randomly generated sampling matrices are computationally infeasible. Thus not only do the sampling matrices have to be constructable, they have to be "factorable" ... this property is for quantum simulation purposes even more important than a strict RIP property.

We would be very interested and appreciative of opportunities to collaborate with other CS researchers ... especially we are keenly aware that our present "nuts and bolts" approach to quantum CS needs to be better balanced with formal rigor....
To that I responded with:: John:

As far as I can tell the subject of the construction of the right measurement matrix is really still an open subject (I am going to write a small entry on it later this week) as regards to the type of property they should follow in order to be suitable for CS. For instance, Anna Gilbert et al (see the presentation of the l1 meeting) talks about the combinatorial approach to the measurement matrix (they exist in the field of sketching). In that case, the matrices are sparse and as such that they do not bother building them, since the actual measurement process would take a higher number of operations if it were to be implemented as a matrix.

When you state factorable, do you mean that the operators would ideally be implemented as tensor product of 1-d measurement matrices ?

With regards to your statement:
"...But then we couldn't resist the temptation to try some sparse quantum reconstructions ... which worked far better than we thought they would ... and so then we had to understand why they worked so well ... and the next thing we knew we were wading deep into the CS literature...."

I did not see much of an implementation of CS in the paper, can you explain to me those sparse quantum reconstructions ?

Thanks,

Igor.

To which John responded by :

: ....This sparse reconstructions we implemented are pretty simple-minded.

We have a quantum state, generated by our simulations, that is specified by (say 2048) coefficients in a Hilbert basis.

We have the idea that this state is compressible, in the sense that it is near (in Hilbert space) to a low-dimension Kahlerian algebraic variety.

That being true, how many Hilbert components do we need to know, in order to estimate the entire state?

Surely not *all* the Hilbert components. In fact (as CS folks will expect) it suffices to know only a (relatively small) number of (random) projections.

The main new element is that we are projecting not onto linear subspaces, but onto the embedded Kahlerian varieties (aside: these embedded varieties are given various names by various communities, e.g., they are known as matrix product states to condensed matter physicists ... but everyone is talking about the same mathematical objects).

As for *why* these Kahlerian varieties are the natural subspace ... well that's the physics part of the review article.

To carry this program through in *really* large dimensions, we can't use random projections ... they are too costly to compute. Hence our interest in deterministic sampling matrices, which we construct geometrically via a theorem that relates pairwise orthogonality to pairwise Hamming distance in a dictionary of atoms.

We would like to be able to prove theorems about the RIP properties of these Hamming isometry constructions ... but we are at present not smart enough to do this ... so for now we determine these properties by exhaustive computations.

A picture of an 8x16 (complex) sampling matrix constructed by these Hamming isometry methods is here (this matrix is also discussed in the manuscript).

What frustrates us quantum system engineers about the RIP literature is that many properties are proved asymptotically about Gaussian sampling matrices, but if you *try* these random Gaussian constructions in small dimensions, they perform rather poorly ... much worse than Hamming isometry constructions.

As a concrete challenge, what 8x16 sampling matrix has the best RIP properties? Can anyone do better than the RIP properties of the above Hamming isometry construction?
and also.: As a followup to the preceding post, here is a link to a Mathematica file that specifies the full 8x16 Hamming-2 isometry matrix -- just to save numerically-minded folks the effort of coding up the spin-1/2 rotation matrices (the review gives the details of the construction).

Then John went on to comment the entry on the kronecker product mentioned by Yin Zhang in this newer entry:

Just to mention, Yin Zhang's very interesting discussion of random Kronecker product matrices also applies to the deterministic Kronecker product matrices that we use in our "Practical recipes for large-scale quantum simulation" (arxiv: 0805.1844).

As our review summarizes, there is a vast quantum physics literature on these Kronecker product objects ... once we appreciate that the quantum physicists insist on calling them by funny names like "matrix product states" and "Hartree-Fock states" and that geometers insist on calling them equally funny names like "Kahlerian algebraic varieties" and "Grassmannians". Everyone is talking about the same mathematical objects, however.

Quantum physicists embrace Kronecker-type objects for the same reason that the CS community embraces them: compact storage, fast calculations, and high-fidelity representations.

The "aha" required to link the CS and quantum physics is quite simple ... quantum states are compressible objects.

In fact, modern quantum information theory even provides a reasonably satisfactory informatic explanation of why quantum states are compressible objects (it's because quantum dynamical systems that are contact with thermal reservoirs evolve into compressible objects ... this can be regarded as the definition of a thermal reservoir).

Also, just to mention, the matrix product state (MPS) community plays tricks with outer products that I don't think have yet been tried in classical signal processing ... there is a finite possibility that these tricks might be broadly useful.

It appears that a golden era of sampling matrices is opening up ... :)

Credit Photo: NASA/JPL- Caltech, University of Arizona. The view from Phoenix after landing yesterday on the North Pole of Mars.

1 comment:

Anonymous said...: By a remarkable coincidence, yesterday I posted about the Phoenix Mars Lander on Scott Aaronson's QIT blog.

That Mars lander is a mighty cool project and a magnificent technical achievement from pretty much any point of view. :); Monday, May 26, 2008 at 2:44:00 PM CDT