In the paper, the authors mention as potential uses the Netflix competition, triangulation from incomplete data, and the filling in of a covariance matrix known to be of "low rank because the variables only depend upon a comparably smaller number of factors" (i.e., it should apply to many instances of machine learning). This use of the nuclear norm is not new (see CS is not just Compressed Sampling nor Compressed Sensing), but it is different. How different?

We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys
m >= C n^1.2 r log n
for some positive numerical constant C, then with very high probability, most n x n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
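To make the convex program in the abstract concrete, here is a minimal sketch of nuclear norm minimization for matrix completion. It assumes the cvxpy library and uses illustrative sizes; it is not the code used in the paper (which relies on solvers such as SDPT3), just a small reproduction of the idea: minimize ||X||_* subject to agreement with the sampled entries.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 30, 2
# Rank-r target matrix M (illustrative; any low-rank matrix would do).
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# Observe m entries chosen uniformly at random, encoded by a 0/1 mask.
m = 600
mask = np.zeros((n, n))
mask.flat[rng.choice(n * n, size=m, replace=False)] = 1.0

# Convex program: minimum nuclear norm matrix that fits the observed entries.
X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                     [cp.multiply(mask, X) == mask * M])
problem.solve()

print("relative recovery error:",
      np.linalg.norm(X.value - M) / np.linalg.norm(M))
```

With these sizes (n = 30, rank 2, 600 of 900 entries observed) the relative error should be tiny, illustrating the "perfect recovery from an incomplete set of entries" claim.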

Building on the concept of restricted isometry introduced in [12] in the context of sparse signal recovery, [27] establishes the first sufficient conditions for which the nuclear norm heuristic returns the minimum rank element in the constraint set. They prove that the heuristic succeeds with large probability whenever the number m of available measurements is greater than a constant times 2nr log n for n x n matrices. Although this is an interesting result, a serious impediment to this approach is that one needs to essentially measure random projections of the unknown data matrix, a situation which unfortunately does not commonly arise in practice. Further, the measurements in (1.15) give some information about all the entries of M whereas in our problem, information about most of the entries is simply not available. In particular, the results and techniques introduced in [27] do not begin to address the matrix completion problem of interest to us in this paper. As a consequence, our methods are completely different; for example, they do not rely on any notions of restricted isometry. Instead, as we discuss below, we prove the existence of a Lagrange multiplier for the optimization (1.5) that certifies the unique optimal solution is precisely the matrix that we wish to recover.
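The distinction between the two measurement models is easy to see in a few lines. The sketch below is illustrative (names and sizes are mine, not from either paper): random-projection measurements touch every entry of M, whereas in matrix completion each measurement is a single entry and the rest are simply never seen.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 20, 2, 150
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))

# (a) Random-projection measurements, as in the RIP-based setting of [27]:
# each y_i = <A_i, M> depends on all n^2 entries of M.
A = rng.standard_normal((m, n * n))
y_projections = A @ M.ravel()

# (b) Matrix completion measurements: each y_i is one entry M[i, j];
# the unobserved entries carry no information at all.
observed = rng.choice(n * n, size=m, replace=False)
y_entries = M.ravel()[observed]
```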

Offline, Emmanuel tells me he has not tried to reconstruct the actual matrix of the Netflix challenge, so if you read this, you may have a shot at becoming a millionaire over the weekend. Good luck.

References:

[1] K. C. Toh, R. H. Tütüncü, and M. J. Todd, SDPT3: a MATLAB software package for semidefinite-quadratic-linear programming.

[2] Benjamin Recht, Maryam Fazel, and Pablo Parrilo, Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization.

Credit: NASA / JPL / University of Arizona. Could it be ice just underneath the Phoenix Mars lander?

Regarding the estimation of the restricted isometry constant, I do not understand how using sparse PCA can give a useful bound on the RIP constant. Sparse PCA, if we could solve it (which we can't, because it is probably NP-hard), would evaluate a tight bound on the largest eigenvalue of submatrices of a matrix. Call this bound s, say.

We then have the bound (for x sparse)

(1-d)||x||_2^2 <= ||Px||_2^2 <= s||x||_2^2 <= (1+d)||x||_2^2.

Note that because the estimate of s is tight, i.e. by definition there exists a sparse x such that ||Px||_2^2 = s||x||_2^2, the definition of RIP only guarantees that 1+d >= s, but this inequality does not have to be tight. (In the definition of RIP, either the lower or the upper bound is tight, but not necessarily both!) Therefore, s does not seem to give any useful bound on d.
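A small brute-force check of the quantities in this argument (sizes and names are illustrative, not from the comment): s is the largest eigenvalue of any k-column submatrix P_S^T P_S, which is what sparse PCA would bound tightly, while the RIP constant d also accounts for the lower bound, so s <= 1+d can hold strictly.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
m, n, k = 15, 30, 3
P = rng.standard_normal((m, n)) / np.sqrt(m)   # columns roughly unit-norm

s = 0.0   # max eigenvalue of P_S^T P_S over k-column supports S
d = 0.0   # restricted isometry constant over the same supports
for S in combinations(range(n), k):
    eig = np.linalg.eigvalsh(P[:, list(S)].T @ P[:, list(S)])  # ascending
    s = max(s, eig[-1])
    d = max(d, eig[-1] - 1.0, 1.0 - eig[0])

# s <= 1 + d always, but when the lower eigenvalues are the binding ones,
# 1 + d is strictly larger than s, so s alone does not determine d.
print(f"s = {s:.3f},  1 + d = {1 + d:.3f}")
```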

Let me give a counterexample. In the discussion, we have not placed any restrictions on the matrix P. In fact, it is easy to generate a matrix (call it D) for which s <= 1. This clearly gives a useless bound s, which implies only that d > 0. To generate D, take any matrix P and set D = P/||P||_2. With this normalisation, we have a matrix that satisfies

||Dx||_2^2 <= ||x||_2^2 for all x, and therefore also for all sparse x. Furthermore, assume that P is such that it has RIP constant d for all sparse x; then for all sparse x,

((1-d)/||P||_2^2) ||x||_2^2 <= ||Dx||_2^2 <= ||x||_2^2.

So D has s = 1, but RIP constant 1 - (1-d)/||P||_2^2.
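Here is a numerical sketch of this counterexample (illustrative sizes, my own names): rescaling D = P/||P||_2 forces s <= 1 by construction, while the lower restricted-isometry bound for D, and hence its RIP constant, is governed by (1-d)/||P||_2^2 and can be far from 0.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
m, n, k = 15, 30, 3
P = rng.standard_normal((m, n)) / np.sqrt(m)
D = P / np.linalg.norm(P, 2)        # divide by the spectral norm of P

s_D = 0.0          # max eigenvalue of D_S^T D_S over k-column supports
lower_D = np.inf   # min eigenvalue over the same supports
for S in combinations(range(n), k):
    eig = np.linalg.eigvalsh(D[:, list(S)].T @ D[:, list(S)])
    s_D = max(s_D, eig[-1])
    lower_D = min(lower_D, eig[0])

# s_D <= 1 by construction, yet the RIP constant 1 - lower_D stays large,
# so knowing s says essentially nothing about d.
print(f"s_D = {s_D:.3f},  RIP constant of D = {1 - lower_D:.3f}")
```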

In addition to my previous post: looking through Alexandre d'Aspremont's paper "Optimal Solutions for Sparse Principal Component Analysis", the problem in the previous post seems to stem from a possible error on page 17 of arXiv:0707.0705v4, where the authors state, in the above notation, that (1+d) <= s instead of (1+d) >= s, as I think it should be. I could be wrong; however, I would need a strong argument to convince me that sparse PCA gives an upper and not a lower bound on (1+d).

I agree that sparse factors produce lower bounds. However, the dual of the sparse PCA relaxation produces upper bounds. The strong argument you are looking for here is duality.

Alex

Anonymous,

Alex tells me offline that we will see more on the subject by the end of June.

Igor.