Nuit Blanche: Trace: What is it good for ? - How Compressed Sensing relates to Dimensionality Reduction

Wednesday, October 24, 2007

Trace: What is it good for ? - How Compressed Sensing relates to Dimensionality Reduction

In reference [1] by Killian Weinberger and Lawrence Saul, one learns that one dimensionality reduction technique uses the maximization of the trace of the Gram matrix in order to unfold a manifold . From [1]

..We also emphasize that in eq. (9), we are maximizing the trace, not minimizing it. While a standard relaxation to minimizing the rank [9] of a semidefinite matrix is to minimize its trace, the intuition here is just the opposite: we will obtain a low dimensional embedding by maximizing the trace of the Gram matrix...

At the same time, Benjamin Recht, Maryam Fazel , and Pablo Parrilo provide a somewhat different view in [3] (Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimiz ation a paper that draws a parallel between Compressed Sensing using the L1 norm and the nuclear norm for finding low rank approximation of a matrix ) where the Trace of a positive definite matrix (the Gram Matrix) is minimized. The reason is simple [3]:

The nuclear norm is a convex function, can be optimized efficiently, and is the best convex approximation of the rank function over the unit ball of matrices with norm less than one. When the matrix variable is symmetric and positive semidefinite, this heuristic is equivalent to the trace heuristic often used by the control community.....
...
Low-dimensional Euclidean embedding problems
A problem that arises in a variety of fields the determination of configurations of points in low-dimensional Euclidean spaces, subject to some given distance information. In Multi-Dimensional Scaling (MDS) such problems occur in extracting the underlying geometric structure of distance data.

So in one series of paper, we have a minimization and in the other one we have a maximization of the trace of a Gram matrix as an objective function in order to obtain a reduced dimension/low rank approximation of that matrix. This seems odd.

When one goes back to one of the first paper of Killian Weinberger and Lawrence Saul [1], one can read how the heuristic trace argument comes about:

The constraints for local isometry under this neighborhood relation are simply to preserve the lengths of the edges in this graph, as well as the angles between edges at the same node. In practice, it is easier to deal only with constraints on distances, as opposed to angles.....Thus, we must further constrain the optimization to the cone of semidefinite matrices [21]. In sum, there are three types of constraints on the Gram matrix Kij , arising from local isometry, centering, and semidefiniteness. The first two involve linear equality constraints; the last one is not linear, but importantly it is convex. We will exploit this property in what follows.What function of the Gram matrix can we optimize to “unfold” a manifold, as in Fig. 1? As motivation, consider the ends of a piece of string, or the corners of a flag. Any slack in the string serves to decrease the (Euclidean) distance between its two ends; likewise, any furling of the flag serves to bring its corners closer together. More generally, we observe that any “fold” between two points on a manifold serves to decrease the Euclidean distance between the points. This suggests an optimization that we can perform to compute the outputs Yi that unfold a manifold sampled by inputs Xi. In particular,we propose to maximize the sum of pairwise squared distances between outputs....By maximizing eq. (6), we pull the outputs as far apart as possible, subject to the constraints in the previous section. Before expressing this objective function in terms of the Gram matrix Kij , let us verify that it is indeed bounded, meaning that we cannot pull the outputs infinitely far apart. Intuitively, the constraints to preserve local distances (and the assumption that the graph is connected) prevent such a divergence.... Thus, the objective function cannot increase without bound if we enforce the constraints to preserve local distances.

We can express the objective function in eq. (6) directly in terms of the Gram matrix Kij of the outputs Yi. Expanding the terms on the right hand side, and enforcing the constraint that the outputs are centered on the origin, we obtain: T (Y ) =Tr(K). (9)
Thus, we can interpret the objective function for the outputs in several ways: as a sum over pairwise distances in eq. (6), as a measure of variance in eq. (9), or as the trace of their Gram matrix in eq. (9).

In other words, using the max trace argument one finds a lower dimensional manifold by unfolding that manifold subject to distance constraints. In the same way, an argument based on the minimization of the trace could be depicted as squashing that manifold subject to similar distance or angle constraints. Either way, one obtains a reduced version of the manifold : in the MVU case (max trace), some respect is given to the geodesic distance on the embedding whereas in the min case, respect is given to the Euclidian distance. Obviously, in dimension 3, one wonders how a Mobius strip would fare in either situations. This is my interpretation so far, if you have better insight, feel free to enlighten me on this issue. As suggested by a commenter, I just re-read Jon Dattorro's very nice book and it looks like this description of the minimization business is similar to his.( see footnote 4.12, page 241):

Trace (trG = hI , Gi) minimization is a heuristic for rank minimization. (§7.2.2.1) It may be interpreted as squashing G which is bounded below by X^T X as in (512).

References:

[1] Unsupervised learning of image manifolds by semidefinite programming, K.Q. Weinberger and L. K. Saul (2004). (Outstanding student paper award). In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR-04), Washington D.C. pdf