Nuit Blanche: Sparse PCA: Covariance Thresholding and an AMP based approach

Friday, April 11, 2014

Sparse PCA: Covariance Thresholding and an AMP based approach

No implementation yet, but new ways of going about the Sparse PCA problem. I will be adding the following figure to the phase transition figure in the Advanced Matrix Factorization Jungle page for the Sparse PCA paragraph.

As I mentioned in slide 13 of this recent ML Meetup presentation, to the untrained eye, sparse PCA is a solved problem, but the back story is a little bit more complex as related in the first paper:

....Over the last ten years, a signiﬁcant eﬀort has been devoted to developing practical algorithms that outperform diagonal thresholding, see e.g. [MWA05, ZHT06, dEGJL07, dBG08, WTH09]. In particular, d’Aspremont et al. [dEGJL07] developed a promising M-estimator based on a semideﬁnite programming (SDP) relaxation. Amini and Wainwright [AW09] carried out an analysis of this method and proved that, if (i) k ≤ C(β) n/ log p, and (ii) if the SDP solution has rank one, then the SDP relaxation provides a consistent estimator of the support of v. At ﬁrst sight, this appears as a satisfactory solution of the original problem. No procedure can estimate the support of v from less than k log p samples, and the SDP relaxation succeeds in doing it from –at most– a constant factor more samples. This picture was upset by a recent, remarkable result by Krauthgamer, Nadler and Vilenchik [KNV13]....

.... the rest is in the paper: Sparse PCA via Covariance Thresholding by Yash Deshpande and Andrea Montanari,

In sparse principal component analysis we are given noisy observations of a rank-one (or low rank) matrix of dimension n×p and seek to reconstruct it under additional sparsity assumptions. In particular, we assume here that the principal component v has at most k non-zero entries, and study the high-dimensional regime in which p is of the same order as n. In an inﬂuential paper, Johnstone and Lu introduced a simple algorithm that estimates the support of v by the largest entries in the diagonal of the empirical covariance. This method can be shown to succeed with high probability if k ≤ C1 p n/logp, and to fail with high probability if k ≥ C2p n/logp for two constants 0 < C1, C2 < ∞. Despite a considerable amount of work over the last ten years, no practical algorithm exists with provably better support recovery guarantees. Here we analyze a covariance thresholding algorithm that was recently proposed by Krauthgamer, Nadler and Vilenchik. We conﬁrm empirical evidence presented by these authors and rigorously prove that the algorithm succeeds with high probability for k of order √n). Recent conditional lower bounds suggest that it might be impossible to do signiﬁcantly better. Our analysis develops new bounds on the norm of kernel random matrices, in regimes that were not considered before.

Information-theoretically Optimal Sparse PCA by Yash Deshpande and Andrea Montanari,

Sparse Principal Component Analysis (PCA) is a dimensionality reduction technique wherein one seeks a low rank representation of a data matrix with additional sparsity constraints on the obtained representation. We consider two probabilistic formulations of sparse PCA: a spiked Wigner and spiked Wishart (or spiked covariance) model. We analyze an Approximate Message Passing (AMP) algorithm to estimate the underlying signal and show, in the high dimensional limit, that the AMP estimates are information-theoretically optimal. As an immediate corollary, our results demonstrate that the posterior expectation of the underlying signal, which is often intractable to compute, can be obtained using a polynomial-time scheme. Our results also effectively provide a single-letter characterization of the sparse PCA problem.

Join the CompressiveSensing subreddit or the Google+ Community and post there !