Monday, November 03, 2014

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Again at the intersection of Randomized Numerical Linear Algebra, Advanced Matrix Factorization, and some elements of compressive sensing. I also like that the authors place their effort within a larger panorama, with Section 1.1 aptly named "Big Picture" :-) Without further ado:

We show how to approximate a data matrix A with a much smaller sketch A~ that can be used to solve a general class of constrained k-rank approximation problems to within (1+ϵ) error. Importantly, this class of problems includes k-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just O(k) dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems.
For k-means dimensionality reduction, we provide the first (1+ϵ) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate PCA, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only `cover' a good subspace for A, but can be used directly to compute this subspace.
Finally, for k-means clustering, we show how to achieve a (9+ϵ) approximation by Johnson-Lindenstrauss projecting data points to just O(log k/ϵ²) dimensions. This gives the first result with dimension independent of input size and sublinear in k. It opens the important question of whether it is possible to obtain a (1+ϵ) approximation using an o(k)-dimensional sketch.
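To get a feel for that last result, here is a minimal sketch of the Johnson-Lindenstrauss step in NumPy: project the data points with a random Gaussian map down to roughly O(log k/ϵ²) dimensions before handing them to any k-means solver. The constants, the specific target dimension, and the clustering step itself are my own illustrative choices, not the authors' implementation.

```python
import numpy as np

def jl_project(A, target_dim, seed=0):
    """Project the rows of A to target_dim dimensions with a random
    Gaussian (Johnson-Lindenstrauss) map. Entries are drawn N(0, 1/target_dim)
    so that squared norms are preserved in expectation."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.normal(0.0, 1.0 / np.sqrt(target_dim), size=(d, target_dim))
    return A @ S

# Toy data: n points in d dimensions, to be clustered into k groups.
n, d, k = 500, 200, 5
rng = np.random.default_rng(1)
A = rng.normal(size=(n, d))

# Target dimension of order log(k)/eps^2; constants are elided here,
# so we floor it at 10 dimensions for this toy example.
eps = 0.5
target_dim = max(int(np.ceil(np.log(k) / eps**2)), 10)
A_sketch = jl_project(A, target_dim)

# Any exact, approximate, or heuristic k-means algorithm can now be run
# on A_sketch (n x target_dim) instead of A (n x d).
```

The point of the paper's result is that the target dimension above depends only on k and ϵ, not on n or d, while still guaranteeing a (9+ϵ) approximation to the optimal k-means cost.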
I realize that with the epsilon thingy it may be hard, but I have always wondered about the phase transitions of randomized algorithms.
Join the CompressiveSensing subreddit or the Google+ Community and post there!
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
