Nuit Blanche: FastEmbed: Compressive spectral embedding: sidestepping the SVD

Friday, October 02, 2015

FastEmbed: Compressive spectral embedding: sidestepping the SVD - implementation -

From the paper:

"...In this paper, we tackle these scalability bottlenecks by focusing on what embeddings are actually used for: computing ℓ2-based pairwise similarity metrics typically used for supervised or unsupervised learning. For example, K-means clustering uses pairwise Euclidean distances, and SVM-based classification uses pairwise inner products. We therefore ask the following question: “Is it possible to compute an embedding which captures the pairwise euclidean distances between the rows of the spectral embedding E= [f(σ1)u1···f(σk)uk], while sidestepping the computationally expensive partial SVD?” We answer this question in the affirmative by presenting a compressive algorithm which directly computes a low-dimensional embedding..."

Compressive spectral embedding: sidestepping the SVD by Dinesh Ramasamy, Upamanyu Madhow

Spectral embedding based on the Singular Value Decomposition (SVD) is a widely used "preprocessing" step in many learning tasks, typically leading to dimensionality reduction by projecting onto a number of dominant singular vectors and rescaling the coordinate axes (by a predefined function of the singular value). However, the number of such vectors required to capture problem structure grows with problem size, and even partial SVD computation becomes a bottleneck. In this paper, we propose a low-complexity it compressive spectral embedding algorithm, which employs random projections and finite order polynomial expansions to compute approximations to SVD-based embedding. For an m times n matrix with T non-zeros, its time complexity is O((T+m+n)log(m+n)), and the embedding dimension is O(log(m+n)), both of which are independent of the number of singular vectors whose effect we wish to capture. To the best of our knowledge, this is the first work to circumvent this dependence on the number of singular vectors for general SVD-based embeddings. The key to sidestepping the SVD is the observation that, for downstream inference tasks such as clustering and classification, we are only interested in using the resulting embedding to evaluate pairwise similarity metrics derived from the euclidean norm, rather than capturing the effect of the underlying matrix on arbitrary vectors as a partial SVD tries to do. Our numerical results on network datasets demonstrate the efficacy of the proposed method, and motivate further exploration of its application to large-scale inference tasks.

A Python implementation of FastEmbed is available at: https://bitbucket.org/dineshkr/fastembed/src/NIPS2015

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !