Wednesday, March 27, 2013

Inductive Sparse Subspace Clustering - implementation - and Greedy Feature Selection for Subspace Clustering

You probably recall the recent implementation of the Sparse Subspace Clustering algorithm by Ehsan Elhamifar and Rene Vidal. Well, we now have two other competing approaches to that one. The first, by Xi Peng, Lei Zhang and Zhang Yi, has an implementation attached to it. For the second paper, one of the authors, Aswin, told me that the attendant OMP algorithm would be out soon (stay tuned).

Inductive Sparse Subspace Clustering by Xi Peng, Lei Zhang, Zhang Yi. The abstract reads:

Sparse Subspace Clustering (SSC) has achieved state-of-the-art clustering quality by performing spectral clustering over a $\ell^{1}$-norm based similarity graph. However, SSC is a transductive method which cannot handle data not used to construct the graph (out-of-sample data). For each new datum, SSC requires solving $n$ optimization problems in O(n) variables to perform the algorithm over the whole data set, where $n$ is the number of data points. Therefore, it is inefficient to apply SSC in fast online clustering and scalable graphing. In this letter, we propose an inductive spectral clustering algorithm, called inductive Sparse Subspace Clustering (iSSC), which makes SSC feasible for clustering out-of-sample data. iSSC adopts the assumption that high-dimensional data actually lie on a low-dimensional manifold such that out-of-sample data could be grouped in the embedding space learned from in-sample data. Experimental results show that iSSC is promising in clustering out-of-sample data.

The MATLAB code of iSSC is here.
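To make the baseline concrete, here is a minimal sketch of the SSC pipeline the abstract refers to: each point is sparsely coded ($\ell^{1}$ penalty) over the remaining points, the coefficient magnitudes define a similarity graph, and spectral clustering is run on that graph. This is not the authors' MATLAB code; it is a toy Python rendition using scikit-learn's Lasso and SpectralClustering, on two orthogonal lines in R^3, just to illustrate the transductive structure iSSC seeks to avoid recomputing for out-of-sample points.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

def ssc(X, alpha=0.01, n_clusters=2):
    """Toy SSC sketch: sparsely code each row of X over the other rows,
    build a symmetric similarity graph from |coefficients|, then run
    spectral clustering on that graph."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        # express x_i as a sparse combination of the other points
        lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
        lasso.fit(X[idx].T, X[i])
        C[i, idx] = lasso.coef_
    W = np.abs(C) + np.abs(C).T  # symmetrized similarity graph
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed',
                              random_state=0).fit_predict(W)

# two orthogonal 1-D subspaces (lines) in R^3, 20 points each
t = np.linspace(1.0, 2.0, 20)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 1.0])
X = np.vstack([np.outer(t, u), np.outer(t, v)])
labels = ssc(X, n_clusters=2)
```

Since the two subspaces are orthogonal here, the sparse codes (and hence the graph) are block diagonal and the two lines come out as the two clusters. The cost the abstract highlights is visible in the loop: clustering a new point means solving yet another optimization problem over all $n$ existing points.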

The second solver and its theoretical justification are featured in: Greedy Feature Selection for Subspace Clustering by Eva Dyer, Aswin Sankaranarayanan, Richard Baraniuk. The abstract reads:
Unions of subspaces are powerful nonlinear signal models for collections of high-dimensional data. However, existing methods that exploit this structure require that the subspaces the signals of interest occupy be known a priori or be learned from the data directly. In this work, we analyze the performance of greedy feature selection strategies for learning unions of subspaces from an ensemble of data. We develop sufficient conditions that are required for orthogonal matching pursuit (OMP) to recover subsets of points from the ensemble that live in the same subspace, a property we refer to as exact feature selection (EFS). Following this analysis, we provide an empirical study of greedy feature selection strategies for subspace clustering and characterize the gap between sparse recovery methods and nearest neighbor (NN)-based approaches. We demonstrate that the gap between sparse recovery and NN methods is particularly pronounced when the tiling of subspaces in the ensemble is sparse, suggesting that sparse recovery methods can be used in a number of regimes where nearest neighbor approaches fail to reveal the subspace membership of points in the ensemble.
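Since the authors' attendant OMP solver is not out yet, here is a hedged sketch of the greedy idea the abstract analyzes: for each point, run orthogonal matching pursuit with the remaining points as the dictionary, and call the selected points its "features"; exact feature selection (EFS) holds when they all lie in the same subspace. The sketch below uses scikit-learn's OrthogonalMatchingPursuit on two orthogonal 2-D subspaces in R^5, then clusters the resulting graph spectrally; it is an illustration of the mechanism, not the paper's algorithm or code.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit
from sklearn.cluster import SpectralClustering

def greedy_feature_clustering(X, k, n_clusters):
    """Toy OMP-based sketch: for each row of X, greedily select k other
    rows (its 'features') via OMP, build a similarity graph from the
    |coefficients|, and cluster the graph spectrally."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k,
                                        fit_intercept=False)
        omp.fit(X[idx].T, X[i])  # greedy sparse code of x_i
        W[i, idx] = np.abs(omp.coef_)
    W = W + W.T  # symmetrize
    return SpectralClustering(n_clusters=n_clusters,
                              affinity='precomputed',
                              random_state=0).fit_predict(W)

# two orthogonal 2-D subspaces in R^5, 20 points each
rng = np.random.default_rng(0)
X1 = np.hstack([rng.standard_normal((20, 2)), np.zeros((20, 3))])
X2 = np.hstack([np.zeros((20, 3)), rng.standard_normal((20, 2))])
X = np.vstack([X1, X2])
labels = greedy_feature_clustering(X, k=2, n_clusters=2)
```

In this orthogonal-subspace toy case OMP always selects same-subspace neighbors, i.e. EFS holds for every point; the interesting regimes studied in the paper are those where subspaces intersect or are close, and where the abstract reports that greedy sparse selection still beats nearest-neighbor graphs.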

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.