Nuit Blanche: Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients

Monday, April 20, 2015

Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients

Here is a mix of both stochastic gradient descent and random features to approximate kernel PCA and more...

Scale Up Nonlinear Component Analysis with Doubly Stochastic Gradients by Bo Xie, Yingyu Liang, Le Song

Nonlinear component analysis methods such as kernel Principal Component Analysis (KPCA) and kernel Canonical Correlation Analysis (KCCA) are widely used in machine learning, statistics and data analysis, and they serve as invaluable preprocessing tools for various purposes such as data exploration, dimension reduction and feature extraction.
However, existing algorithms for nonlinear component analysis cannot scale up to millions of data points due to prohibitive computation and memory requirements. There are some recent attempts to scale up kernel versions of component analysis using random feature approximations. However, to obtain high quality solutions, the number of required random features can be the same order of magnitude as the number of data points, making such an approach not directly applicable to the regime with millions of data points.
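
To make the random feature trade-off concrete, here is a minimal sketch of random Fourier features for a Gaussian kernel. It is not taken from the paper: the kernel choice, the `gamma` value, the toy data and the function names `rbf_kernel` and `random_fourier_features` are all assumptions made for illustration. It measures how the worst-case approximation error of the kernel matrix changes as the number of features grows.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Exact Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def random_fourier_features(X, n_features, gamma, rng):
    # Frequencies are drawn from the kernel's spectral density, N(0, 2*gamma*I),
    # and phases uniformly from [0, 2*pi), so that E[z(x) . z(y)] = k(x, y).
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_features))
    b = rng.uniform(0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy data, illustrative only
K = rbf_kernel(X, X, gamma=0.5)

errors = {}
for m in (10, 100, 1000, 10000):
    Z = random_fourier_features(X, m, 0.5, rng)
    # Largest entrywise gap between the approximate and exact kernel matrix.
    errors[m] = np.abs(Z @ Z.T - K).max()
    print(f"m = {m:5d}  max |Z Z^T - K| = {errors[m]:.4f}")
```

The error shrinks roughly like one over the square root of the number of features, which is why an accurate approximation can require a feature count comparable to the number of data points.
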
We propose a simple, computationally efficient, and memory friendly algorithm based on the "doubly stochastic gradients" to scale up a range of kernel nonlinear component analysis methods, such as kernel PCA, CCA, SVD and latent variable model estimation. Despite the non-convex nature of these problems, we are able to provide theoretical guarantees that the algorithm converges at the rate $\tilde{O}(1/t)$ to the global optimum, even for the top $k$ eigensubspace. We demonstrate the effectiveness and scalability of our algorithm on large scale synthetic and real world datasets.
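
For readers who want a feel for the stochastic side of this, below is a simplified sketch and not the paper's algorithm. It runs Oja's rule with mini-batches on a fixed random Fourier feature map, so it is stochastic only in the data points, whereas the doubly stochastic method also resamples the random features. The toy data, `gamma`, batch size, step size schedule and all variable names are assumptions chosen for illustration, and the features are left uncentered for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated Gaussian clusters in 2-D (illustrative only).
n, d, m, k, gamma = 2000, 2, 64, 2, 1.0
centers = np.array([[-3.0, 0.0], [3.0, 0.0]])
X = centers[rng.integers(0, 2, size=n)] + 0.3 * rng.normal(size=(n, d))

# Fixed random Fourier feature map for k(x, y) = exp(-gamma * ||x - y||^2).
Omega = rng.normal(scale=np.sqrt(2 * gamma), size=(d, m))
phase = rng.uniform(0, 2 * np.pi, size=m)

def features(x):
    return np.sqrt(2.0 / m) * np.cos(x @ Omega + phase)

def captured_variance(W, C):
    # Trace of the covariance restricted to the subspace spanned by W.
    return np.trace(W.T @ C @ W)

# Random orthonormal starting basis for the k-dimensional subspace.
W, _ = np.linalg.qr(rng.normal(size=(m, k)))
W_init = W.copy()

batch = 20
for t in range(3000):
    idx = rng.integers(0, n, size=batch)   # sample a mini-batch of data points
    Z = features(X[idx])                   # shape (batch, m)
    grad = Z.T @ (Z @ W) / batch           # unbiased estimate of C @ W
    eta = 10.0 / (100.0 + t)               # step size decaying like 1/t
    W, _ = np.linalg.qr(W + eta * grad)    # Oja step, then re-orthonormalize

# Reference: exact top-k eigenvalues of the (uncentered) feature covariance.
Z_all = features(X)
C = Z_all.T @ Z_all / n
optimal = np.linalg.eigvalsh(C)[-k:].sum()   # eigvalsh returns ascending order
ratio_init = captured_variance(W_init, C) / optimal
ratio_final = captured_variance(W, C) / optimal
print(f"captured variance vs. optimum: init {ratio_init:.3f}, final {ratio_final:.3f}")
```

Comparing the captured variance with the optimum from a full eigendecomposition is only feasible here because the feature dimension is small; at the scale the abstract targets, that reference computation is exactly what one is trying to avoid.
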
