Random projections at work today, for PCA and Tensor Regression!
Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment, with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from an operation on a small square matrix whose size depends only on the target dimensionality.
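To give a feel for the computational pattern described in that last sentence, here is a minimal NumPy sketch of a single-pass, projection-based PCA in the same spirit: every operation on the full data matrix is a matrix multiplication, and the only factorizations act on small square matrices whose size is set by the target dimensionality plus oversampling. This is an illustrative sketch, not the authors' exact Lazy SPCA algorithm; the function name, the oversampling parameter, and the Cholesky-based orthonormalization are my own choices.

```python
import numpy as np

def sketch_pca(X, k, oversample=10, seed=0):
    """Approximate top-k PCA of a centered data matrix X (n_samples x n_features).

    All work on X is matrix multiplication; the only factorizations
    (Cholesky, symmetric eigendecomposition) act on small l x l matrices,
    where l = k + oversample. Illustrative sketch, not the paper's algorithm.
    Assumes X is centered and has rank at least l.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    l = k + oversample

    Omega = rng.standard_normal((p, l))       # random projection directions
    Y = X @ Omega                             # (n, l) range sketch: big multiply

    # Orthonormalize Y via its small Gram matrix instead of a tall QR.
    G = Y.T @ Y                               # (l, l)
    L = np.linalg.cholesky(G)                 # G = L @ L.T
    Q = Y @ np.linalg.inv(L).T                # (n, l), satisfies Q.T @ Q = I

    B = Q.T @ X                               # (l, p) projected data: big multiply
    S = B @ B.T                               # (l, l) small matrix
    evals, U_hat = np.linalg.eigh(S)          # eigendecomposition of a small matrix
    idx = np.argsort(evals)[::-1][:k]
    sig = np.sqrt(np.maximum(evals[idx], 1e-12))

    V_k = (B.T @ U_hat[:, idx]) / sig         # (p, k) approx. principal directions
    scores = X @ V_k                          # (n, k) low-dimensional embedding
    return V_k, scores

# Example: V, Z = sketch_pca(X - X.mean(axis=0), k=10)
```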
Near Optimal Sketching of Low-Rank Tensor Regression by Jarvis Haupt, Xingguo Li, David P. Woodruff
We study the least squares regression problem $\min_{\Theta \in S_{\odot D,R}} \|A\Theta - b\|_2$, where $S_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots \circ \theta_D^{(r)}$ for vectors $\theta_d^{(r)} \in \mathbb{R}^{p_d}$ for all $r \in [R]$ and $d \in [D]$, and $\circ$ denotes the outer product of vectors. That is, $\Theta$ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^{D} p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on sparse random projections $\Phi \in \mathbb{R}^{m \times n}$, with $m \ll n$, to reduce the problem to a much smaller problem $\min_{\Theta} \|\Phi A \Theta - \Phi b\|_2$, for which if $\Theta'$ is a near-optimum to the smaller problem, then it is also a near-optimum to the original problem. We obtain significantly smaller dimension and sparsity in $\Phi$ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.
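To make the sketching step concrete, here is a small NumPy illustration of the general recipe on an unconstrained least squares problem: build a sparse random projection $\Phi$, solve the much smaller problem $\min_x \|\Phi A x - \Phi b\|_2$, and check that the result is near-optimal for the original objective. The paper's actual setting constrains $\Theta$ to low-rank CP or Tucker tensors and analyzes the required sketch dimension and sparsity; none of that analysis is reproduced here. The CountSketch-style construction, the problem sizes, and the dense materialization of $\Phi$ are illustrative choices.

```python
import numpy as np

def countsketch(m, n, rng):
    """Sparse sketch Phi in R^{m x n}: one random +-1 entry per column
    (a CountSketch-style matrix, materialized densely here only for clarity)."""
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    Phi = np.zeros((m, n))
    Phi[rows, np.arange(n)] = signs
    return Phi

rng = np.random.default_rng(0)
n, p, m = 10000, 20, 1000                     # tall problem, small sketch dimension
A = rng.standard_normal((n, p))
x_true = rng.standard_normal(p)
b = A @ x_true + 0.01 * rng.standard_normal(n)

Phi = countsketch(m, n, rng)
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)                  # original problem
x_sketch, *_ = np.linalg.lstsq(Phi @ A, Phi @ b, rcond=None)    # sketched problem

# The sketched solution should be a near-optimum for the *original* objective:
ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_full - b)
print(f"residual ratio (sketched vs. optimal): {ratio:.4f}")
```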