Nuit Blanche: Randomized Matrix Decompositions using R

Monday, August 15, 2016

Randomized Matrix Decompositions using R - implementation -

Yes ! The R people are coming on our side of large scale linear algebra and RandNLA (Randomized Numerical Linear Algebra). Try those randomized SVDs once, do your homework and in a few tests, you will never look back: Random projections are the future and they should probably applied to most Advanced Matrix Factorization jungle.

Here the tidbit that is still anoying and that echoes previous Nuit Blanche blog entries:

From the paper:

"An essential computational step of the randomized algorithm is to construct a measurement matrix that is used to sample the range of the input matrix. Therefore we rely on the properties of random vectors, which imply that the randomly generated columns of the measurement matrix are linearly independent with high probability. Indeed, this includes the whole class of sub-Gaussian random variables (Rivasplata 2012), including matrices with Bernoulli or uniform random entries. Speci cally, sub-Gaussian random variables meet the Johnson-Lindenstrauss properties (Johnson and Lindenstrauss 1984), guaranteeing that the spectral information is preserved.

Due to its beautiful theoretical properties, the random Gaussian measurement matrix is the most prominent choice, consisting of independent identically distributed (iid) N(0;1) standard normal entries. In practice, however, uniform random measurements are su cient and are less expensive to generate. The drawback is that dense matrix multiplications are expensive for large matrices, with computational cost of O(mnk). However, BLAS operations tend to be highly scalable and computations could be substantially accelerated with parallel or distributed processing, e.g., a graphics processing unit (GPU) implementation (Erichson and Donovan 2016; Voronin and Martinsson 2015).

Woolfe et al. (2008) proposed another computationally effi cient approach, exploiting the properties of a structured random measurement matrix. For instance, using a subsampled random Fourier transform measurement matrix reduces the computational costs from O(mnk) to O(mn log(k)).

In some situations, an even simpler construction of the sample matrix Y is su fficient. For instance, Y can be constructed by choosing k random columns without replacement from A, which makes the expensive matrix multiplication redundant. This approach is particularly effi cient if the information is uniformly distributed over the data matrix (Cohen, Lee, Musco,Musco, Peng, and Sidford 2015), e.g., in images or videos. However, if the data matrix is highly sparse, this approach is prone to fail.."

Here is the paper: Randomized Matrix Decompositions using R by N. Benjamin Erichson, Sergey Voronin, Steven L. Brunton, J. Nathan Kutz

The singular value decomposition (SVD) is among the most ubiquitous matrix factorizations and is a cornerstone algorithm for data analysis, dimensionality reduction and data compression. However, despite modern computer power, massive data-sets pose a computational challenge for traditional SVD algorithms. We present the R package rsvd, which enables the fast computation of the SVD and related methods, facilitated by randomized algorithms. Specifically, the package provides routines for computing the randomized singular value decomposition, randomized principal component analysis and randomized robust principal component analysis. Randomized algorithms provide an efficient computational framework for reducing the computational demands of traditional (deterministic) matrix factorizations. The key idea is to compute a compressed representation of the data using random sampling. This smaller matrix captures the essential information that can then be used to obtain a low-rank matrix approximation. Several numerical examples support this powerful concept and show substantial accelerated computational times.

The GitHub repository is here.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !