Tuesday, October 25, 2011

Insight on very large SVD computations

First things first: tomorrow, starting at this time of day, Muthu and I will be having a video Hangout on Google+. You are welcome to join; for that, you may want to follow the instructions Muthu gives on his blog.

A conversation with Danny Bickson and Olivier Grisel on the Matrix Factorization group on LinkedIn led to this insightful exchange with Joel Tropp and Piotr Indyk. Here is the email I sent them:

Dear Piotr and Joel,

Recently, Danny Bickson at CMU implemented a full SVD on multicore using the GraphLab framework. During a discussion on the Matrix Factorization group on LinkedIn, we wondered whether the randomized version could also be implemented on GraphLab and show additional speed-ups. Danny's answer, after reading the Matlab code, was the following:

"...Thanks for the link, it is highly relevant and interesting. In fact, I believe it will be pretty straightforward to implement this algorithm in Graphlab, since it is based on SVD as a subroutine. There is only one question I am not sure about and maybe someone from the
group can help. I see in the matlab code that:


%
% Apply A to a random matrix, obtaining H.
%
rand('seed',rand('seed'));

if(isreal(A))
  H = A*(2*rand(n,l)-ones(n,l));
end


Namely, a random Gaussian matrix is generated and its mean (1) is subtracted. I wonder if there is some trick to make this matrix sparse? By having a fully dense matrix we lose some of the power of GraphLab to scale to very large matrices. ..."

The issue at hand seems to be the use of full Gaussian matrices; do you know of any results that would allow us to use sparser matrices (RIP-1, Achlioptas' matrices, ...)? Any additional insight would be welcome. Thank you in advance for your time,

Cheers,

Igor.
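
To make Danny's question concrete: the quoted lines simply apply A to a dense random test matrix. A sparser alternative in the spirit of Achlioptas' construction (entries +sqrt(3), 0, -sqrt(3) with probabilities 1/6, 2/3, 1/6) could be drawn as in the sketch below, with n and l as in the quoted code; whether such a matrix retains the guarantees of the dense case is exactly what the replies address, so take it as an illustration only.

% Hypothetical sketch of a sparse, Achlioptas-style test matrix replacing the
% dense one in the quoted code. Illustration only: the guarantees discussed
% below do not automatically carry over to this choice.
density = 1/3;                                 % expected fraction of nonzeros
[i, j, v] = find(sprand(n, l, density));       % random sparsity pattern, uniform values
Omega = sparse(i, j, sqrt(3)*sign(v - 0.5), n, l);  % nonzeros become +/- sqrt(3)
H = A * Omega;                                 % same role as H in the quoted code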

Joel responded first with:

Igor,

If you have a serious interest in implementing large-scale randomized SVDs, the survey paper "Finding Structure with Randomness" discusses in detail the numerical analysis issues related to this problem. It would be best to refer people to this work; the paper is comprehensive.

There are only a few types of transformations that have been studied in detail, such as random Gaussians and randomized Fourier transforms. Some other transforms work well empirically, but there is very little analysis beyond these two examples. See Edo Liberty's dissertation for additional results.

A serious computational issue is that most of the transformations available lead to dense intermediate matrices. I am not aware of any transformation that is guaranteed to preserve sparsity and to work with general input matrices. When the singular vectors of the input matrix are incoherent, it is possible to construct approximations using randomized row and column sampling. These algorithms tend to work in machine learning applications, but they require coherence assumptions.

A couple other caveats:

Randomized SVD methods only make sense for computing partial SVDs, not full SVDs. Most people want partial SVDs, so this is not necessarily an issue.

For large sparse matrices, Lanczos methods can be superior to randomized algorithms, but there are serious implementation difficulties that cannot be overlooked. Indeed, unless the Lanczos method has been implemented by an expert, it is very doubtful that the algorithm actually produces accurate spectral information.

regards,
Joel
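
For readers who want to see the overall shape of the approach Joel describes, here is a minimal Matlab/Octave sketch of the prototype randomized partial SVD from the survey he mentions; no power iterations are included, and the rank k and oversampling p are purely illustrative values.

% Minimal sketch of a randomized partial SVD of A, following the prototype
% scheme in the "Finding Structure with Randomness" survey. Illustration only:
% a serious implementation would tune oversampling and add power iterations.
k = 10;  p = 5;                       % target rank and oversampling (illustrative)
Omega = randn(size(A,2), k + p);      % dense Gaussian test matrix
Y = A * Omega;                        % sample the range of A
[Q, ~] = qr(Y, 0);                    % orthonormal basis for that sample
B = Q' * A;                           % small (k+p) x n matrix
[Ub, S, V] = svd(B, 'econ');          % SVD of the small matrix
U = Q * Ub;                           % so that A is approximately U*S*V'
U = U(:,1:k);  S = S(1:k,1:k);  V = V(:,1:k);   % keep the leading k terms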

Piotr then added:

Hi Igor,

Good question. To add a few more points to Joel's response: as far as I understand, a sufficient condition is that the random matrix satisfies the Johnson-Lindenstrauss lemma. You can construct such matrices (which also support fast matrix-vector multiplication) by "randomizing" Hadamard matrices; see Joel's survey for more. Of course, such matrices are dense.

There is a recent construction of random *sparse* matrices that satisfy the Johnson-Lindenstrauss lemma. See this paper:

Sparser Johnson-Lindenstrauss Transforms
Daniel M. Kane, Jelani Nelson
Proceedings of the 23rd Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2012)
http://arxiv.org/abs/1012.1577

To get provable guarantees, the matrices cannot be *very* sparse. But you can of course run it beyond what is guaranteed. The matrix construction itself is quite simple.

Piotr

Thanks, Joel and Piotr.
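
As a rough illustration of the kind of sparse embedding Piotr points to, one can build an l x n matrix with a fixed small number s of nonzero entries per column, each equal to +/- 1/sqrt(s), with n and l as in the quoted Matlab snippet above. The admissible values of s and the exact construction with provable guarantees are in the Kane-Nelson paper, so the sketch below is indicative only.

% Rough sketch of a sparse Johnson-Lindenstrauss-style matrix in the spirit of
% Kane and Nelson: s nonzeros per column, each +/- 1/sqrt(s), at random rows.
% The value of s and the precise distribution needed for provable guarantees
% are in their paper; the numbers here are illustrative.
s = 8;                                        % nonzeros per column (illustrative)
rows = zeros(s*n,1);  cols = zeros(s*n,1);  vals = zeros(s*n,1);
for j = 1:n
    pos = (j-1)*s + (1:s);
    p = randperm(l);                          % pick s distinct random rows out of l
    rows(pos) = p(1:s);
    cols(pos) = j;
    vals(pos) = sign(randn(s,1)) / sqrt(s);   % random signs, scaled
end
SJL = sparse(rows, cols, vals, l, n);         % sparse embedding matrix
H = A * SJL';                                 % plays the role of A*Omega above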


