Nuit Blanche: Dictionary Learning for Massive Matrix Factorization

Monday, May 09, 2016

Dictionary Learning for Massive Matrix Factorization - implementation -

Here is an interesting way of speeding up dictionary learning:

To achieve our goal, we propose to use an objective akin to (11), where the masks are now random variables independent from the samples. In other words, we want to combine ideas of online dictionary learning with random subsampling, in a principled manner. This leads us to consider an infinite stream of samples $(M_t x_t)_{t \geq 0}$, where the signals $x_t$ are i.i.d. samples from the data distribution (that is, a column of $X$ selected at random) and $M_t$ "selects" a random subset of observed entries in $X$. This setting can accommodate missing entries, never selected by the mask, and only requires loading a subset of $x_t$ at each iteration.
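To make the sampling scheme in the excerpt concrete, here is a minimal NumPy sketch of the idea: at each step, draw a random column of X, draw a random row mask, and touch only the masked entries. This is not the MODL code nor the paper's algorithm. As simplifying assumptions, it uses a ridge-regularized code and a normalized gradient step on the masked dictionary rows instead of the surrogate-based update of the paper, and all function and parameter names (`masked_online_factorization`, `ridge_code`, `relative_error`, `n_rows`, `step`) are hypothetical.

```python
import numpy as np


def ridge_code(D, x, lam):
    """Closed-form solution of min_a ||x - D a||^2 + lam * ||a||^2."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)


def masked_online_factorization(X, n_components=3, n_rows=10,
                                n_iter=2000, lam=0.1, step=0.5, seed=0):
    """Simplified sketch (assumption, not the paper's update rule)."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = X.shape

    # Random initial dictionary with unit-norm columns.
    D = rng.standard_normal((n_features, n_components))
    D /= np.linalg.norm(D, axis=0, keepdims=True)

    for _ in range(n_iter):
        # x_t: a column of X selected at random.
        j = rng.integers(n_samples)
        # M_t: a random subset of rows; only these entries are loaded.
        rows = rng.choice(n_features, size=n_rows, replace=False)
        x_sub = X[rows, j]
        D_sub = D[rows]

        # Code computed from the masked sample only.
        alpha = ridge_code(D_sub, x_sub, lam)

        # Normalized gradient step on the masked dictionary rows only.
        residual = x_sub - D_sub @ alpha
        D[rows] += (step / (lam + alpha @ alpha)) * np.outer(residual, alpha)

    return D


def relative_error(X, D, lam):
    """Relative reconstruction error of X using ridge codes on all rows."""
    codes = ridge_code(D, X, lam)
    return np.linalg.norm(X - D @ codes) / np.linalg.norm(X)
```

Each iteration reads `n_rows` entries of a single column and updates the matching `n_rows` rows of `D`, so the per-iteration cost does not depend on the full number of rows of X. That is the point of the masking in the excerpt, even though the update rule here is deliberately cruder than the one in the paper.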

Dictionary Learning for Massive Matrix Factorization by Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective to perform data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods, that reduce the complexity in one of the matrix dimensions, but not in both of them. In this paper, we tackle very large matrices in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets, that could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods.

MODL (Masked Online Dictionary Learning) is available on GitHub.
