Monday, May 25, 2015

Compressed Nonnegative Matrix Factorization is Fast and Accurate - implementation -

Compressed Nonnegative Matrix Factorization is Fast and Accurate by Mariano Tepper, Guillermo Sapiro

Nonnegative matrix factorization (NMF) has an established reputation as a useful data analysis technique in numerous applications. However, its use in practical situations has faced growing challenges in recent years, the fundamental reason being the ever-increasing size of the datasets available and needed in the information sciences. To address this, in this work we propose to use structured random compression, that is, random projections that exploit the data structure, for two NMF variants: classical and separable. In separable NMF (SNMF) the left factors are a subset of the columns of the input matrix. We present suitable formulations for each problem, dealing with different representative algorithms within each one. We show that the resulting compressed techniques are faster than their uncompressed variants, vastly reduce memory demands, and incur no significant deterioration in performance. The proposed structured random projections for SNMF allow us to deal with arbitrarily shaped large matrices, beyond the standard limit of tall-and-skinny matrices, granting access to very efficient computations in this general setting. We accompany the algorithmic presentation with theoretical foundations and numerous and diverse examples, showing the suitability of the proposed approaches.
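For intuition, here is a minimal sketch (mine, not the authors' code) of the kind of structured, data-aware compression involved: a randomized range finder in the spirit of Halko, Martinsson and Tropp, where the Gaussian test matrix is multiplied through the data (with a few power iterations) so the resulting orthonormal basis adapts to the matrix. The NMF iterations can then work against the much smaller compressed matrix.

```python
import numpy as np

def structured_basis(A, r, n_power_iter=2, oversample=10, seed=0):
    """Randomized range finder: a Gaussian test matrix is passed
    through A, with a few power iterations, so the orthonormal
    basis Q adapts to the structure of A (data-aware sketching)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Y = A @ rng.standard_normal((n, r + oversample))
    for _ in range(n_power_iter):      # power iterations sharpen the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)             # m x (r + oversample), orthonormal columns
    return Q

# Toy check: for a rank-10 nonnegative matrix, projecting onto the
# sketched basis loses essentially nothing.
rng = np.random.default_rng(1)
A = rng.random((2000, 10)) @ rng.random((10, 300))   # m=2000, n=300, rank 10
Q = structured_basis(A, r=10)
rel_err = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
print(rel_err)   # tiny: the sketch captures the range of A
```

With a sketch of 20 columns, the compressed matrix `Q.T @ A` is 20 × 300 instead of 2000 × 300, so the per-iteration cost and memory of the factorization updates scale with the sketch size rather than with the ambient dimension, which is where the reported speedups come from.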
of note:

It is well studied that Gaussian projection preserves the ℓ2 norm [e.g., 14, and references therein]. However, our extensive experiments show that structured random compression achieves better performance than Gaussian compression. Intuitively, Gaussian compression is a general data-agnostic tool, whereas structured compression uses information from the matrix (something analogous to training). Theoretical research is needed to fully justify this performance gap.

In particular, it seems quite obvious that Gaussian projections do get similar results, but then again, that may be because not enough Gaussian projections were used. Anyway, an implementation is available on Mariano Tepper's code page.
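The performance gap the authors describe is easy to see in a toy setting (again, my own illustration, not an experiment from the paper): for a matrix with a decaying spectrum, a data-agnostic Gaussian subspace of a given dimension preserves far less of the matrix than a data-adapted sketch of the same dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 400, 300, 10

# Synthetic matrix with a decaying spectrum (sigma_k = 1/k).
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * (1.0 / np.arange(1, n + 1))) @ V.T

def proj_err(Q):
    """Relative error after projecting A onto the column space of Q."""
    return np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)

# Data-agnostic: a random r-dimensional Gaussian subspace,
# chosen without ever looking at A.
Q_gauss, _ = np.linalg.qr(rng.standard_normal((m, r)))

# Data-aware: the test matrix is multiplied through A with a couple
# of power iterations, so the subspace adapts to A's spectrum.
Y = A @ rng.standard_normal((n, r))
for _ in range(2):
    Y = A @ (A.T @ Y)
Q_struct, _ = np.linalg.qr(Y)

err_gauss, err_struct = proj_err(Q_gauss), proj_err(Q_struct)
print(err_gauss, err_struct)   # the structured sketch wins by a wide margin
```

This is only about how much of the matrix the sketch preserves, not a full compressed-NMF run, but it matches the intuition quoted above: the structured projection "trains" on the matrix, the Gaussian one does not.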

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.
