Nuit Blanche: Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization

Wednesday, February 27, 2013

Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization - implementation -

Nicolas Gillis just sent me the following:

Cher Igor,

I wanted to let you know of our new LP model for near-separable NMF, see http://arxiv.org/abs/1302.4385.

The near-separable NMF problem can be stated as follows. We are given a nonnegative data matrix M = WH+N where W,H >= 0 is an NMF and N is some noise, and we assume that each column of the matrix W is equal to some column of the matrix M. The goal is to identifying the columns of W among the columns of M (up to the noise level). It is possible to solve this problem in polynomial time given that the noise level is sufficiently small; see Section 5 of Arora, Ge, Kannan, and Moitra, "Computing a nonnegative matrix factorization -- provably", STOC '12, pp.145--162, 2012. Note that the assumption on M is rather strong but is reasonnable for some applications, e.g., hyperspectral unmixing and document classification.

Our new LP model is an extension of Hottopixx by Bittorf, Recht, Ré and Tropp (NIPS 2011), cf. your post http://nuit-blanche.blogspot.com/2012/12/hott-topixx-factoring-nonnegative.html. This new LP model does not require normalization of the input matrix and detects the factorization rank automatically. Moreover, it is more flexible, significantly more tolerant to noise, and can easily be adapted to handle outliers and other noise models. A CPLEX code (along with the code of other near-separable NMF algorithms, including a CPLEX implementation of Hottopixx) is available on https://sites.google.com/site/nicolasgillis/code.

Best regards, Nicolas.

PS. Some updates for the "The Matrix Factorization Jungle":

- The final version of "Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing" is freely available from the journal website: http://jmlr.csail.mit.edu/papers/v13/gillis12a.html

- There is a new version of "Robustness Analysis of HottTopixx, a Linear Programming Model for Factoring Nonnegative Matrices" at http://arxiv.org/abs/1211.6687 (Version 3) --There was an error in Lemma 2 but this only changes some constants here and there.

Thanks Nicolas !, the Advanced Matrix Factorization Jungle Page has been updated.

NMF is so central in much of what we do in engineering and science, it is nice to see some reasons as to why it works so well. Remember it took us twelve year before getting comfortable mathematically speaking. This reminds me of the typical island of knowledge situation where an ecosystem thrives yet has very little connection to the outside body of knowledge. Work like this one are dredging the water between this island and the continental plate The interesting thing here is that these separability and near-separability condition may have a direct impact on subjects like calibration issues and more.

Here is the paper and the slides: Robust Near-Separable Nonnegative Matrix Factorization Using Linear Optimization by Nicolas Gillis, Robert Luce. The abstract reads:

Nonnegative matrix factorization (NMF) has been shown recently to be tractable under the separability assumption, under which all the columns of the input data matrix belong to the convex cone generated by only a few of these columns. Bittorf, Recht, R\'e and Tropp (`Factoring nonnegative matrices with linear programs', NIPS 2012) proposed a linear programming (LP) model, referred to as Hottopixx, which is robust under any small perturbation of the input matrix. However, Hottopixx has two important drawbacks: (i) the input matrix has to be normalized, and (ii) the factorization rank has to be known in advance. In this paper, we generalize Hottopixx in order to resolve these two drawbacks, that is, we propose a new LP model which does not require normalization and detects the factorization rank automatically. Moreover, the new LP model is more flexible, significantly more tolerant to noise, and can easily be adapted to handle outliers and other noise models. Finally, we show on several synthetic datasets that it outperforms Hottopixx while competing favorably with two state-of-the-art methods.

Join the CompressiveSensing subreddit or the Google+ Community and post there !