## Wednesday, November 16, 2011

### Matrix Factorization This Week.

The first part of this entry is dedicated to implementations that will be are now listed in the Matrix Factorization Jungle page. Yesterday, we saw that Tianyi Zhou released an advanced/faster version of the GoDec algorithm (Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case) called Semi-Soft GoDec code. The code is here.
Nonnegative matrix factorization (NMF) with the ItakuraSaito divergence has proven efﬁcient for audio source separation and music transcription, where the signal power spectrogram is factored into a “dictionary” matrix times an “activation” matrix. Given the nature of audio signals it is expected that the activation coefﬁcients exhibit smoothness along time frames. This may be enforced by penalizing the NMF objective function with an extra term reﬂecting smoothness of
the activation coefﬁcients. We propose a novel regularization term that solves some deﬁciencies of our previous work and leads to an efﬁcient implementation using a majorizationminimization procedure.
Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor.Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufﬁcient quality for real-world music editing applications

Instrument extraction from a piece with La Callas.

Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test, and also for image recovery. In the context of compressive sensing, signiﬁcant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, deﬁned by measuring a subset of image pixels, selected uniformly at random. Spatial inter-relationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature.

The BPFA algorithm can be found here. It was added to the Compressive Sensing Big Picture Page in the Dictionary Learning section as well.

In R, I found a library for NMF
This package provides a framework to perform Non-negative Matrix Factorization (NMF). It implements a set of already published algorithms and seeding methods, and provides a framework to test, develop and plug new/custom algorithms. Most of the built-in algorithms have been optimized in C++, and the main interface function provides an easy way of performing parallel computations on multicore machines.

Finally, on the LinkedIn Matrix Factorization groupAntoine Liutkus let us know that he released a BSD implementation of NMF in Matlab: Practical NMF/NTF with beta divergence

Let us now get to the papers that came up on my radar screen this past week:

Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization by Gonzalo MateosGeorgios B. Giannakis. The abstract reads:
Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in various applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. In this context, the fresh look advocated here permeates benefits from variable selection and compressive sampling, to robustify PCA against outliers. A least-trimmed squares estimator of a low-rank bilinear factor analysis model is shown closely related to that obtained from an $\ell_0$-(pseudo)norm-regularized criterion encouraging sparsity in a matrix explicitly modeling the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile to accommodate novel and scalable algorithms to: i) track the low-rank signal subspace robustly, as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes, when used to identify aberrant responses in personality assessment surveys, as well as unveil communities in social networks, and intruders from video surveillance data.

Suppose you are given a matrix X 2 R n1 n2 with rank r min(n1; n2). Moreover, assume this matrix has sparse nonzero elements so that, due to the column-wise dependencies, they are all supported on k n1 number of rows (it can also be column-wise supported). This matrix wont have many degrees of freedom; If one knows the position of those k nonzero rows, the corresponding submatrix contains only (k + n2  r)r degrees of freedom. Provided by the enormous developments in areas of compressed sensing and low rank-matrix recovery [1][2][3][4], one may wonder if it is possible to acquire the whole matrix elements from a very few number of non-adaptive linear measurements. In this regard, three questions immediately follow; what should be those measurements? How to design a computationally tractable algorithm to recover this matrix from those possibly noisy measurements? And ﬁnally, how to evaluate the performance i.e., how many measurements do we need to recover exact low-rank and sparse matrix, and does the algorithm performs stable with respect to matrices that are approximately lowrank or not exactly joint-sparse but compressible? This paper attempts to answer the questions above.

We consider analysis of noisy and incomplete hyperspectral imagery, with the objective of removing the noise and inferring the missing data. The noise statistics may be wavelength-dependent, and the fraction of data missing (at random) may be substantial, including potentially entire bands, offering the potential to signiﬁcantly reduce the quantity of data that need be measured. To achieve this objective, the imagery is divided into contiguous three-dimensional (3D) spatio-spectral blocks, of spatial dimension much less than the image dimension. It is assumed that each such 3D block may be represented as a linear combination of dictionary elements of the same dimension, plus noise, and the dictionary elements are learned in situ based on the observed data (no a priori training). The number of dictionary elements needed for representation of any particular block is typically small relative to the block dimensions, and all the image blocks are processed jointly (“collaboratively”) to infer the underlying dictionary. We address dictionary learning from a Bayesian perspective, considering two distinct means of imposing sparse dictionary usage. These models allow inference of the number of dictionary elements needed as well as the underlying wavelength-dependent noise statistics. It is demonstrated that drawing the dictionary elements from a Gaussian process prior, imposing structure on the wavelength dependence of the dictionary elements, yields signiﬁcant advantages, relative to the more-conventional approach of using an i.i.d. Gaussian prior for the dictionary elements; this advantage is particularly evident in the presence of noise. The framework is demonstrated by processing hyperspectral imagery with a signiﬁcant number of voxels missing uniformly at random, with imagery at speciﬁc wavelengths missing entirely, and in the presence of substantial additive noise

We propose a novel approach to reconstruct Hyperspectral images from very few number of noisy compressive measurements. Our reconstruction approach is based on a convex minimization which penalizes both the nuclear norm and the `2;1 mixed-norm of the data matrix. Thus, the solution tends to have a simultaneously low-rank and joint-sparse structure. We explain how these two assumptions ﬁt Hyperspectral data, and by severals simulations we show that our proposed reconstruction scheme signiﬁcantly enhances the state-ofthe-art tradeoffs between the reconstruction error and the required number of CS measurements.

In the Nonnegative Matrix Factorization (NMF) problem we are given an $n \times m$ nonnegative matrix $M$ and an integer $r > 0$. Our goal is to express $M$ as $A W$ where $A$ and $W$ are nonnegative matrices of size $n \times r$ and $r \times m$ respectively. In some applications, it makes sense to ask instead for the product $AW$ to approximate $M$ -- i.e. (approximately) minimize $\norm{M - AW}_F$ where $\norm{}_F$ denotes the Frobenius norm; we refer to this as Approximate NMF. This problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormously popular in machine learning, where $A$ and $W$ are computed using a variety of local search heuristics. Vavasis proved that this problem is NP-complete. We initiate a study of when this problem is solvable in polynomial time:
1. We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$. Indeed NMF is most interesting in applications precisely when $r$ is small.
2. We complement this with a hardness result, that if exact NMF can be solved in time $(nm)^{o(r)}$, 3-SAT has a sub-exponential time algorithm. This rules out substantial improvements to the above algorithm.
3. We give an algorithm that runs in time polynomial in $n$, $m$ and $r$ under the separablity condition identified by Donoho and Stodden in 2003.
The algorithm may be practical since it is simple and noise tolerant (under benign assumptions). Separability is believed to hold in many practical settings. To the best of our knowledge, this last result is the first example of a polynomial-time algorithm that provably works under a non-trivial condition on the input and we believe that this will be an interesting and important direction for future work.
ANALYSIS OF CLIQUE BY MATRIX FACTORIZATION AND PARTITION METHODS by  Raghunath Kar, Dr. Susant Kumar Das. The abstract reads:
In real life clustering of high dimensional data is a big problem. To find out the dense regions from increasing dimensions is one of them. We have already studied the clustering techniques of low dimensional data sets like k-means, k-mediod, BIRCH, CLARANS, CURE, DBScan, PAM etc. If a region is dense then it consists with number of data points with a minimum support of input parameter ø other wise it cannot take into clustering. So in this approach we have implemented CLIQUE to find out the clusters from multidimensional data sets. In dimension growth subspace clustering the clustering process start at single dimensional subspaces and grows upward to higher dimensional ones. It is a partition method where each dimension divided like a grid structure. In this paper the elimination of redundant objects from the regions by matrix factorization and partition method are implemented. The comparisons between CLIQUES with these two methods are studied. The redundant data point belongs to which  region to form a cluster is also studied.
Structural Similarity and Distance in Learning by Joseph WangVenkatesh SaligramaDavid A. Castañón. The abstract reads:
We propose a novel method of introducing structure into existing machine learning techniques by developing structure-based similarity and distance measures. To learn structural information, low-dimensional structure of the data is captured by solving a non-linear, low-rank representation problem. We show that this low-rank representation can be kernelized, has a closed-form solution, allows for separation of independent manifolds, and is robust to noise. From this representation, similarity between observations based on non-linear structure is computed and can be incorporated into existing feature transformations, dimensionality reduction techniques, and machine learning methods. Experimental results on both synthetic and real data sets show performance improvements for clustering, and anomaly detection through the use of structural similarity.
Beating Randomized Response on Incoherent Matrices by Moritz HardtAaron Roth. The abstract reads:
Computing accurate low rank approximations of large matrices is a fundamental data mining task. In many applications however the matrix contains sensitive information about individuals. In such case we would like to release a low rank approximation that satisfies a strong privacy guarantee such as differential privacy. Unfortunately, to date the best known algorithm for this task that satisfies differential privacy is based on naive input perturbation or randomized response: Each entry of the matrix is perturbed independently by a sufficiently large random noise variable, a low rank approximation is then computed on the resulting matrix.
We give (the first) significant improvements in accuracy over randomized response under the natural and necessary assumption that the matrix has low coherence. Our algorithm is also very efficient and finds a constant rank approximation of an m x n matrix in time O(mn). Note that even generating the noise matrix required for randomized response already requires time O(mn).

Continuous-variable quantum compressed sensing by Matthias OhligerVincent NesmeDavid GrossJens Eisert. The abstract reads:
We introduce a novel method to faithfully reconstruct unknown quantum states that are close to low-rank states by few measurement settings only. The method is general enough to allow for measurements from a continuous family, and is also applicable to continuous-variable states. As a technical result, this work generalizes quantum compressed sensing to the situation where the measured observables are not necessarily pairwise orthogonal, but can be taken from a so-called tight frame - covering hence most meaningful measuring scenarios. As continuous-variable states and measurements are most ubiquitous in the optical setting, we discuss the reconstruction of states of multi-mode light in great detail and show the advantage of the proposed technique over present methods for quantum-optical state reconstruction, as well as its robustness under noise. We finally introduce a method to construct a certificate which guarantees the success of the reconstruction with an efficient algorithm with no assumption on the state.

Spectral unmixing is an important tool in hyperspectral data analysis for estimating endmembers and abundance fractions in a mixed pixel. This paper examines the applicability of a recently developed algorithm called graph regularized nonnegative matrix factorization (GNMF) for this aim. The proposed approach exploits the intrinsic geometrical structure of the data besides considering positivity and full additivity constraints. Simulated data based on the measured spectral signatures, is used for evaluating the proposed algorithm. Results in terms of abundance angle distance (AAD) and spectral angle distance (SAD) show that this method can effectively unmix hyperspectral data.

In this paper, we propose to perform clustering and temporal prediction on network-level traffic states of large-scale traffic networks. Rather than analyzing dynamics of traffic states on individual links, we study overall spatial configurations of traffic states in the whole network and temporal dynamics of global traffic states. With our analysis, we can not only find out typical spatial patterns of global traffic states in daily traffic scenes, but also acquire long-term general predictions of the spatial patterns, which could be used as prior knowledge for modeling temporal behaviors of traffic flows. For this purpose, we use a locality preservation constraints based non-negative matrix factorization (LPNMF) to obtain a low-dimensional representation of network-level traffic states. Clustering and temporal prediction are then performed on the proposed compact representation. Experiments on realistic simulated traffic data are provided to check and illustrate the validity of our proposed approach.

Divide-and-Conquer Matrix Factorization by Lester MackeyAmeet TalwalkarMichael I. Jordan. The abstract reads:
This work introduces Divide-Factor-Combine (DFC), a parallel divide-andconquer framework for noisy matrix factorization. DFC divides a large-scale matrix factorization task into smaller subproblems, solves each subproblem in parallel using an arbitrary base matrix factorization algorithm, and combines the subproblem solutions using techniques from randomized matrix approximation. Our experiments with collaborative ﬁltering, video background modeling, and simulated data demonstrate the near-linear to super-linear speed-ups attainable with this approach. Moreover, our analysis shows that DFC enjoys high-probability recovery guarantees comparable to those of its base algorithm.

This paper introduces a general framework of covariance structures that can be verified in many popular statistical models, such as factor and random effect models. The new structure is a summation of low rank and sparse matrices. We propose a LOw Rank and sparsE Covariance estimator (LOREC) to exploit this general structure in the high-dimensional setting. Analysis of this estimator shows that it recovers exactly the rank and support of the two components respectively. Convergence rates under various norms are also presented. The estimator is computed efficiently using convex optimization. We propose an iterative algorithm, based on Nesterov's method, to solve the optimization criterion. The algorithm is shown to produce a solution within O(1/t^2) of the optimal, after any finite t iterations. Numerical performance is illustrated using simulated data and stock portfolio selection on S&P 100.

Quadratic nonnegative matrix factorization by Zhirong Yang Erkki Oja. The abstract reads:
In Nonnegative Matrix Factorization (NMF), a nonnegative matrix is approximated by a product of lower-rank factorizing matrices. Most NMF methods assume that each factorizing matrix appears only once in the approximation, thus the approximation is linear in the factorizing matrices. We present a new class of approximative NMF methods, called Quadratic Nonnegative Matrix Factorization (QNMF), where some factorizing matrices occur twice in the approximation. We demonstrate QNMF solutions to four potential pattern recognition problems in graph partitioning, two-way clustering, estimating hidden Markov chains, and graph matching. We derive multiplicative algorithms that monotonically decrease the approximation error under a variety of measures. We also present extensions in which one of the factorizing matrices is constrained to be orthogonal or stochastic. Empirical studies show that for certain application scenarios, QNMF is more advantageous than other existing nonnegative matrix factorization methods.

Structural Identifiability in Low-Rank Matrix Factorization by Epameinondas Fritzilas, Martin Milani c, Sven RahmannYasmin A. Rios-Solis. The abstract reads:

In many signal processing and data mining applications, we need to approximate a given matrix Y with a low-rank product Y AX. Both matrices A and X are to be determined, but we assume that from the speci cs of the application we have an important piece of a-priori knowledge:A must have zeros at certain positions. In general, di erent AX factorizations approximate a given Y equally well, so a fundamental question is whether the known zero pattern of A contributes to the uniqueness of the factorization. Using the notion of structural.rank, we present a combinatorial characterization of uniqueness up to diagonal scaling (subject to a mild non-degeneracy condition on the factors), called structural identi ability of the model. Next, we de ne an optimization problem that arises in the need for e fficient experimental design. In this context, Y contains sensor measurements over several time samples, X contains source signals over time samples and A contains the source-sensor mixing coe cients. Our task is to monitor the signal sources with the cheapest subset of sensors, while maintaining structural identi ability. First, we show that this problem is NP-hard. Secondly, we present a mixed integer linear program for its exact solution together with two practical incremental approaches. We also propose a greedy approximation algorithm. Finally, we perform computational experiments on simulated problem instances of various sizes

Finally, at 2012 IEEE WCCI 2012 there will be a Special Session on Nonnegative Matrix factorization paradigm for unsupervised learning. Let me note that in the topics of interest, none of them list low-rank
Topics of interest include but not limited to:    - Convex-NMF       - Hard clustering and NMF       - Kernel-NMF       - NMF for Large-Scale Data       - Maximum margin matrix factorization (MMMF)       - NMF with Sparseness Constraints       - Orthogonal symmetric NMF       - Probabilistic NMF       - Relaxed NMF       - Semi-NMF       - Tri-NMF       - Weighted NMF       - Weighted NMTri-Factorization
It is probably too new compared to the too many declination of the NMF algorithm. Right now, since there is not much theoretical work with NMF, compared to low rank, and we are bound to see this situation continue on flourishing. I am not sure it is a good thing mainly because nobody has a real visibility on what works and what doesn't.

For kicks, in a blog entry Fabian Pedregosa presented a Low rank approximation of Lena