Nuit Blanche: Op-ed: "Can Your package Handle It ?" / Matrix Factorization This Week

Thursday, March 08, 2012

Op-ed: "Can Your package Handle It ?" / Matrix Factorization This Week

We mostly have Matrix Factorization papers this week (they are listed at the end of this op-ed). Some of them are directly relevant to compressive sensing (dictionary learning) but what got my attention this week is this exchange between Kerui Min, now a student at Illinois and Olivier Grisel (one of the developers of Python Based ML outstanding Scikit-Learn package). While the initial question spanned also recommender systems, it looks like it is not common knowledge that, with the help of random projections, we can now look into "large" matrix decompositions. I just had to take an example from GoDec but I could have also mentioned many others including the recent SpaRCS toolbox when it come to Robust PCA, The issue is generally more about how large the rank is than the total size of the matrix. Most large rank problems can always be simply decomposed in smaller problems when it comes to robust denoising videos for instance. The lesson of all this ? if a student wants to code something for your package using Google's support (GSoC), robust denoising ought to be a viable candidate. But let's not stop here: Here is another observation triggered from watching similar projects last year. It is important that students realize that there is not one implementation of NMF, Robust PCA,..... For instance, in the literature, several implementations of NMF yield different results because the underlying algorithm is quite simply not the same yet they all enjoy the same name. If, as a package manager, you include any versions of the algorithm listed in the Advanced Matrix Factorization Jungle page, you ought to be asking specifically what algorithm is exactly implemented and not rely on just "it's Robust-PCA, Sparse-PCA, etc....". As a matter of generic concern, with robust-PCA or any of the advanced matrix decomposition arising in the past two years, we are beginning to see an enhanced capability for dealing with videos. We ain't in Kansas no more and an important question to the open source scientific toolboxes maintainers becomes "Can your package handle this format seamlessly ? ...Can your package really handle it "

Anybody interested can also join the Advanced Matrix Factorization group on LinkedIn. As it currently stands out, the group gathers people in academia and a wide cross section of companies as Twitter, Google, LinkedIn, Intel, Groupon, Yahoo!, Amazon, Microsoft, DreamWorks, AT&T, P&G, Shell, Shopzilla, Nokia, HRL Labs, CGCVeritas, Toyota, Nuxeo, some hedge funds and a few more.

Here are some of the Matrix Factorization papers this week:

Noisy low-rank matrix completion with general sampling distribution by Olga Klopp . The abstract reads:

In the present paper we consider the problem of matrix completion with noise for general sampling schemes. Unlike previous works, in our construction we do not need to know or to evaluate the sampling distribution or the variance of the noise. We propose new nuclear-norm penalized estimators, one of them of the "square-root" type. We prove that, up to a logarithmic factor, our estimators achieve optimal rates with respect to the estimation error.

Dictionary learning under global sparsity constraint by Deyu Meng, Yee Leung, Qian Zhao, Zongben Xu . The abstract reads:

A new method is proposed in this paper to learn overcomplete dictionary from training data samples. Differing from the current methods that enforce similar sparsity constraint on each of the input samples, the proposed method attempts to impose global sparsity constraint on the entire data set. This enables the proposed method to fittingly assign the atoms of the dictionary to represent various samples and optimally adapt to the complicated structures underlying the entire data set. By virtue of the sparse coding and sparse PCA techniques, a simple algorithm is designed for the implementation of the method. The efficiency and the convergence of the proposed algorithm are also theoretically analyzed. Based on the experimental results implemented on a series of signal and image data sets, it is apparent that our method performs better than the current dictionary learning methods in original dictionary recovering, input data reconstructing, and salient data structure revealing.

Principal Component Pursuit with Reduced Linear Measurements by Arvind Ganesh, Kerui Min, John Wright, Yi Ma . The abstract reads:

In this paper, we study the problem of decomposing a superposition of a low-rank matrix and a sparse matrix when a relatively few linear measurements are available. This problem arises in many data processing tasks such as aligning multiple images or rectifying regular texture, where the goal is to recover a low-rank matrix with a large fraction of corrupted entries in the presence of nonlinear domain transformation. We consider a natural convex heuristic to this problem which is a variant to the recently proposed Principal Component Pursuit. We prove that under suitable conditions, this convex program guarantees to recover the correct low-rank and sparse components despite reduced measurements. Our analysis covers both random and deterministic measurement models.

Divide-and-Conquer Method for L1 Norm Matrix Factorization in the Presence of Outliers and Missing Data by Deyu Meng . The abstract reads:

The low-rank matrix factorization as a L1 norm minimization problem has recently attracted much attention due to its intrinsic robustness to the presence of outliers and missing data. In this paper, we propose a new method, called the divide-and-conquer method, for solving this problem. The main idea is to break the original problem into a series of smallest possible sub-problems, each involving only unique scalar parameter. Each of these subproblems is proved to be convex and has closed-form solution. By recursively optimizing these small problems in an analytical way, efficient algorithm, entirely avoiding the time-consuming numerical optimization as an inner loop, for solving the original problem can naturally be constructed. The computational complexity of the proposed algorithm is approximately linear in both data size and dimensionality, making it possible to handle large-scale L1 norm matrix factorization problems. The algorithm is also theoretically proved to be convergent. Based on a series of experiment results, it is substantiated that our method always achieves better results than the current state-of-the-art methods on $L1$ matrix factorization calculation in both computational time and accuracy, especially on large-scale applications such as face recognition and structure from motion.

Online Discriminative Dictionary Learning for Image Classification Based on Block-Coordinate Descent Method by Shu Kong, Donghui Wang . The abstract reads:

Previous researches have demonstrated that the framework of dictionary learning with sparse coding, in which signals are decomposed as linear combinations of a few atoms of a learned dictionary, is well adept to reconstruction issues. This framework has also been used for discrimination tasks such as image classification. To achieve better performances of classification, experts develop several methods to learn a discriminative dictionary in a supervised manner. However, another issue is that when the data become extremely large in scale, these methods will be no longer effective as they are all batch-oriented approaches. For this reason, we propose a novel online algorithm for discriminative dictionary learning, dubbed \textbf{ODDL} in this paper. First, we introduce a linear classifier into the conventional dictionary learning formulation and derive a discriminative dictionary learning problem. Then, we exploit an online algorithm to solve the derived problem. Unlike the most existing approaches which update dictionary and classifier alternately via iteratively solving sub-problems, our approach directly explores them jointly. Meanwhile, it can largely shorten the runtime for training and is also particularly suitable for large-scale classification issues. To evaluate the performance of the proposed ODDL approach in image recognition, we conduct some experiments on three well-known benchmarks, and the experimental results demonstrate ODDL is fairly promising for image classification tasks.

Fast approximations to structured sparse coding and applications to object classification by Arthur Szlam, Karol Gregor, Yann LeCun. The abstract reads:

We describe a method for fast approximation of sparse coding. The input space is subdivided by a binary decision tree, and we simultaneously learn a dictionary and assignment of allowed dictionary elements for each leaf of the tree. We store a lookup table with the assignments and the pseudoinverses for each node, allowing for very fast inference. We give an algorithm for learning the tree, the dictionary and the dictionary element assignment, and In the process of describing this algorithm, we discuss the more general problem of learning the groups in group structured sparse modelling. We show that our method creates good sparse representations by using it in the object recognition framework of \cite{lazebnik06,yang-cvpr-09}. Implementing our own fast version of the SIFT descriptor the whole system runs at 20 frames per second on $321 \times 481$ sized images on a laptop with a quad-core cpu, while sacrificing very little accuracy on the Caltech 101 and 15 scenes benchmarks.