Friday, May 18, 2012

Advanced Matrix Factorization This Week

Nelson Ray, a member of the Advanced Matrix Factorization group and an engineer at MetaMarkets just wrote a new segment on the use of Robust PCA in Boy Bands & Football Fans: Algorithmic Discovery for Twitter

Here is an interesting presentation on work related to time dependent graphs and predictions: Prediction in Dynamic Graph Sequences by Emile Richard

Also from one the same author, it was one of the paper accepted at ICML and here it is: Estimation of Simultaneously Sparse and Low Rank Matrices by Emile RichardPierre-Andre SavalleNicolas Vayatis. The abstract reads:

The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves !1-norm and trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the optimization problem efficiently and evaluate performance on synthetic and real data sets.

I am sure you have all spent your last week-end on watching the Berkeley meeting on streaming algorithms ( All the videos are here: Part 1, Part 2, Part 3, Part 4, Part 5).. In it there was Petros Drineas who talk about Randomized Algorithms in Data Mining: a Linear Algebraic Approach where he shows the comparison between a careful approach to reducing a very large matrix for SVD purposes and the not-so careful but as efficient Johnson-Lindenstrauss/PHD randomized approach. 

One might wonder if the newer JL transforms might even provide faster algorithms, for reference:

In a different direction, we also have this week:

We analyze the parallel performance of randomized interpolative decomposition by decomposing low rank complex-valued Gaussian random matrices larger than 100 GB. We chose a Cray XMT supercomputer as it provides an almost ideal PRAM model permitting quick investigation of parallel algorithms without obfuscation from hardware idiosyncrasies. We obtain that on non-square matrices performance becomes very good, with overall runtime over 70 times faster on 128 processors. We also verify that numerically discovered error bounds still hold on matrices two orders of magnitude larger than those previously tested.

Item recommendation is the task of predicting a personalized ranking on a set of items (e.g. websites, movies, products). In this paper, we investigate the most common scenario with implicit feedback (e.g. clicks, purchases). There are many methods for item recommendation from implicit feedback like matrix factorization (MF) or adaptive knearest-neighbor (kNN). Even though these methods are designed for the item prediction task of personalized ranking, none of them is directly optimized for ranking. In this paper we present a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem. We also provide a generic learning algorithm for optimizing models with respect to BPR-Opt. The learning method is based on stochastic gradient descent with bootstrap sampling. We show how to apply our method to two state-of-the-art recommender models: matrix factorization and adaptive kNN. Our experiments indicate that for the task of personalized ranking our optimization method outperforms the standard learning techniques for MF and kNN. The results show the importance of optimizing models for the right criterion.

Matrix factorizations in higher codimension by Jesse Burke, Mark E. Walker . The abstract reads: 
We observe that there is an equivalence between the singularity category of an affine complete intersection and the homotopy category of matrix factorizations over a related scheme. This relies in part on a theorem of Orlov. Using this equivalence, we give a geometric construction of the ring of cohomology operators, and a generalization of the theory of support varieties, which we call stable support sets. We settle a question of Avramov about which stable support sets can arise for a given complete intersection ring. We also use the equivalence to construct a projective resolution of a module over a complete intersection ring from a matrix factorization, generalizing the well-known result in the hypersurface case.

Low Rank Estimation of Similarities on Graphs by Vladimir Koltchinskii, Pedro Rangel. The abstract reads:
Let (V, E) be a graph with vertex set V and edge set E. Let (X, X', Y) \in V \times V \times {-1, 1} be a random triple, where X, X' are independent uniformly distributed vertices and Y is a label indicating whether X, X' are "similar" (Y = +1), or not (Y = -1). Our goal is to estimate the regression function S\ast (u, v) = E(Y |X = u, X = v), u, v \in V based on training data consisting of n i.i.d. copies of (X, X',Y). We are interested in this problem in the case when S\ast is a symmetric low rank kernel and, in addition to this, it is assumed that S\ast is "smooth" on the graph. We study estimators based on a modified least squares method with complexity penalization involving both the nuclear norm and Sobolev type norms of symmetric kernels on the graph and prove upper bounds on L2 -type errors of such estimators with explicit dependence both on the rank of S\ast and on the degree of its smoothness.

Collaborative ltering is a rapidly advancing research area. Every year several new techniques are proposed and yet it is not clear which of the techniques work best and under what conditions. In this paper we conduct a study comparing several collaborative ltering techniques { both classic and recent state-of-the-art { in a variety of experimental contexts. Speci cally, we report conclusions controlling for number of items, number of users, sparsity level, performance criteria, and computational complexity. Our conclusions identify what algorithms work well and in what conditions, and contribute to both industrial deployment collaborative filtering algorithms and to the research community.

This paper is a little old (2010) but I like it as it combines the experimental aspect of imaging and some advanced matrix decomposition: Nonnegative matrix factorization: a blind spectra separation method for in vivo fluorescent optical imaging by Anne-Sophie Montcuquet, Lionel Herve, Fabrice Navarro, Jean-Marc Dinten, Jerome I. Mars. The abstract reads:

Fluorescence imaging in diffusive media is an emerging imaging modality for medical applications that uses injected fluorescent markers that bind to specific targets, e.g., carcinoma. The region of interest is illuminated with near-IR light and the emitted back fluorescence is analyzed to localize the fluorescence sources. To investigate a thick medium, as the fluorescence signal decreases with the light travel distance, any disturbing signal, such as biological tissues intrinsic fluorescence (called autofluorescence) is a limiting factor. Several specific markers may also be simultaneously injected to bind to different molecules, and one may want to isolate each specific fluorescent signal from the others. To remove the unwanted fluorescence contributions or separate different specific markers, a spectroscopic approach is explored. The nonnegative matrix factorization (NMF) is the blind positive source separation method we chose. We run an original regularized NMF algorithm we developed on experimental data, and successfully obtain separated in vivo fluorescence spectra.

1 comment:

isomorphismes said...

thanks Igor. Great compilation here.