## Friday, October 14, 2011

### Matrix Factorization This Week.

Here is an interesting interaction, how can you give away your robust PCA to an open source project ?

Zeno Ganter reminded me an "older" paper  entitled: Bayesian Probabilistic Matrix Factorization using MCMC by Ruslan Salakhutdinov and Andriy Mnih. The interesting part is the attendant Matlab code,and its implementation by GraphLab. I will list both on the Matrix Factorization Jungle. In the meantime, Zeno also provided me with a list of additional information on matrix factorization:

Hi Igor
.....
2. Matrix Completion for Implicit Feedback data
The "classical" MF methods for collaborative filtering/recommender systems optimize for RMSE, and can be used if there are explicit ratings by the users. In practice this is very often not the case, you only know about the users' past actions (Who bought what?). Then the completion problem becomes a bit more difficult, because it is a positive-only problem: You only have examples from one class and missing values.
There are several papers dealing with such kind of factorization problems, e.g.
(a) optimizing for pointwise errors:
Hu, Koren, Volinsky: Collaborative Filtering for Implicit Feedback Datasets. ICDM 2008
Pan, Zhou, Cao, Liu, Lukose, Scholz, Yang. 2008: One-Class Collaborative Filtering. ICDM 2008
(b) optimizing for ranking: Rendle, Freudenthaler, Gantner, Schmidt-Thieme: BPR: Bayesian
Personalized Ranking from Implicit Feedback
Speaking of ranking, there is also an approach that does matrix completion, but not optimized for RMSE, but NDCG on the known ratings:
Improving maximum margin matrix factorization Machine Learning Journal and European Conference on Machine Learning (ECML/PKDD 2008)
3. MF implementations in Open Source recommender system frameworks
There are implementations of MF for rating prediction and for implicit feedback data (at least there is a patch for it, not sure whether it is already in an official release).
Our own package, MyMediaLite, also contains implementations of MF for rating prediciotn and for implicit feedback data. Additionally, we have a BPR-MF implementation. http://ismll.de/mymedialite
4. Parallel SGD for Matrix Completion
We also have an implementation of block-free SGD for RMSE matrix completion,
which uses the same idea as in Jellyfish and in Gemulla et al.'s KDD paper on distributed MF, which was published a bit earlier than Jellyfish:
R. Gemulla, E. Nijkamp, P. J. Haas, Y. Sismanis,  Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent. Both papers (the KDD paper and the Jellyfish one) come up with the same idea of independent updates, and both papers cover that simple and straighforward idea (nothing be ashamed of!) in a quite complex description. Nonetheless, both papers have plenty of additional ideas and interesting details thrown in, for example (Gemulla) using a bold-driver heuristic (from the neural network literature) for learning rate adaptation, and using subsamples to find good learning rates, or (Jellyfish) using some cores for also doing the example shuffling in parallel, thus avoiding bottlenecks.
......

Thanks Zeno. On arxiv, we had the following preprints:

Matrix factorization from a small number of observed entries has recently garnered much attention as the key ingredient of successful recommendation systems. One unresolved problem in this area is how to adapt current methods to handle changing user preferences over time. Recent proposals to address this issue are heuristic in nature and do not fully exploit the time-dependent structure of the problem. As a principled and general temporal formulation, we propose a dynamical state space model of matrix factorization. Our proposal builds upon probabilistic matrix factorization, a Bayesian model with Gaussian priors. We utilize results in state tracking, such as the Kalman filter, to provide accurate recommendations in the presence of both process and measurement noise. We show how system parameters can be learned via expectation-maximization and provide comparisons to current published techniques.

The power of sparse signal modeling with learned over-complete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical properties of these models, such as under-fitting or over-fitting given sets of data, are still not well characterized in the literature. As a result, the success of sparse modeling depends on hand-tuning critical parameters for each data and application. This work aims at addressing this by providing a practical and objective characterization of sparse models by means of the Minimum Description Length (MDL) principle -- a well established information-theoretic approach to model selection in statistical inference. The resulting framework derives a family of efficient sparse coding and dictionary learning algorithms which, by virtue of the MDL principle, are completely parameter free. Furthermore, such framework allows to incorporate additional prior information to existing models, such as Markovian dependencies, or to define completely new problem formulations, including in the matrix analysis area, in a natural way. These virtues will be demonstrated with parameter-free algorithms for the classic image denoising and classification problems, and for low-rank matrix recovery in video applications.
Videos used for this paper can be found here.

This one is interesting for the folks interested in solving SDPs: Regularized Laplacian Estimation and Fast Eigenvector Approximation by Patrick O. Perry, Michael W. Mahoney. The abstract reads:
Recently, Mahoney and Orecchia demonstrated that popular diffusion-based procedures to compute a quick \emph{approximation} to the first nontrivial eigenvector of a data graph Laplacian \emph{exactly} solve certain regularized Semi-Definite Programs (SDPs). In this paper, we extend that result by providing a statistical interpretation of their approximation procedure. Our interpretation will be analogous to the manner in which $\ell_2$-regularized or $\ell_1$-regularized $\ell_2$-regression (often called Ridge regression and Lasso regression, respectively) can be interpreted in terms of a Gaussian prior or a Laplace prior, respectively, on the coefficient vector of the regression problem. Our framework will imply that the solutions to the Mahoney-Orecchia regularized SDP can be interpreted as regularized estimates of the pseudoinverse of the graph Laplacian. Conversely, it will imply that the solution to this regularized estimation problem can be computed very quickly by running, e.g., the fast diffusion-based PageRank procedure for computing an approximation to the first nontrivial eigenvector of the graph Laplacian. Empirical results are also provided to illustrate the manner in which approximate eigenvector computation \emph{implicitly} performs statistical regularization, relative to running the corresponding exact algorithm.

and finally, Fault Tolerant Matrix Pencil Method for Direction of Arrival Estimation by T. Yerriswamy, S.N. Jagadeesha. The abstract reads:
Continuing to estimate the Direction-of-arrival (DOA) of the signals impinging on the antenna array, even when a few elements of the underlying Uniform Linear Antenna Array (ULA) fail to work will be of practical interest in RADAR, SONAR and Wireless Radio Communication Systems. This paper proposes a new technique to estimate the DOAs when a few elements are malfunctioning. The technique combines Singular Value Thresholding (SVT) based Matrix Completion (MC) procedure with the Direct Data Domain (D^3) based Matrix Pencil (MP) Method. When the element failure is observed, first, the MC is performed to recover the missing data from failed elements, and then the MP method is used to estimate the DOAs. We also, propose a very simple technique to detect the location of elements failed, which is required to perform MC procedure. We provide simulation studies to demonstrate the performance and usefulness of the proposed technique. The results indicate a better performance, of the proposed DOA estimation scheme under different antenna failure scenarios.
Credit: NASA/JPL/University of Arizona

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.