Thursday, February 16, 2012

A small op-ed and Advanced Matrix Factorization This Week

I have decided to change the name of the LinkedIn Group on Matrix Factorization to Advanced Matrix Factorization. It is not just a question of branding: we need to send a signal that the matrix factorization techniques mentioned in that group and here are not your grandfather's matrix factorizations. In other words, we don't care about SVDs, we care about very large scale SVDs that can deal with all kinds of "unexpected" issues. We don't care about a slight improvement of NMF, we care about knowing why NMF works and how it can be extended once and for all. And when we denoise the endoscopic videos made at Fukushima, it's not because we have better schemes to denoise, it's because we want to use the noise to evaluate radiation levels. We want to make sense of it all.

I noted some interesting entries in Danny Bickson's blog:

Since Danny's blog is very relevant to some of the issues of Large Scale Matrix Factorization, I have decided to add his feed directly into the discussions on the LinkedIn Advanced Matrix Factorization group, which now boasts more than 180 members.

Also found on the interweb, the following papers, enjoy:

High-dimensional tensors or multi-way data are becoming prevalent in areas such as biomedical imaging, chemometrics, networking and bibliometrics. Traditional approaches to finding lower dimensional representations of tensor data include flattening the data and applying matrix factorizations such as principal components analysis (PCA) or employing tensor decompositions such as the CANDECOMP / PARAFAC (CP) and Tucker decompositions. The former can lose important structure in the data, while the latter Higher-Order PCA (HOPCA) methods can be problematic in high-dimensions with many irrelevant features. We introduce frameworks for sparse tensor factorizations or Sparse HOPCA based on heuristic algorithmic approaches and by solving penalized optimization problems related to the CP decomposition. Extensions of these approaches lead to methods for general regularized tensor factorizations, multi-way Functional HOPCA and generalizations of HOPCA for structured data. We illustrate the utility of our methods for dimension reduction, feature selection, and signal recovery on simulated data and multi-dimensional microarrays and functional MRIs.

HOPCA and Sparse HOPCA will be downloadable from at some point in time. When they are, I'll feature them in the Matrix Factorization Jungle page.
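While waiting for the authors' code, here is a minimal sketch of the general idea of a sparse rank-one CP factor: alternate tensor-vector contractions and soft-threshold one factor. This is not the authors' implementation. The function name `sparse_rank_one_cp`, the threshold `lam`, the SVD-based initialization and the synthetic tensor are all assumptions made for illustration.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_rank_one_cp(X, lam, n_iter=30):
    """Rank-one CP factors of a 3-way tensor, with a sparse first factor (illustrative sketch)."""
    # Initialise v and w from the leading left singular vectors of the mode-2 and mode-3 unfoldings.
    v = np.linalg.svd(np.moveaxis(X, 1, 0).reshape(X.shape[1], -1), full_matrices=False)[0][:, 0]
    w = np.linalg.svd(np.moveaxis(X, 2, 0).reshape(X.shape[2], -1), full_matrices=False)[0][:, 0]
    u = np.zeros(X.shape[0])
    for _ in range(n_iter):
        u = soft(np.einsum("ijk,j,k->i", X, v, w), lam)  # sparse update of the first factor
        if not u.any():
            break  # lam was too large: everything got thresholded away
        u /= np.linalg.norm(u)
        v = np.einsum("ijk,i,k->j", X, u, w)
        v /= np.linalg.norm(v)
        w = np.einsum("ijk,i,j->k", X, u, v)
        w /= np.linalg.norm(w)
    return u, v, w

# Synthetic test: planted rank-one tensor with a 5-sparse first factor, plus Gaussian noise.
rng = np.random.default_rng(0)
u0 = np.zeros(30)
u0[:5] = 1 / np.sqrt(5)
v0 = rng.standard_normal(30)
v0 /= np.linalg.norm(v0)
w0 = rng.standard_normal(30)
w0 /= np.linalg.norm(w0)
X = 100.0 * np.einsum("i,j,k->ijk", u0, v0, w0) + rng.standard_normal((30, 30, 30))
u, v, w = sparse_rank_one_cp(X, lam=5.0)
print("correlation with planted factor:", abs(u @ u0))
print("nonzeros in u:", np.count_nonzero(u))
```

The factors are only defined up to sign, hence the absolute value when comparing with the planted factor.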

We consider the problem of estimating a rank-one matrix in Gaussian noise under a probabilistic model for the left and right factors of the matrix. The probabilistic model can impose constraints on the factors including sparsity and positivity that arise commonly in learning problems. We propose a simple iterative procedure that reduces the problem to a sequence of scalar estimation computations. The method is similar to approximate message passing techniques based on Gaussian approximations of loopy belief propagation that have been used recently in compressed sensing. Leveraging analysis methods by Bayati and Montanari, we show that the asymptotic behavior of the estimates from the proposed iterative procedure is described by a simple scalar equivalent model, where the distribution of the estimates is identical to certain scalar estimates of the variables in Gaussian noise. Moreover, the effective Gaussian noise level is described by a set of state evolution equations. The proposed method thus provides a computationally simple and general method for rank-one estimation problems with a precise analysis in certain high-dimensional settings.
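To give a feel for what "a sequence of scalar estimation computations" can look like, here is a simplified sketch: a power iteration where one factor goes through a scalar denoiser that encodes sparsity and positivity. It is not the procedure from the paper (in particular, it carries no message-passing style correction terms and comes with no state evolution guarantee). The name `rank_one_estimate`, the threshold `thresh` and the synthetic data are assumptions.

```python
import numpy as np

def rank_one_estimate(A, thresh, n_iter=30):
    """Estimate u, v with A ~ s * u v^T, where v is assumed sparse and nonnegative (sketch)."""
    # Start from the leading right singular vector, oriented so its dominant entry is positive.
    v = np.linalg.svd(A, full_matrices=False)[2][0]
    if v[np.argmax(np.abs(v))] < 0:
        v = -v
    u = np.zeros(A.shape[0])
    for _ in range(n_iter):
        u = A @ v
        u /= np.linalg.norm(u)                      # no constraint on u: just normalise
        v_new = np.maximum(A.T @ u - thresh, 0.0)   # entrywise denoiser: positive soft threshold
        if not v_new.any():
            break  # thresh was too large
        v = v_new / np.linalg.norm(v_new)
    return u, v

# Synthetic test: rank-one signal with a sparse nonnegative right factor in unit Gaussian noise.
rng = np.random.default_rng(0)
m = n = 200
u0 = rng.standard_normal(m)
u0 /= np.linalg.norm(u0)
v0 = np.zeros(n)
v0[:10] = 1 / np.sqrt(10)
A = 60.0 * np.outer(u0, v0) + rng.standard_normal((m, n))
u, v = rank_one_estimate(A, thresh=3.0)
print("correlation with planted v:", v @ v0)
```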

In this paper, we study the performance of channel estimation using the LS, MMSE, LMMSE and Lr-LMMSE algorithms in an OFDM (Orthogonal Frequency Division Multiplexing) system with block pilot insertion. OFDM is known to suffer from the time variation of the channel under high mobility conditions: the loss of subchannel orthogonality leads to inter-carrier interference (ICI). We show that, for 16-QAM modulation, the LMMSE algorithm performs well, but when the SNR (signal-to-noise ratio) is high, the four algorithms (LS, MMSE, LMMSE and Lr-LMMSE) perform similarly; this is not always the case for other modulation schemes. We also improve the mean squared error of these algorithms. We illustrate that the LMMSE algorithm performs well with block pilot insertion, as does its low-rank version, which behaves very well even when the FFT size is very large.
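For readers who have not met these estimators, here is a small Monte Carlo sketch comparing textbook LS and LMMSE channel estimates on one all-pilot OFDM block. It does not reproduce the paper's setup: the static 4-tap channel, the unit-modulus pilots, the 5 dB SNR and all variable names are assumptions, and ICI from mobility is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, snr_db, trials = 64, 4, 5.0, 500       # subcarriers, channel taps, SNR, Monte Carlo runs

F = np.fft.fft(np.eye(N))[:, :L]             # maps L time-domain taps to N subcarrier gains
R_hh = F @ F.conj().T / L                    # channel correlation for i.i.d. taps of power 1/L
noise_var = 10 ** (-snr_db / 10)             # pilots have unit power
W = R_hh @ np.linalg.inv(R_hh + noise_var * np.eye(N))  # LMMSE smoothing matrix

mse_ls = 0.0
mse_lmmse = 0.0
for _ in range(trials):
    h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)
    H = F @ h                                             # true frequency response
    X = np.exp(1j * np.pi / 2 * rng.integers(0, 4, N))    # unit-modulus pilot symbols
    noise = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    Y = H * X + noise                                     # received pilot block
    H_ls = Y / X                                          # least squares: per-subcarrier division
    H_lmmse = W @ H_ls                                    # LMMSE: smooth LS with channel statistics
    mse_ls += np.mean(np.abs(H_ls - H) ** 2) / trials
    mse_lmmse += np.mean(np.abs(H_lmmse - H) ** 2) / trials

print("LS MSE:", mse_ls, "LMMSE MSE:", mse_lmmse)
```

A low-rank variant would replace `W` by a truncated version built from the leading eigenvectors of `R_hh`; that step is left out here.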

We investigate the problem of signal transduction via a descriptive analysis of the spatial organization of the complement of proteins exerting a certain function within a cellular compartment. We propose a scheme to assign a numerical value to individual proteins in a protein interaction network by means of a simple optimization algorithm. We test our procedure against datasets focusing on the proteomes in the neurite and soma compartments.

The ADI iteration is closely related to the rational Krylov projection methods for constructing low rank approximations to the solution of the Sylvester equation. In this paper we show that the ADI and rational Krylov approximations are in fact equivalent when a special choice of shifts is employed in both methods. We will call these shifts pseudo H2-optimal shifts. These shifts are also optimal in the sense that for the Lyapunov equation, they yield a residual which is orthogonal to the rational Krylov projection subspace. Via several examples, we show that the pseudo H2-optimal shifts consistently yield nearly optimal low rank approximations to the solutions of the Lyapunov equations.
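As a reminder of what a low-rank ADI iteration looks like, here is a minimal sketch for the Lyapunov equation A X + X A^T + B B^T = 0 with a stable A and real negative shifts, returning a factor Z with X ≈ Z Z^T. The shifts below are a plain log-spaced heuristic over the spectrum, not the pseudo H2-optimal shifts of the paper; the name `lr_adi` and the test matrices are assumptions.

```python
import numpy as np

def lr_adi(A, B, shifts):
    """Low-rank ADI for A X + X A^T + B B^T = 0 with real negative shifts; returns Z, X ~ Z Z^T."""
    n = A.shape[0]
    I = np.eye(n)
    p_prev = shifts[0]
    V = np.sqrt(-2.0 * p_prev) * np.linalg.solve(A + p_prev * I, B)
    blocks = [V]
    for p in shifts[1:]:
        # Each new block of columns is obtained from the previous one with a single shifted solve.
        V = np.sqrt(p / p_prev) * (V - (p + p_prev) * np.linalg.solve(A + p * I, V))
        blocks.append(V)
        p_prev = p
    return np.hstack(blocks)

# Synthetic test: symmetric stable A, two-column B.
rng = np.random.default_rng(0)
n = 50
M = rng.standard_normal((n, n))
A = -(M @ M.T / n + np.eye(n))
B = rng.standard_normal((n, 2))

lam = np.linalg.eigvalsh(A)                                  # all eigenvalues are negative
shifts = -np.geomspace(-lam.max(), -lam.min(), 10)           # heuristic log-spaced shifts
Z = lr_adi(A, B, shifts)
X = Z @ Z.T
res = np.linalg.norm(A @ X + X @ A.T + B @ B.T) / np.linalg.norm(B @ B.T)
print("rank of factor:", Z.shape[1], "relative residual:", res)
```

Computing the full spectrum to pick shifts defeats the purpose at large scale; it is only done here to keep the example short.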

This paper studies the models of minimizing $||x||_1+1/(2\alpha)||x||_2^2$ where $x$ is a vector, as well as those of minimizing $||X||_*+1/(2\alpha)||X||_F^2$ where $X$ is a matrix and $||X||_*$ and $||X||_F$ are the nuclear and Frobenius norms of $X$, respectively. We show that they can efficiently recover sparse vectors and low-rank matrices. In particular, they enjoy exact and stable recovery guarantees similar to those known for minimizing $||x||_1$ and $||X||_*$ under the conditions on the sensing operator such as its null-space property, restricted isometry property, spherical section property, or RIPless property. To recover a (nearly) sparse vector $x^0$, minimizing $||x||_1+1/(2\alpha)||x||_2^2$ returns (nearly) the same solution as minimizing $||x||_1$ almost whenever $\alpha\ge 10||x^0||_\infty$. The same relation also holds between minimizing $||X||_*+1/(2\alpha)||X||_F^2$ and minimizing $||X||_*$ for recovering a (nearly) low-rank matrix $X^0$, if $\alpha\ge 10||X^0||_2$. Furthermore, we show that the linearized Bregman algorithm for minimizing $||x||_1+1/(2\alpha)||x||_2^2$ subject to $Ax=b$ enjoys global linear convergence as long as a nonzero solution exists, and we give an explicit rate of convergence. The convergence property does not require a sparse solution or any properties on $A$. To our knowledge, this is the best known global convergence result for first-order sparse optimization algorithms.
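The linearized Bregman iteration for this augmented model is short enough to sketch: it is gradient ascent on the dual, alternating a residual step on a dual variable y with a scaled shrinkage for x. The following is a minimal illustration, not the authors' code. The step size h = 1/(alpha ||A||_2^2), the iteration count and the synthetic problem are my assumptions; alpha follows the 10 ||x^0||_inf rule of thumb from the abstract.

```python
import numpy as np

def shrink(v, t=1.0):
    """Soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def linearized_bregman(A, b, alpha, n_iter=5000):
    """Sketch for: minimize ||x||_1 + 1/(2*alpha) * ||x||_2^2  subject to  A x = b."""
    h = 1.0 / (alpha * np.linalg.norm(A, 2) ** 2)   # conservative step size (assumption)
    y = np.zeros(A.shape[0])
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        y = y + h * (b - A @ x)        # dual update driven by the current residual
        x = alpha * shrink(A.T @ y)    # primal update: scaled shrinkage with threshold 1
    return x

# Synthetic test: 5-sparse vector, 40 Gaussian measurements in dimension 100.
rng = np.random.default_rng(0)
m, n, k = 40, 100, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
b = A @ x0

x_hat = linearized_bregman(A, b, alpha=10 * np.max(np.abs(x0)))
print("relative residual:", np.linalg.norm(A @ x_hat - b) / np.linalg.norm(b))
print("recovery error:", np.linalg.norm(x_hat - x0))
```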

In this paper, we study the problem of high-dimensional approximately low-rank covariance matrix estimation with missing observations. We propose a simple procedure computationally tractable in high-dimension and that does not require imputation of the missing data. We establish non-asymptotic sparsity oracle inequalities for the estimation of the covariance matrix with the Frobenius and spectral norms, valid for any setting of the sample size and the dimension of the observations. We further establish minimax lower bounds showing that our rates are minimax optimal up to a logarithmic factor.
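One way to avoid imputation, sketched below under my own assumptions, is to zero-fill the missing entries, compute the empirical covariance, and undo the bias this introduces: with zero-mean data and entries observed independently with probability delta, off-diagonal terms are shrunk by delta^2 and diagonal terms by delta. The eigenvalue soft-thresholding step that follows is one simple way to promote low rank, with a hypothetical level `lam`. Function names and the synthetic data are illustrative, and this should not be read as the exact estimator of the paper.

```python
import numpy as np

def covariance_from_masked(Y, delta):
    """Bias-corrected covariance from zero-filled data Y (n x p), observation probability delta."""
    n = Y.shape[0]
    S = Y.T @ Y / n   # empirical covariance of the zero-filled data (zero-mean assumed)
    # Off-diagonals of E[S] are scaled by delta^2 and the diagonal by delta; undo both.
    return (1 / delta - 1 / delta**2) * np.diag(np.diag(S)) + S / delta**2

def soft_threshold_eigs(S, lam):
    """Shrink the eigenvalues of a symmetric matrix by lam, clipping at zero."""
    vals, vecs = np.linalg.eigh(S)
    vals = np.maximum(vals - lam, 0.0)
    return (vecs * vals) @ vecs.T

# Synthetic test: rank-2 covariance in dimension 20, half of the entries missing.
rng = np.random.default_rng(0)
n, p, delta = 2000, 20, 0.5
Lmat = rng.standard_normal((p, 2))
Sigma = Lmat @ Lmat.T
X = rng.standard_normal((n, 2)) @ Lmat.T
mask = rng.random((n, p)) < delta
Y = X * mask                                   # missing entries are simply set to zero

Sigma_naive = Y.T @ Y / n
Sigma_tilde = covariance_from_masked(Y, delta)
Sigma_lr = soft_threshold_eigs(Sigma_tilde, lam=1.0)

err_naive = np.linalg.norm(Sigma_naive - Sigma)
err_tilde = np.linalg.norm(Sigma_tilde - Sigma)
print("Frobenius error, naive:", err_naive, "corrected:", err_tilde)
```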

This paper considers the problem of completing a matrix with many missing entries under the assumption that the columns of the matrix belong to a union of multiple low-rank subspaces. This generalizes the standard low-rank matrix completion problem to situations in which the matrix rank can be quite high or even full rank. Since the columns belong to a union of subspaces, this problem may also be viewed as a missing-data version of the subspace clustering problem. Let X be an n x N matrix whose (complete) columns lie in a union of at most k subspaces, each of rank <= r < n, and assume N >> kn. The main result of the paper shows that under mild assumptions each column of X can be perfectly recovered with high probability from an incomplete version so long as at least CrNlog^2(n) entries of X are observed uniformly at random, with C>1 a constant depending on the usual incoherence conditions, the geometrical arrangement of subspaces, and the distribution of columns over the subspaces. The result is illustrated with numerical experiments and an application to Internet distance matrix completion and topology identification.
The knowledge of end-to-end network distances is essential to many Internet applications. As active probing of all pairwise distances is infeasible in large-scale networks, a natural idea is to measure a few pairs and to predict the other ones without actually measuring them. This paper formulates the distance prediction problem as matrix completion where unknown entries of an incomplete matrix of pairwise distances are to be predicted. The problem is solvable because strong correlations among network distances exist and cause the constructed distance matrix to be low rank. The new formulation circumvents the well-known drawbacks of existing approaches based on Euclidean embedding. 
A new algorithm, so-called Decentralized Matrix Factorization by Stochastic Gradient Descent (DMFSGD), is proposed to solve the network distance prediction problem. By letting network nodes exchange messages with each other, the algorithm is fully decentralized and only requires each node to collect and to process local measurements, with neither explicit matrix constructions nor special nodes such as landmarks and central servers. In addition, we compared comprehensively matrix factorization and Euclidean embedding to demonstrate the suitability of the former on network distance prediction. We further studied the incorporation of a robust loss function and of non-negativity constraints. Extensive experiments on various publicly-available datasets of network delays show not only the scalability and the accuracy of our approach but also its usability in real Internet applications.
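To make the matrix completion view concrete, here is a centralized toy simulation of matrix factorization by stochastic gradient descent on a partially observed matrix. It is not DMFSGD itself: the message exchange between nodes, the robust loss and the non-negativity constraints are not modeled, and the data is a synthetic low-rank matrix rather than network delays. The function name, learning rate `lr`, regularization `reg` and rank are assumptions.

```python
import numpy as np

def sgd_factorize(D, mask, rank=3, lr=0.05, reg=0.01, epochs=50, seed=0):
    """Fit D[i, j] ~ X[i] . Y[j] on observed entries by SGD (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    X = rng.random((n, rank))   # one coordinate vector per node acting as a source
    Y = rng.random((n, rank))   # one coordinate vector per node acting as a target
    pairs = np.argwhere(mask)   # (i, j) index pairs of the measured entries
    for _ in range(epochs):
        rng.shuffle(pairs)      # visit the measurements in random order
        for i, j in pairs:
            err = D[i, j] - X[i] @ Y[j]
            xi = X[i].copy()    # keep the old X[i] for the Y[j] update
            X[i] += lr * (err * Y[j] - reg * X[i])
            Y[j] += lr * (err * xi - reg * Y[j])
    return X, Y

# Synthetic test: rank-3 "distance" matrix between 60 nodes, 30% of the pairs measured.
rng = np.random.default_rng(1)
n = 60
D = rng.random((n, 3)) @ rng.random((n, 3)).T
mask = rng.random((n, n)) < 0.3

X, Y = sgd_factorize(D, mask)
pred = X @ Y.T
rmse_mf = np.sqrt(np.mean((pred[~mask] - D[~mask]) ** 2))
rmse_mean = np.sqrt(np.mean((D[mask].mean() - D[~mask]) ** 2))
print("held-out RMSE, factorization:", rmse_mf, "mean predictor:", rmse_mean)
```

Each update only touches the two coordinate vectors involved in one measurement, which is the property that makes a decentralized version plausible.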
