Friday, February 01, 2013

This (Past) Month in Compressive Sensing and Matrix Factorization

We just heard the news that InView Expands Compressive Sensing Workstations Series with Two New Products and here are some preprints and papers on compressive sensing and Matrix factorization that showed up, mostly on the Arxiv/Google radar screen. I may come back to some of them later. They are in no particular order:

First some slides on collaborative filtering at Spotify.

Two-stage through-the-wall radar image formation using compressive sensing by Van Ha Tang, Abdesselam Bouzerdoum, Son Lam Phung

Abstract. We introduce a robust image-formation approach for through-the-wall radar imaging (TWRI). The proposed approach consists of two stages involving compressive sensing (CS) followed by delay-and-sum (DS) beamforming. In the first stage, CS is used to reconstruct a complete set of measurements from a small subset collected with a reduced number of transceivers and frequencies. DS beamforming is then applied to form the image using the reconstructed measurements. To promote sparsity of the CS solution, an overcomplete Gabor dictionary is employed to sparsely represent the imaged scene. The new approach requires far fewer measurement samples than the conventional DS beamforming and CSbased TWRI methods to reconstruct a high-quality image of the scene. Experimental results based on simulated and real data demonstrate the effectiveness and robustness of the proposed two-stage image formation technique, especially when the measurement set is drastically reduced. 

The Capacity of Adaptive Group Testing

We define capacity for group testing problems and deduce bounds for the capacity of a variety of noisy models, based on the capacity of equivalent noisy communication channels. For noiseless adaptive group testing we prove an information-theoretic lower bound which tightens a bound of Chan et al. This can be combined with a performance analysis of a version of Hwang's adaptive group testing algorithm, in order to deduce the capacity of noiseless and erasure group testing models.

Statistical mechanics of complex neural systems and high dimensional data

Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? And second, how can we extract meaningful models of neuronal systems from high dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction, and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models, and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.

A significance test for the lasso

In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the {\it covariance test statistic}, and show that when the true model is linear, this statistic has an $\Exp(1)$ asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result assumes some (reasonable) regularity conditions on the predictor matrix $X$, and covers the important high-dimensional case $p>n$. Of course, for testing the significance of an additional variable between two nested linear models, one may use the usual chi-squared test, comparing the drop in residual sum of squares (RSS) to a $\chi^2_1$ distribution. But when this additional variable is not fixed, but has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than $\chi^2_1$ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter $\lambda$ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the $\ell_1$ penalty. Therefore the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties---adaptivity and shrinkage---and its null distribution is tractable and asymptotically $\Exp(1)$.

Spatial degrees of freedom of MIMO systems in Line-of-Sight Environment

While the efficiency of MIMO transmissions in a rich scattering environment has been demonstrated, less is known about the situation where the fading matrix coefficients come from a line-of-sight model. In this paper, we study in detail how this line-of-sight assumption affects the performance of distributed MIMO transmissions between far away clusters of nodes in a wireless network. Our analysis pertains to the study of a new class of random matrices.

The varying w spread spectrum effect for radio interferometric imaging

We study the impact of the spread spectrum effect in radio interferometry on the quality of image reconstruction. This spread spectrum effect will be induced by the wide field-of-view of forthcoming radio interferometric telescopes. The resulting chirp modulation improves the quality of reconstructed interferometric images by increasing the incoherence of the measurement and sparsity dictionaries. We extend previous studies of this effect to consider the more realistic setting where the chirp modulation varies for each visibility measurement made by the telescope. In these first preliminary results, we show that for this setting the quality of reconstruction improves significantly over the case without chirp modulation and achieves almost the reconstruction quality of the case of maximal, constant chirp modulation.

Spike detection from inaccurate samplings

Jean-Marc Azais (IMT), Yohann De Castro (LM-Orsay), Fabrice Gamboa (IMT, - Méthodes d'Analyse Stochastique des Codes et Traitements Numériques)
This article investigates the super-resolution phenomenon using the celebrated statistical estimator LASSO in the complex valued measure framework. More precisely, we study the recovery of a discrete measure (spike train) from few noisy observations (Fourier samples, moments, Stieltjes transformation...). In particular, we provide an explicit quantitative localization of the spikes. Moreover, our analysis is based on the Rice method and provide an upper bound on the supremum of white noise perturbation in the measure space.

A Potential Theory of General Spatially-Coupled Systems via a Continuum Approximation

This paper analyzes general spatially-coupled (SC) vector systems with multi-dimensional coupling. A continuum approximation is used to derive potential functions that characterize the performance of the SC systems. For one-dimensional and two-dimensional coupling, it is shown that, if the boundary of the SC systems is fixed to the unique stable solution that minimizes the potential over all stationary solutions, the systems can approach the optimal performance as the number of coupled systems tends to infinity. Furthermore, a physical intuition is presented for this result on the basis of the conservation law of mechanical energy in Newton mechanics.

Spatial Coupling as a Proof Technique

The aim of this paper is to show that spatial coupling can be viewed not only as a means to build better graphical models, but also as a tool to better understand uncoupled models. The starting point is the observation that some asymptotic properties of graphical models are easier to prove in the case of spatial coupling. In such cases, one can then use the so-called interpolation method to transfer results known for the spatially coupled case to the uncoupled one.
In a first paper last year, we have successfully implemented this strategy for the relatively simple case of LDPC ensembles where the variable node degree distribution is Poisson. In the current paper we now show how to treat the practically much more relevant case where the node degrees of the graphs underlying the ensembles have an arbitrary distribution. In particular, regular ensembles fall within this framework. As we will see, a number of technical difficulties appear when compared to the simpler case of Poisson-distributed degrees. As is common in most of the analysis of iterative coding, a special type of symmetry in the systems needs to be present for our arguments to work. For linear codes, this symmetry follows from the channel symmetry; in general the required symmetry is called Nishimori symmetry.
The main application of this framework is to LDPC codes, where we use interpolation to show that the average entropy of the codeword conditioned on the observation is asymptotically the same for spatially coupled as for uncoupled ensembles. We use this fact to prove the so-called Maxwell conjecture for ensembles with arbitrary degree distributions.

Incorporating Betweenness Centrality in Compressive Sensing for Congestion Detection

This paper presents a new Compressive Sensing (CS) scheme for detecting network congested links. We focus on decreasing the required number of measurements to detect all congested links in the context of network tomography. We have expanded the LASSO objective function by adding a new term corresponding to the prior knowledge based on the relationship between the congested links and the corresponding link Betweenness Centrality (BC). The accuracy of the proposed model is verified by simulations on two real datasets. The results demonstrate that our model outperformed the state-of-the-art CS based method with significant improvements in terms of F-Score.

Multiscale Decompositions and Optimization

In this paper, the following type Tikhonov regularization problem will be systematically studied: [(u_t,v_t):=\argmin_{u+v=f} {|v|_X+t|u|_Y},] where $Y$ is a smooth space such as a $\BV$ space or a Sobolev space and $X$ is the pace in which we measure distortion. Examples of the above problem occur in denoising in image processing, in numerically treating inverse problems, and in the sparse recovery problem of compressed sensing. It is also at the heart of interpolation of linear operators by the real method of interpolation. We shall characterize of the minimizing pair $(u_t,v_t)$ for $(X,Y)=(L_2(\Omega),\BV(\Omega))$ as a primary example and generalize Yves Meyer's result in [11] and Antonin Chambolle's result in [6]. After that, the following multiscale decomposition scheme will be studied: [u_{k+1}:=\argmin_{u\in \BV(\Omega)\cap L_2(\Omega)} {1/2|f-u|^2_{L_2}+t_{k}|u-u_k|_{\BV}},] where $u_0=0$ and $\Omega$ is a bounded Lipschitz domain in $\R^d$. This method was introduced by Eitan Tadmor et al. and we will improve the $L_2$ convergence result in \cite{Tadmor}. Other pairs such as $(X,Y)=(L_p,W^{1}(L_\tau))$ and $(X,Y)=(\ell_2,\ell_p)$ will also be mentioned. In the end, the numerical implementation for $(X,Y)=(L_2(\Omega),\BV(\Omega))$ and the corresponding convergence results will be given.

Reconstruction Guarantee Analysis of Binary Measurement Matrices Based on Girth

Binary 0-1 measurement matrices, especially those from coding theory, were introduced to compressed sensing (CS) recently. Good measurement matrices with preferred properties, e.g., the restricted isometry property (RIP) and nullspace property (NSP), have no known general ways to be efficiently checked. Khajehnejad \emph{et al.} made use of \emph{girth} to certify the good performances of sparse binary measurement matrices. In this paper, we examine the performance of binary measurement matrices with uniform column weight and arbitrary girth under basis pursuit. Explicit sufficient conditions of exact reconstruction %only including $\gamma$ and $g$ are obtained, which improve the previous results derived from RIP for any girth $g$ and results from NSP when $g/2$ is odd. Moreover, we derive explicit $l_1/l_1$, $l_2/l_1$ and $l_\infty/l_1$ sparse approximation guarantees. These results further show that large girth has positive impacts on the performance of binary measurement matrices under basis pursuit, and the binary parity-check matrices of good LDPC codes are important candidates of measurement matrices.

Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.

Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory

We consider linear regression in the high-dimensional regime in which the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0< n$ parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this approach.
In this paper we consider instead the fundamental, but far less understood, question of statistical significance.
We study this problem under the random design model in which the rows of the design matrix are i.i.d. and drawn from a high-dimensional Gaussian distribution. This situation arises, for instance, in learning high-dimensional Gaussian graphical models. Leveraging on an asymptotic distributional characterization of regularized least squares estimators, we develop a procedure for computing p-values and hence assessing statistical significance for hypothesis testing. We characterize the statistical power of this procedure, and evaluate it on synthetic and real data, comparing it with earlier proposals. Finally, we provide an upper bound on the minimax power of tests with a given significance level and show that our proposed procedure achieves this bound in case of design matrices with i.i.d. Gaussian entries.

Tree structured sparse coding on cubes

A brief description of tree structured sparse coding on the binary cube.

On the Identifiability of Overcomplete Dictionaries via the Minimisation Principle Underlying K-SVD

This article gives theoretical insights into the performance of K-SVD, a dictionary learning algorithm that has gained significant popularity in practical applications. The particular question studied here is when a dictionary $\dico$ can be recovered as local minimum of the minimisation criterion underlying K-SVD from a set of training signals $y_i=\dico x_i$. A theoretical analysis of the problem leads to the following identifiability result in the limiting case. Assuming the training signals are generated from a tight frame with coefficients drawn from a random symmetric distribution, then in expectation the generating dictionary can be recovered as a local minimum of the K-SVD criterion if the coefficient distribution exhibits sufficient decay. This decay can be characterised by the coherence of the dictionary and the $\ell_1$-norm of the coefficients. Further it is demonstrated that a similar result holds also assuming small bounded white noise on the training signals.

Sparse Recovery with Coherent Tight Frame via Analysis Dantzig Selector and Analysis LASSO

This article considers recovery of signals that are sparse or approximately sparse in terms of a (possibly) highly overcomplete and coherent tight frame from undersampled data corrupted with additive noise. We show that the properly constrained $l_1$-analysis, called analysis Dantzig selector, stably recovers a signal which is nearly sparse in terms of a tight frame provided that the measurement matrix satisfies a restricted isometry property adapted to the tight frame. As a special case, we consider the Gaussian noise. Further, under a sparsity scenario, with high probability, the recovery error from noisy data is within a log-like factor of the minimax risk over the class of vectors which is at most $s$ sparse in terms of the tight frame. Similar results for the analysis LASSO are showed.
The above two algorithms provide guarantees only for noise that is bounded or bounded with high probability (for example, Gaussian noise). However, when the underlying measurements are corrupted by sparse noise, these algorithms perform suboptimally. We demonstrate robust methods for reconstructing signals that are nearly sparse in terms of a tight frame in the presence of bounded noise combined with sparse noise. The analysis in this paper is based on the restricted isometry property adapted to a tight frame, which is a natural extension to the standard restricted isometry property.

Matrix Approximation under Local Low-Rank Assumption

Matrix approximation is a common tool in machine learning for building accurate prediction models for recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low-rank. We propose a new matrix approximation model where we assume instead that the matrix is only locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We analyze the accuracy of the proposed local low-rank modeling. Our experiments show improvements in prediction accuracy in recommendation tasks.

Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem

Generalized sampling is a recently developed linear framework for sampling and reconstruction in separable Hilbert spaces. It allows one to recover any element in any finite-dimensional subspace given finitely many of its samples with respect to an arbitrary frame. Unlike more common approaches for this problem, such as the consistent reconstruction technique of Eldar et al, it leads to completely stable numerical methods possessing both guaranteed stability and accuracy.
The purpose of this paper is twofold. First, we give a complete and formal analysis of generalized sampling, the main result of which being the derivation of new, sharp bounds for the accuracy and stability of this approach. Such bounds improve those given previously, and result in a necessary and sufficient condition, the stable sampling rate, which guarantees a priori a good reconstruction. Second, we address the topic of optimality. Under some assumptions, we show that generalized sampling is an optimal, stable reconstruction. Correspondingly, whenever these assumptions hold, the stable sampling rate is a universal quantity. In the final part of the paper we illustrate our results by applying generalized sampling to the so-called uniform resampling problem.

Robust High Dimensional Sparse Regression and Matching Pursuit

We consider high dimensional sparse regression, and develop strategies able to deal with arbitrary -- possibly, severe or coordinated -- errors in the covariance matrix $X$. These may come from corrupted data, persistent experimental errors, or malicious respondents in surveys/recommender systems, etc. Such non-stochastic error-in-variables problems are notoriously difficult to treat, and as we demonstrate, the problem is particularly pronounced in high-dimensional settings where the primary goal is {\em support recovery} of the sparse regressor. We develop algorithms for support recovery in sparse regression, when some number $n_1$ out of $n+n_1$ total covariate/response pairs are {\it arbitrarily (possibly maliciously) corrupted}. We are interested in understanding how many outliers, $n_1$, we can tolerate, while identifying the correct support. To the best of our knowledge, neither standard outlier rejection techniques, nor recently developed robust regression algorithms (that focus only on corrupted response variables), nor recent algorithms for dealing with stochastic noise or erasures, can provide guarantees on support recovery. Perhaps surprisingly, we also show that the natural brute force algorithm that searches over all subsets of $n$ covariate/response pairs, and all subsets of possible support coordinates in order to minimize regression error, is remarkably poor, unable to correctly identify the support with even $n_1 = O(n/k)$ corrupted points, where $k$ is the sparsity. This is true even in the basic setting we consider, where all authentic measurements and noise are independent and sub-Gaussian. In this setting, we provide a simple algorithm -- no more computationally taxing than OMP -- that gives stronger performance guarantees, recovering the support with up to $n_1 = O(n/(\sqrt{k} \log p))$ corrupted points, where $p$ is the dimension of the signal to be recovered.

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm for handling the exploration-exploitation trade-off when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling, offers significant potential advantages over other popular approaches, and can be naturally applied to problems with infinite action spaces and complicated relationships among the rewards generated by different actions. We provide an analysis of posterior sampling in a general framework, making two theoretical contributions. The first establishes a connection between posterior sampling and upper-confidence bound (UCB) algorithms. For specific classes of models, this result lets us convert regret bounds developed for specific UCB algorithms into Bayes risk bounds for posterior sampling. Our second theoretical contribution is a Bayes risk bound for posterior sampling that applies broadly and can be specialized to many model classes. This bound depends on a new notion of dimension that measures the degree of dependence among action rewards. We also present simulation results showing that posterior sampling significantly outperforms several recently proposed UCB algorithms.

Lecture notes on non-asymptotic theory of random matrices

We discuss recent developments in the study of the spectral properties of random matrices of a large fixed size, concentrating on the extreme singular values. Bounds for the extreme singular values were crucial in establishing several limit laws of random matrix theory. Besides the random matrix theory itself, these bounds have applications in geometric functional analysis and computer science.

Distributed soft thresholding for sparse signal recovery

In this paper, we address the problem of distributed sparse recovery of signals acquired via compressed measurements in a sensor network. We propose a new class of distributed algorithms to solve Lasso regression problems, when the communication to a gateway node is not possible, e.g., due to communication cost or privacy reasons. More precisely, we introduce a distributed iterative soft thresholding algorithm (DISTA) that consists of three steps: an averaging step, a subgradient step, and a soft thresholding operation. We prove the convergence of DISTA in a network represented by a complete graph, and we show that it outperforms existing algorithms in terms of performance and complexity.

Statistical mechanics approach to 1-bit compressed sensing

Compressed sensing is a technique for recovering a high-dimensional signal from lower-dimensional data, whose components represent partial information about the signal, utilizing prior knowledge on the sparsity of the signal. For further reducing the data size of the compressed expression, a scheme to recover the original signal utilizing only the sign of each entry of the linearly transformed vector was recently proposed. This approach is often termed the 1-bit compressed sensing. Here we analyze the typical performance of an L1-norm based signal recovery scheme for the 1-bit compressed sensing using statistical mechanics methods. We show that the signal recovery performance predicted by the replica method under the replica symmetric ansatz, which turns out to be locally unstable for modes breaking the replica symmetry, is in a good consistency with experimental results of an approximate recovery algorithm developed earlier. This suggests that the L1-based recovery problem typically has many local optima of a similar recovery accuracy, which can be achieved by the approximate algorithm. We also develop another approximate recovery algorithm inspired by the cavity method. Numerical experiments show that when the density of nonzero entries in the original signal is relatively large the new algorithm offers better performance than the abovementioned scheme and does so with a lower computational cost.

Analysis of weighted $\ell_1$-minimization for model based compressed sensing

Model-based compressed sensing refers to compressed sensing with extra structure about the underlying sparse signal known a priori. Recent work has demonstrated that both for deterministic and probabilistic models imposed on the signal, this extra information can be successfully exploited to enhance recovery performance. In particular, weighted $\ell_1$-minimization with suitable choice of weights has been shown to improve performance at least for a simple class of probabilistic models. In this paper, we consider a more general and natural class of probabilistic models where the underlying probabilities associated with the indices of the sparse signal have a continuously varying nature. We prove that when the measurements are obtained using a matrix with i.i.d Gaussian entries, weighted $\ell_1$-minimization with weights that have a similar continuously varying nature successfully recovers the sparse signal from its measurements with overwhelming probability. It is known that standard $\ell_1$-minimization with uniform weights can recover sparse signals up to a known sparsity level, or expected sparsity level in case of a probabilistic signal model. With suitable choice of weights which are chosen based on our signal model, we show that weighted $\ell_1$-minimization can recover signals beyond the sparsity level achievable by standard $\ell_1$-minimization.

A Multi-GPU Programming Library for Real-Time Applications

We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.

GESPAR: Efficient Phase Retrieval of Sparse Signals

We consider the problem of one dimensional (1D) phase retrieval, namely, recovery of a 1D signal from the magnitude of its Fourier transform. This problem is ill-posed since the Fourier phase information is lost. Therefore, prior information on the signal is needed in order to recover it. In this work we consider the case in which the prior information on the signal is that it is sparse, i.e., it consists of a small number of nonzero elements. We propose a fast local search method for recovering a sparse 1D signal from measurements of its Fourier transform magnitude. Our algorithm does not require matrix lifting, unlike previous approaches, and therefore is potentially suitable for large scale problems such as images. Simulation results indicate that the proposed algorithm is fast and more accurate than existing techniques.

Fast and RIP-optimal transforms

We study constructions of $k \times n$ matrices $A$ that both (1) satisfy the restricted isometry property (RIP) at sparsity $s$ with optimal parameters, and (2) are efficient in the sense that only $O(n\log n)$ operations are required to compute $Ax$ given a vector $x$. Our construction is based on repeated application of independent transformations of the form $DH$, where $H$ is a Hadamard or Fourier transform and $D$ is a diagonal matrix with random $\{+1,-1\}$ elements on the diagonal, followed by any $k \times n$ matrix of orthonormal rows (e.g.\ selection of $k$ coordinates). We provide guarantees (1) and (2) for a larger regime of parameters for which such constructions were previously unknown. Additionally, our construction does not suffer from the extra poly-logarithmic factor multiplying the number of observations $k$ as a function of the sparsity $s$, as present in the presently best known RIP estimates for partial random Fourier matrices and other classes of structured random matrices.

Eventual linear convergence of the Douglas Rachford iteration for basis pursuit

We provide a simple analysis of the Douglas Rachford splitting algorithm in the context of $\ell^1$ minimization with linear constraints, which explains why the asymptotic convergence rate is linear. The convergence rate is given in terms of principal angles between relevant vector spaces, and does not depend on the shrinkage parameter. In the compressed sensing setting, we show how to bound this rate in terms of the restricted isometry constant. More general iterative schemes obtained by $\ell^2$-regularization and over-relaxation are also treated. We make no attempt at characterizing the transient regime preceding the onset of linear convergence.

Chaotic Modulation for Analog-to-Information Conversion

Chaotic compressive sensing is a nonlinear framework for compressive sensing. Along the framework, this paper proposes a chaotic analog-to-information converter, chaotic modulation, to acquire and reconstruct band-limited sparse analog signals at sub-Nyquist rate. In the chaotic modulation, the sparse signal is randomized through parameter modulation of continuous-time chaotic system and one state output is sampled as compressive measurements. The reconstruction is achieved through the estimation of the sparse coefficients with principle of chaotic impulsive synchronization and lp-norm regularized nonlinear least squares. The concept of supreme local Lyapunov exponents (SLLE) is introduced to study the reconstructablity. It is found that the sparse signals are reconstructable, if the largest SLLE of the error dynamical system is negative. As an example, the Lorenz system excited by the sparse multi-tone signals is taken to illustrate the principle and the performance.

Compressed Sensing Matrices from Fourier Matrices

The class of Fourier matrices is of special importance in compressed sensing (CS). This paper concerns deterministic construction of compressed sensing matrices from Fourier matrices. By using Katz' character sum estimation, we are able to design a deterministic procedure to select rows from a Fourier matrix to form a good compressed sensing matrix for sparse recovery. The sparsity bound in our construction is similar to that of binary CS matrices constructed by DeVore which greatly improves previous results for CS matrices from Fourier matrices. Our approach also provides more flexibilities in terms of the dimension of CS matrices. As a consequence, our construction yields an approximately mutually unbiased bases from Fourier matrices which is of particular interest to quantum information theory. This paper also contains a useful improvement to Katz' character sum estimation for quadratic extensions, with an elementary and transparent proof. Some numerical examples are included.

 Quadratic Compressive Sensing — A Nonlinear Extension of Compressive Sensing by Henrik Ohlsson,, Allen Y. Yang, Roy Dong, Michel Verhaegen, S. Shankar Sastry, The abstarct reads:
Abstract—In many compressive sensing problems today, the relationship between the measurements and the unknowns could be nonlinear. Traditional treatment of such nonlinear relationship has been to approximate the nonlinearity via a linear model and the subsequent un-modeled dynamics as noise. The ability to more accurately characterize nonlinear models has the potential to improve the results in both existing compressive sensing applications and those where a linear approximation does not suffice, e.g., phase retrieval. In this paper, we extend the classical compressive sensing framework to a second-order Taylor expansion of the nonlinearity. Using a lifting technique, we show that the sparse signal can be recovered exactly when the sampling rate is sufficiently high. We further present efficient numerical algorithms to recover sparse signals in second-order nonlinear systems, which are still considerably more difficult to solve than their linear counterparts in sparse optimization.

A projector-splitting integrator for dynamical low-rank approximation

The dynamical low-rank approximation of time-dependent matrices is a low-rank factorization updating technique. It leads to differential equations for factors of the matrices, which need to be solved numerically. We propose and analyze a fully ex- plicit, computationally inexpensive integrator that is based on splitting the orthogonal projector onto the tangent space of the low-rank manifold. As is shown by theory and illustrated by numerical experiments, the integrator enjoys robustness properties that are not shared by any standard numerical integrator. This robustness can be exploited to change the rank adaptively. Another application is in optimization algorithms for low-rank matrices where truncation back to the given low rank can be done efficiently by applying a step of the integrator proposed here.

Blind source separation methods for deconvolution of complex signals in cancer biology

Two blind source separation methods (Independent Component Analysis and Non-negative Matrix Factorization), developed initially for signal processing in engineering, found recently a number of applications in analysis of large-scale data in molecular biology. In this short review, we present the common idea behind these methods, describe ways of implementing and applying them and point out to the advantages compared to more traditional statistical approaches. We focus more specifically on the analysis of gene expression in cancer. The review is finalized by listing available software implementations for the methods described.

Matrix Approximation under Local Low-Rank Assumption

Matrix approximation is a common tool in machine learning for building accurate prediction models for recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low-rank. We propose a new matrix approximation model where we assume instead that the matrix is only locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We analyze the accuracy of the proposed local low-rank modeling. Our experiments show improvements in prediction accuracy in recommendation tasks.

The Diagonalized Newton Algorithm for Nonnegative Matrix Factorization

Non-negative matrix factorization (NMF) has become a popular machine learning approach to many problems in text mining, speech and image processing, bio-informatics and seismic data analysis to name a few. In NMF, a matrix of non-negative data is approximated by the low-rank product of two matrices with non-negative entries. In this paper, the approximation quality is measured by the Kullback-Leibler divergence between the data and its low-rank reconstruction. The existence of the simple multiplicative update (MU) algorithm for computing the matrix factors has contributed to the success of NMF. Despite the availability of algorithms showing faster convergence, MU remains popular due to its simplicity. In this paper, a diagonalized Newton algorithm (DNA) is proposed showing faster convergence while the implementation remains simple and suitable for high-rank problems. The DNA algorithm is applied to various publicly available data sets, showing a substantial speed-up on modern hardware.

Block Coordinate Descent for Sparse NMF

Nonnegative matrix factorization (NMF) has become a ubiquitous tool for data analysis. An important variant is the sparse NMF problem which arises when we explicitly require the learnt features to be sparse. A natural measure of sparsity is the L$_0$ norm, however its optimization is NP-hard. Mixed norms, such as L$_1$/L$_2$ measure, have been shown to model sparsity robustly, based on intuitive attributes that such measures need to satisfy. This is in contrast to computationally cheaper alternatives such as the plain L$_1$ norm. However, present algorithms designed for optimizing the mixed norm L$_1$/L$_2$ are slow and other formulations for sparse NMF have been proposed such as those based on L$_1$ and L$_0$ norms. Our proposed algorithm allows us to solve the mixed norm sparsity constraints while not sacrificing computation time. We present experimental evidence on real-world datasets that shows our new algorithm performs an order of magnitude faster compared to the current state-of-the-art solvers optimizing the mixed norm and is suitable for large-scale datasets.

Simultaneous computation of the row and column rank profiles

Jean-Guillaume Dumas (LJK), Clément Pernet (INRIA Grenoble Rhône-Alpes / LIG Laboratoire d'Informatique de Grenoble), Ziad Sultan (LJK, INRIA Grenoble Rhône-Alpes / LIG Laboratoire d'Informatique de Grenoble)
Gaussian elimination with full pivoting generates a PLUQ matrix decomposition. Depending on the strategy used in the search for pivots, the permutation matrices can reveal some information about the row or the column rank profiles of the matrix. We propose a new pivoting strategy that makes it possible to recover at the same time both row and column rank profiles of the input matrix and of any of its leading sub-matrices. We propose a rank-sensitive and quad-recursive algorithm that computes the latter PLUQ triangular decomposition of an m \times n matrix of rank r in O(mnr^{\omega-2}) field operations, with \omega the exponent of matrix multiplication. Compared to the LEU decomposition by Malashonock, sharing a similar recursive structure, its time complexity is rank sensitive and has a lower leading constant. Over a word size finite field, this algorithm also improveLs the practical efficiency of previously known implementations.

Regularised PCA to denoise and visualise data

Principal component analysis (PCA) is a well-established method commonly used to explore and visualise data. A classical PCA model is the fixed effect model where data are generated as a fixed structure of low rank corrupted by noise. Under this model, PCA does not provide the best recovery of the underlying signal in terms of mean squared error. Following the same principle as in ridge regression, we propose a regularised version of PCA that boils down to threshold the singular values. Each singular value is multiplied by a term which can be seen as the ratio of the signal variance over the total variance of the associated dimension. The regularised term is analytically derived using asymptotic results and can also be justified from a Bayesian treatment of the model. Regularised PCA provides promising results in terms of the recovery of the true signal and the graphical outputs in comparison with classical PCA and with a soft thresholding estimation strategy. The gap between PCA and regularised PCA is all the more important that data are noisy.

Hierarchical Data Representation Model - Multi-layer NMF

Understanding and representing the underlying structure of feature hierarchies present in complex data in intuitively understandable manner is an important issue. In this paper, we propose a data representation model that demonstrates hierarchical feature learning using NMF with sparsity constraint. We stack simple unit algorithm into several layers to take step-by-step approach in learning. By utilizing NMF as unit algorithm, our proposed network provides intuitive understanding of the learning process. It is able to demonstrate hierarchical feature development process and also discover and represent feature hierarchies in the complex data in intuitively understandable manner. We apply hierarchical multi-layer NMF to image data and document data to demonstrate feature hierarchies present in the complex data. Furthermore, we analyze the reconstruction and classification abilities of our proposed network and prove that hierarchical feature learning approach excels performance of standard shallow network. By providing underlying feature hierarchies in complex real-world data sets, our proposed network is expected to help machines develop intelligence based on the learned relationship between concepts, and at the same time, perform better with the small number of features provided for data representation.

Positive Semidefinite Matrix Completion, Universal Rigidity and the Strong Arnold Property

This paper addresses the following three topics: positive semidefinite (psd) matrix completions, universal rigidity of frameworks, and the Strong Arnold Property (SAP). We show some strong connections among these topics, using semidefinite programming as unifying theme. Our main contribution is a sufficient condition for constructing partial psd matrices which admit a unique completion to a full psd matrix. Such partial matrices are an essential tool in the study of the Gram dimension $\gd(G)$ of a graph $G$, a recently studied graph parameter related to the low psd matrix completion problem. Additionally, we derive an elementary proof of Connelly's sufficient condition for universal rigidity of tensegrity frameworks and we investigate the links between these two sufficient conditions. We also give a geometric characterization of psd matrices satisfying the Strong Arnold Property in terms of nondegeneracy of an associated semidefinite program, which we use to establish some links between the Gram dimension $\gd(\cdot)$ and the Colin de Verdi\`ere type graph 

Clustering-Based Matrix Factorization

Recommender systems are emerging technologies that nowadays can be found in many applications such as Amazon, Netflix, and so on. These systems help users find relevant information, recommendations, and their preferred items. Matrix Factorization is a popular method in Recommendation Systems showing promising results in accuracy and complexity. In this paper we propose an extension of matrix factorization that uses the clustering paradigm to cluster similar users and items in several communities. We then establish their effects on the prediction model then. To the best of our knowledge, our proposed model outperforms all other published recommender methods in accuracy and complexity. For instance, our proposed method's accuracy is 0.8122 on the Netflix dataset which is better than the Netflix prize winner's accuracy of 0.8567.

Load curve data cleansing and imputation via sparsity and low rank

The smart grid vision is to build an intelligent power network with an unprecedented level of situational awareness and controllability over its services and infrastructure. This paper advocates statistical inference methods to robustify power monitoring tasks against the outlier effects owing to faulty readings and malicious attacks, as well as against missing data due to privacy concerns and communication errors. In this context, a novel load cleansing and imputation scheme is developed leveraging the low intrinsic-dimensionality of spatiotemporal load profiles and the sparse nature of "bad data.'' A robust estimator based on principal components pursuit (PCP) is adopted, which effects a twofold sparsity-promoting regularization through an $\ell_1$-norm of the outliers, and the nuclear norm of the nominal load profiles. Upon recasting the non-separable nuclear norm into a form amenable to decentralized optimization, a distributed (D-) PCP algorithm is developed to carry out the imputation and cleansing tasks using networked devices comprising the so-termed advanced metering infrastructure. If D-PCP converges and a qualification inequality is satisfied, the novel distributed estimator provably attains the performance of its centralized PCP counterpart, which has access to all networkwide data. Computer simulations and tests with real load curve data corroborate the convergence and effectiveness of the novel D-PCP algorithm.

Abstract: Techniques based on non-negative matrix factorization (NMF) can be used to efficiently decompose a magnitude spectrogram into a set of template (column) vectors and activation (row) vectors. To better control this decomposition, NMF has been extended using prior knowledge and parametric models. In this paper, we present such an extended approach that uses additional score information to guide the decomposition process. Here, opposed to previous methods, our main idea is to impose constraints on both the template as well as the activation side. We show that using such double constraints results in musically meaningful decompositions similar to parametric approaches, while being computationally less demanding and easier to implement. Furthermore, additional onset constraints can be incorporated in a straightforward manner without sacrificing robustness. We evaluate our approach in the context of separating note groups (e. g. the left or right hand) from monaural piano recordings.

Image Credit: NASA/JPL-Caltech 
This image was taken by Rear Hazcam: Right A (RHAZ_RIGHT_A) onboard NASA's Mars rover Curiosity on Sol 174 (2013-01-31 18:39:59 UTC) . 

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

No comments: