## Tuesday, August 17, 2010

### CS: The long post of the week.

For the past two weeks, we have seen an increase of 30 people joining the Linkedin compressive sensing group. Welcome aboard. to the new members and kudos to the new managers. I look forward to somebody answering Robert's question.

Following up on yesterday's entry about FROG, it looks like Justin and Rick are going to talk. Cool!

Muthu has a very nice blog entry on a subject of considerable importance: tracking research. I wish I could be as optimist as he is when he says:
This has been done nicely for say smoothed analysis or compressed sensing. But ultimately, it will be great if the community can collaborate and develop such sites organically.
Emphasis mine. I think that one should hope that this collaboration take place, however, since this type of outreach is not counting toward a higher paying job or a more secure one., I am not holding my breath.  Maybe I should write something on my experience so far. In the meantime, Muthu and Graham Cormode have a started a very interesting wiki on Count-Min sketch. Check it out as we have seen this technique in Compressed Sensing before.

On a related note, the main page on the subject of Matrix Completion is here.

Here is our new list of papers of the week, enjoy!

Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have p covariates and that under the alternative, the response only depends upon on the order of p1− of those, 0 \le α \le 1. Under moderate sparsity levels, i.e. 0 \le α \le 1/2, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, i.e. α > 1/2. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when α \ge 3/4. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where 1/2 \lt α \lt 3/4. We suggest a method based on the Higher Criticism that is powerful in the whole range α \gt 1/2. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern ‘p \gt n’ designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.

Compressive sampling (CS) is a new research topic in signal processing that has piqued the interest of a wide range of researchers in different fields recently. In this paper, we present a CS-based classifier for music genre classification, with two sets of features, including short-time and long-time features of audio music. The proposed classifier generates a compact signature to achieve a significant reduction in the dimensionality of the audio music signals. The experimental results demonstrate that the computation time of the CS-based classifier is only about 20% of SVM on GTZAN dataset, with an accuracy of 92.7%. Several experiments were conducted in this study to illustrate the feasibility and robustness of the proposed methods as compared to other approaches.

Compressive Sensing (CS) has recently emerged as a powerful paradigm to efficiently measure high-dimensional signals. The key property of the signal to be recovered is sparsity: although the dimension of an unknown signal may be high (say n) is it known that a basis representation exists in which the signal can be expressed using a small number of nonzero basis coefficients. It has been shown that recovery of sparse signals is possible using much fewer than n measurements. Furthermore, recovery is possible by solving a convex optimization problem, and is robust to measurement noise.
However, most current results are for unstructured, random measurements, while many applications will require the measurements to have a particular structure. For example, in system identification, the measurement process involves the time-domain convolution of the (unknown) system impulse response with the (known) input, and is equivalent to a Toeplitz structured measurement matrix. In this talk, we review the ideas of compressive signal processing and describe how they can be extended to problems involving system identification and fault detection of dynamic systems.

It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least square problem and thus the least angle regression (LARS) (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we proposed the manifold elastic net or MEN for short. MEN incorporates the merits of both the manifold learning based dimensionality reduction and the sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show MEN is equivalent to the lasso penalized least square problem and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved for low dimensional data representation, 2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, 3) the projection matrix of MEN improves the parsimony in computation, 4) the elastic net penalty reduces the over-fitting problem, and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top level dimensionality reduction algorithms
.But also on arxiv:
Compressed sensing is a new scheme which shows the ability to recover sparse signal from fewer measurements, using $l_1$ minimization. Recently, Chartrand and Staneva shown in \cite{CS1} that the $l_p$ minimization with $0 \lt p \lt 1$ recovers sparse signals from fewer linear measurements than does the $l_1$ minimization. They proved that $l_p$ minimization with $0 \lt p \lt 1$ recovers $S$-sparse signals $x\in\RN$ from fewer Gaussian random measurements for some smaller $p$ with probability exceeding $$1 - 1 / {N\choose S}.$$ The first aim of this paper is to show that above result is right for the case of random,Gaussian measurements with probability exceeding $1-2e^{-c(p)M},$ where $M$ is the numbers of rows of random, Gaussian measurements and $c(p)$ is a positive constant that guarantees $1-2e^{-c(p)M} \gt 1 - 1 / {N\choose S}$ for $p$ smaller. The second purpose of the paper is to show that under certain weaker conditions, decoders $\triangle_p$ are stable in the sense that they are $(2,p)$ instance optimal for a large class of encoder for $0 \lt p \lt 1.$

Identification of time-varying linear systems, which introduce both time-shifts (delays) and frequency-shifts (Doppler-shifts) to the input signal, is one of the central tasks in many engineering applications. This paper studies the problem of identification of "underspread linear systems," defined as time-varying linear systems whose responses lie within a unit-area region in the delay--Doppler space, by probing them with a single known input signal and analyzing the resulting system output. One of the main contributions of the paper is that it characterizes the conditions on the temporal support and the bandwidth of the input signal that ensure identification of underspread linear systems described by a set of discrete delays and Doppler-shifts---and referred to as parametric underspread linear systems---from single observations. In particular, the paper establishes that sufficiently underspread parametric linear systems are identifiable as long as the time--bandwidth product of the input signal is proportional to the square of the total number of delay--Doppler pairs in the system. In addition, an algorithm is developed in the paper that enables identification of parametric underspread linear systems in polynomial time. Finally, application of these results to super-resolution target detection using radar is discussed. Specifically, it is shown that the procedure developed in the paper allows to distinguish between multiple targets with very close proximity in the delay--Doppler space and results in resolution that substantially exceeds those of standard matched-filtering-based techniques and the recently proposed compressed sensing-based methods.

Compressed sensing is a novel technique where one can recover sparse signals from the undersampled measurements. This paper studies a $K \times N$ partial Fourier measurement matrix for compressed sensing which is deterministically constructed via cyclic difference sets (CDS). Precisely, the matrix is constructed by $K$ rows of the $N\times N$ inverse discrete Fourier transform (IDFT) matrix, where each row index is from a $(N, K, \lambda)$ cyclic difference set. The restricted isometry property (RIP) is statistically studied for the deterministic matrix to guarantee the recovery of sparse signals. A computationally efficient reconstruction algorithm is then proposed from the structure of the matrix. Numerical results show that the reconstruction algorithm presents competitive recovery performance with allowable computational complexity.
his paper introduces a novel framework for compressive sensing of biomedical ultrasonic signals based on modelling data with stable distributions. We propose an approach to p norm minimisation that employs the iteratively reweighted least squares (IRLS) algorithm but in which the parameter p is judiciously chosen by relating it to the characteristic exponent of the underlying alpha-stable distributed data. Our results show that the proposed algorithm, which we prefer to call SaS-IRLS, outperforms previously proposed 1 minimisation algorithms, such as basis pursuit or orthogonal matching pursuit, both visually and in terms of PSNR.

The vast user-provided image tags on the popular photo sharing websites may greatly facilitate image retrieval and management. However, these tags are often imprecise and/or incomplete, resulting in unsatisfactory performances in tag related applications. In this work, the tag refinement problem is formulated as a decomposition of the user-provided tag matrix D into a low-rank refined matrix A and a sparse error matrix E, namely D = A + E, targeting the optimality measured by four aspects: 1) low-rank: A is of low-rank owing to the semantic correlations among the tags; 2) content consistency: if two images are visually similar, their tag vectors (i.e., column vectors of A) should also be similar; 3) tag correlation: if two tags co-occur with high frequency in general images, their co-occurrence frequency (described by two row vectors of A) should also be high; and 4) error sparsity: the matrix E is sparse since the tag matrix D is sparse and also humans can provide reasonably accurate tags. All these components finally constitute a constrained yet convex optimization problem, and an efficient convergence provable iterative procedure is proposed for the optimization based on accelerated proximal gradient method. Extensive experiments on two benchmark Flickr datasets, with 25K and 270K images respectively, well demonstrate the effectiveness of the proposed tag refinement approach.
Decomposing Background Topics from Keywords by Principal Component Pursuit by Kerui Min, Zhengdong Zhang, John Wright, Yi Ma. The asbtract reads
Low-dimensional topic models have been proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality reduction tools such as Principal Component Analysis or Latent Semantic Indexing (LSI) have been widely adopted for document modeling, analysis, and retrieval. In this paper, we contend that a more pertinent model for a document corpus as the combination of an (approximately) low-dimensional topic model for the corpus and a sparse model for the keywords of individual documents. For such a joint topic-document model, LSI or PCA is no longer appropriate to analyze the corpus data. We hence introduce a powerful new tool called Principal Component Pursuit that can effectively decompose the low-dimensional and the sparse components of such corpus data. We give empirical results on data synthesized with a Latent Dirichlet Allocation (LDA) mode to validate the new model. We then show that for real document data analysis, the new tool significantly reduces the perplexity and improves retrieval performance compared to classical baselines.

Image Credit: NASA/JPL/Space Science Institute, N00161113.jpg was taken on August 14, 2010 and received on Earth August 15, 2010. The camera was pointing toward ENCELADUS at approximately 45,241 kilometers away, and the image was taken using the CL1 and GRN filters.