## Thursday, April 21, 2011

### CS: Noise Folding in Compressed Sensing, nonzero entry in the optimizer of L1 norm penalized least-square problem, Random lasso, Improved variable selection with Forward-Lasso adaptive shrinkage

With all the noise about noise, we were bound to have more noise :-) Today we have a new kind of noise featured in Noise Folding in Compressed Sensing by Ery Arias-Castro and Yonina Eldar. The abstract reads:
The literature on compressed sensing has focused almost entirely on settings where the signal is noiseless and the measurements are contaminated by noise. In practice, however, the signal itself is often subject to random noise prior to measurement. We briefly study this setting and show that, for the vast majority of measurement schemes employed in compressed sensing, the two models are equivalent with the important difference that the signal-to-noise ratio is divided by a factor proportional to p/n, where p is the dimension of the signal and n is the number of observations. Since p/n is often large, this leads to noise folding which can have a severe impact on the SNR.
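To see where the p/n factor comes from, here is a small Monte Carlo sketch. It assumes a Gaussian measurement matrix A with i.i.d. N(0, 1/n) entries (so columns have unit norm on average and the signal energy is roughly preserved) and white signal noise z ~ N(0, σ²I_p); this normalization is our choice and not necessarily the paper's. Each entry of Az is a sum of p terms of variance σ²/n, so its variance is (p/n)σ²: compared with adding noise of variance σ² directly to the measurements, the noise is amplified by p/n. The function name `folded_noise_variance` is hypothetical.

```python
import numpy as np

def folded_noise_variance(n, p, sigma=1.0, trials=100, seed=0):
    """Monte Carlo estimate of the per-measurement variance of A @ z.

    A is n x p with i.i.d. N(0, 1/n) entries (columns have unit norm on average);
    z ~ N(0, sigma^2 I_p) is noise added to the signal *before* measurement.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(trials):
        A = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
        z = rng.normal(0.0, sigma, size=p)
        samples.append(A @ z)                # n folded-noise measurements
    return float(np.var(np.concatenate(samples)))

if __name__ == "__main__":
    n = 100
    for p in (100, 400, 1600):
        est = folded_noise_variance(n, p)
        print(f"p/n = {p / n:5.1f}   empirical noise variance = {est:6.2f}")
```

Under these assumptions the printed variance should track p/n, which is the "noise folding" penalty the abstract describes.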
Next is a paper on the nonzero entries in the optimizer of the $\ell_1$-norm penalized least-squares problem. The abstract reads:
The $\ell_1$-norm based optimization is widely used in signal processing, especially in recent compressed sensing theory. This paper studies the solution path of the $\ell_1$-norm penalized least-squares problem, whose constrained form is known as the Least Absolute Shrinkage and Selection Operator (LASSO). A solution path is the set of all optimizers as the hyperparameter (Lagrange multiplier) evolves. The study of the solution path is of great significance in viewing and understanding the profile of the tradeoff between the approximation and regularization terms. If the solution path of a given problem is known, it can help us find the optimal hyperparameter under a given criterion such as the Akaike Information Criterion. In this paper we present a sufficient condition on the $\ell_1$-norm penalized least-squares problem. Under this sufficient condition, the number of nonzero entries in the optimizer or solution vector increases monotonically when the hyperparameter decreases. We also generalize the result to the often used total variation case, where the $\ell_1$ norm is taken over the first-order derivative of the solution vector. We prove that the proposed condition has intrinsic connections with the condition given by Donoho et al. [Donoho08] and the positive cone condition by Efron et al. [Efron04]. However, the proposed condition does not need to assume the sparsity level of the signal, as required by Donoho et al.'s condition, and is easier to verify than Efron et al.'s positive cone condition when used in practical applications.
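The monotonicity property is easy to probe numerically. Below is a minimal sketch, assuming the penalized form $\frac{1}{2}\|y - Ax\|_2^2 + \lambda\|x\|_1$ and plain cyclic coordinate descent as the solver (our choice, not the paper's). It traces the number of nonzero entries of the optimizer along a decreasing grid of λ with warm starts. It does not check the paper's sufficient condition: for a general A the count need not be monotone, while for an orthonormal A the optimizer is soft-thresholding of Aᵀy, so the count is monotone. For λ ≥ ‖Aᵀy‖∞ the optimizer is zero. The names `lasso_cd` and `support_sizes` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Scalar soft-thresholding operator S(v, t) = sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def lasso_cd(A, y, lam, x0=None, n_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    n, p = A.shape
    x = np.zeros(p) if x0 is None else x0.copy()
    col_sq = (A ** 2).sum(axis=0)
    r = y - A @ x                                  # current residual
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = x[j]
            rho = A[:, j] @ r + col_sq[j] * old    # correlation with partial residual
            x[j] = soft_threshold(rho, lam) / col_sq[j]
            if x[j] != old:
                r -= A[:, j] * (x[j] - old)        # keep residual in sync with x
                max_change = max(max_change, abs(x[j] - old))
        if max_change < tol:
            break
    return x

def support_sizes(A, y, lams):
    """Number of nonzero entries of the optimizer, listed from largest to smallest lambda."""
    x = None
    sizes = []
    for lam in sorted(lams, reverse=True):         # warm start: large -> small lambda
        x = lasso_cd(A, y, lam, x0=x)
        sizes.append(int(np.count_nonzero(np.abs(x) > 1e-8)))
    return sizes
```

Running `support_sizes` on a random underdetermined A with a grid starting at ‖Aᵀy‖∞ is a quick way to look for steps where the support shrinks as λ decreases.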
Following up on yesterday's connection between statistics and CS, here are two Lasso papers. The first is Random lasso by Sijian Wang, Bin Nan, Saharon Rosset and Ji Zhu. The abstract reads:
We propose a computationally intensive method, the random lasso method, for variable selection in linear models. The method consists of two major steps. In step 1, the lasso method is applied to many bootstrap samples, each using a set of randomly selected covariates. A measure of importance is yielded from this step for each covariate. In step 2, a similar procedure to the first step is implemented with the exception that for each bootstrap sample, a subset of covariates is randomly selected with unequal selection probabilities determined by the covariates' importance. Adaptive lasso may be used in the second step with weights determined by the importance measures. The final set of covariates and their coefficients are determined by averaging bootstrap results obtained from step 2. The proposed method alleviates some of the limitations of lasso, elastic-net and related methods noted especially in the context of microarray data analysis: it tends to remove highly correlated variables altogether or select them all, and maintains maximal flexibility in estimating their coefficients, particularly with different signs; the number of selected variables is no longer limited by the sample size; and the resulting prediction accuracy is competitive or superior compared to the alternatives. We illustrate the proposed method by extensive simulation studies. The proposed method is also applied to a Glioblastoma microarray data analysis.
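The two-step procedure in the abstract can be sketched compactly. This is a simplified reading, not the authors' implementation: it assumes centered data with no intercept, uses a fixed penalty `lam` in every bootstrap fit (the authors may tune it, e.g. by cross-validation), takes the importance of covariate j to be the absolute value of its average step-1 coefficient, and uses the plain lasso rather than the adaptive lasso in step 2. The names `random_lasso`, `q1`, `q2`, `B` and `lam` are hypothetical.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n_obs, n_feat = X.shape
    b = np.zeros(n_feat)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                                   # residual for b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(n_feat):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            rho = X[:, j] @ r + col_sq[j] * old
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b

def random_lasso(X, y, q1, q2, B=200, lam=1.0, seed=0):
    """Simplified two-step random lasso; returns (final coefficients, step-1 importance)."""
    rng = np.random.default_rng(seed)
    n_obs, n_feat = X.shape

    def bootstrap_fits(q, probs):
        coefs = np.zeros((B, n_feat))              # unselected covariates keep coefficient 0
        for b in range(B):
            rows = rng.integers(0, n_obs, size=n_obs)                   # bootstrap sample
            cols = rng.choice(n_feat, size=q, replace=False, p=probs)   # covariate subset
            coefs[b, cols] = lasso_cd(X[np.ix_(rows, cols)], y[rows], lam)
        return coefs

    # Step 1: covariates drawn uniformly; importance = |average coefficient|.
    importance = np.abs(bootstrap_fits(q1, None).mean(axis=0))

    # Step 2: covariates drawn with probability proportional to importance
    # (a tiny floor keeps every covariate drawable).
    probs = importance + 1e-12
    probs /= probs.sum()
    final = bootstrap_fits(q2, probs).mean(axis=0)
    return final, importance
```

Drawing the step-2 subsets in proportion to importance is what lets strongly relevant covariates appear in almost every bootstrap fit, so their averaged coefficients are not diluted by the random subsetting.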

The second is Improved variable selection with Forward-Lasso adaptive shrinkage. The abstract reads:
Recently, considerable interest has focused on variable selection methods in regression situations where the number of predictors, $p$, is large relative to the number of observations, $n$. Two commonly applied variable selection approaches are the Lasso, which computes highly shrunk regression coefficients, and Forward Selection, which uses no shrinkage. We propose a new approach, "Forward-Lasso Adaptive SHrinkage" (FLASH), which includes the Lasso and Forward Selection as special cases, and can be used in both the linear regression and the Generalized Linear Model domains. As with the Lasso and Forward Selection, FLASH iteratively adds one variable to the model in a hierarchical fashion but, unlike these methods, at each step adjusts the level of shrinkage so as to optimize the selection of the next variable. We first present FLASH in the linear regression setting and show that it can be fitted using a variant of the computationally efficient LARS algorithm. Then, we extend FLASH to the GLM domain and demonstrate, through numerous simulations and real world data sets, as well as some theoretical analysis, that FLASH generally outperforms many competing approaches.
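FLASH itself, which adjusts the shrinkage at each step through a LARS variant, is not reproduced here. As a hedged illustration of the no-shrinkage end of the range it spans, here is classical forward selection under one common definition (an assumption on our part): at each step, add the predictor most correlated with the current residual, then refit ordinary least squares on the active set. It assumes standardized columns; `forward_selection` is a hypothetical name.

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedy forward selection for k steps with a full (unshrunk) OLS refit on the active set."""
    n, p = X.shape
    active = []
    beta = np.zeros(p)
    r = y.astype(float).copy()
    for _ in range(min(k, p)):
        corr = np.abs(X.T @ r)                 # correlation with the current residual
        corr[active] = -np.inf                 # never re-add an active variable
        j = int(np.argmax(corr))
        active.append(j)
        coef, *_ = np.linalg.lstsq(X[:, active], y, rcond=None)
        beta[:] = 0.0
        beta[active] = coef                    # no shrinkage: plain least squares
        r = y - X @ beta
    return active, beta
```

The lasso follows the same one-variable-at-a-time path but shrinks the active coefficients toward zero instead of refitting them fully.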

Image Credit: NASA/JPL/Space Science Institute
N00171130.jpg was taken on April 14, 2011 and received on Earth April 15, 2011. The camera was pointing toward DIONE at approximately 1,745,487 kilometers away, and the image was taken using the CL1 and CL2 filters.