Emmanuel Candes and Yaniv Plan just released this preprint on Near-ideal model selection by L1 minimization. The abstract reads:
We consider the fundamental problem of estimating the mean of a vector y = X \beta + z where X is an n x p design matrix in which one can have far more variables than observations and z is a stochastic error term|the so-called `p higher than n' setup. When \beta is sparse, or more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean vector, we ask whether or not it is possible to accurately estimate X \beta using a computationally tractable algorithm. We show that in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error one would achieve with an oracle supplying perfect information about which variables should be included in the model and which variables should not. Interestingly, our results describe the average performance of the lasso; that is, the performance one can expect in an overwhelming majority of cases where X\beta is a sparse or nearly sparse superposition of variables, but not in all cases. Our results are nonasymptotic and widely applicable since they simply require that pairs of predictor variables be not overly collinear.
The main contribution of the paper is to show that the Lasso works under some conditions and bound on the sparsity level of the signal 'that is, for generic signals, the sparsity can grow almost linearly with the sample size. '
On a unrelated note and Joachim Dahl just released version 0.9.2 of CVXOPT: A Python Package for Convex Optimization. An interesting aspect of it is the linking possibilities to OpenOffice spreadsheets.
In the sparse coding world, these three papers got my attention: Shift-invariant sparse coding for audio classification by Roger Grosse, Rajat Raina, Helen Kwong, and Andrew Y. Ng, Self-taught learning: Transfer learning from unlabeled data by Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer and Andrew Y. Ng. and Efficient sparse coding algorithms by Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. I'll have to implement some of these in my Monday Morning Algorithm series at some point.
This past year saw some new search and rescue operations that used the knowledge of people through the internet. Somebody involved in the latest search for Steve Fowsett has decided to centralize this type of activity on one site: InternetSAR. An interview with the creator of the site can is here, thanks to a podcast by the folks at AVweb. I blogged several entries on Jim Gray's search and I think that it actually goes beyond having human watching and interpreting satellite photos as I have pointed out before, but this is a good and noteworthy effort in the right direction.
Credit photo: Armagh Observatory, Map of asteroids in the inner part of the Solar System.