Friday, November 13, 2015

False Discoveries Occur Early on the Lasso Path - implementation -

A phase transition for the LASSO


False Discoveries Occur Early on the Lasso Path by Weijie Su, Malgorzata Bogdan, Emmanuel Candes

In regression settings where explanatory variables have very low correlations and where there are relatively few effects each of large magnitude, it is commonly believed that the Lasso shall be able to find the important variables with few errors, if any. In contrast, this paper shows that this is not the case even when the design variables are stochastically independent. In a regime of linear sparsity, we demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a given threshold, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.

The MATLAB implementation to draw the Lasso Trade-off Diagram is here.
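
For readers who want to poke at the phenomenon without MATLAB, here is a minimal Python sketch. It is not the authors' code. It draws an independent Gaussian design, plants a few large effects, runs a plain coordinate-descent Lasso over a decreasing grid of regularization values, and reports the false discovery proportion (FDP) and true positive proportion (TPP) at each point on the path. Every name in it (soft_threshold, lasso_cd, fdp_tpp, lasso_path_fdp_tpp) and every default (n, p, k, effect, the geometric grid of lambdas) is an assumption made for illustration. The paper's result is asymptotic, so a problem this small can only hint at it.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding: the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_iter=100, tol=1e-8):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)  # optional warm start
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) for c in cols]
    # Residual y - X beta, kept up to date after every coordinate move.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            c = cols[j]
            # Correlation of column j with the residual that ignores beta[j].
            rho = sum(c[i] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= c[i] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:  # stop once a full sweep barely moves anything
            break
    return beta


def fdp_tpp(beta_hat, support):
    """FDP = false selections / max(#selected, 1); TPP = true selections / max(k, 1)."""
    selected = {j for j, b in enumerate(beta_hat) if b != 0.0}
    fdp = len(selected - support) / max(len(selected), 1)
    tpp = len(selected & support) / max(len(support), 1)
    return fdp, tpp


def lasso_path_fdp_tpp(n=50, p=30, k=6, effect=10.0, n_lambdas=10, ratio=0.8, seed=0):
    """Return [(lam, FDP, TPP), ...] along a decreasing grid of lambdas."""
    rng = random.Random(seed)
    # Independent N(0, 1/n) entries, so columns have roughly unit norm.
    X = [[rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(p)] for _ in range(n)]
    support = set(range(k))  # the first k coefficients are the true effects
    y = [sum(X[i][j] * effect for j in support) + rng.gauss(0.0, 1.0) for i in range(n)]
    # Smallest lambda at which the all-zero vector is a Lasso solution.
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p))
    path, beta = [], None
    for t in range(n_lambdas):
        lam = lam_max * ratio ** t
        beta = lasso_cd(X, y, lam, beta)  # warm start from the previous lambda
        path.append((lam,) + fdp_tpp(beta, support))
    return path


if __name__ == "__main__":
    for lam, fdp, tpp in lasso_path_fdp_tpp():
        print(f"lambda={lam:8.3f}  FDP={fdp:.2f}  TPP={tpp:.2f}")
```

Plotting FDP against TPP from the returned list gives a small-sample analogue of the trade-off diagram. Changing the seed or increasing effect is a quick way to see whether null variables still enter the path before all true ones have.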


1 comment:

Piggy said...

Could it be that this occurs because LASSO randomly selects one out of a group of correlated variables? When the number of independent variables is high, the probability of getting groups of correlated variables increases, and with it the chance that LASSO generates false positives.
