As we were discussing the release of an implementation of the Sparse PCA algorithm, Andrea mentioned to me the release of an R program for hypothesis testing with the Lasso:
Confidence Intervals and Hypothesis Testing for High-Dimensional Regression by Adel Javanmard, Andrea Montanari
Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values for these models. We consider here high-dimensional linear regression problem, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate, made publicly available by [BKM14].
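To give a feel for what 'de-biasing' a regularized estimator means, here is a minimal Python sketch. It is not the released R program and not the authors' construction. It assumes the usual one-step correction theta_u = theta_hat + (1/n) M X^T (y - X theta_hat), a known noise level `sigma`, and, as a loud simplification, it takes M to be the plain inverse of the sample covariance, which only exists when n > p. The paper instead obtains M from a convex program so that the p > n case is covered. The function names (`lasso_ista`, `debiased_lasso`) and all parameter values are made up for this illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=2000):
    """Minimise (1/2n) ||y - X theta||^2 + lam * ||theta||_1 by proximal gradient."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta


def debiased_lasso(X, y, lam, sigma, z=1.96):
    """Return (lasso, de-biased estimate, lower, upper) for each coefficient.

    ASSUMPTION: n > p, so the sample covariance is invertible and can stand
    in for the matrix M. This is an illustrative shortcut, not the paper's M.
    ASSUMPTION: the noise standard deviation sigma is known.
    """
    n, p = X.shape
    theta_hat = lasso_ista(X, y, lam)
    sigma_hat = X.T @ X / n                 # sample covariance of the design
    M = np.linalg.inv(sigma_hat)            # hypothetical stand-in for M
    # One-step correction that removes the shrinkage bias of the Lasso.
    theta_u = theta_hat + M @ X.T @ (y - X @ theta_hat) / n
    # Approximate variance of each de-biased coordinate, up to sigma^2 / n.
    var = np.diag(M @ sigma_hat @ M.T)
    half_width = z * sigma * np.sqrt(var / n)
    return theta_hat, theta_u, theta_u - half_width, theta_u + half_width


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 5))
    theta0 = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
    y = X @ theta0 + 0.5 * rng.standard_normal(200)
    lasso, debiased, lower, upper = debiased_lasso(X, y, lam=0.1, sigma=0.5)
    for j in range(5):
        print(j, lasso[j], debiased[j], (lower[j], upper[j]))
```

With this particular choice of M the correction collapses to ordinary least squares, which is exactly why it is only a toy: when p exceeds n the inverse does not exist, and that is the regime the paper's construction is meant to handle.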