Tuesday, October 31, 2017

Linear Dimensionality Reduction in Linear Time: Johnson-Lindenstrauss-type Guarantees for Random Subspace / Structure-aware error bounds for linear classification with the zero-one loss

Bob sent me the following:

Hi Igor,

I have a couple of recent reports posted to the arxiv, which may be of interest to you? One is a data-dependent Johnson-Lindenstrauss type result for feature subsampling, the other is some theory for linear classifiers and randomly-projected linear classifiers.
Links follow: JLL: https://arxiv.org/abs/1705.06408Learning theory: https://arxiv.org/abs/1709.09782

Hope you like them.


Thanks Bob  !

We consider the problem of efficient randomized dimensionality reduction with norm-preservation guarantees. Specifically we prove data-dependent Johnson-Lindenstrauss-type geometry preservation guarantees for Ho's random subspace method: When data satisfy a mild regularity condition -- the extent of which can be estimated by sampling from the data -- then random subspace approximately preserves the Euclidean geometry of the data with high probability. Our guarantees are of the same order as those for random projection, namely the required dimension for projection is logarithmic in the number of data points, but have a larger constant term in the bound which depends upon this regularity. A challenging situation is when the original data have a sparse representation, since this implies a very large projection dimension is required: We show how this situation can be improved for sparse binary data by applying an efficient `densifying' preprocessing, which neither changes the Euclidean geometry of the data nor requires an explicit matrix-matrix multiplication. We corroborate our theoretical findings with experiments on both dense and sparse high-dimensional datasets from several application domains.

We prove risk bounds for binary classification in high-dimensional settings when the sample size is allowed to be smaller than the dimensionality of the training set observations. In particular, we prove upper bounds for both 'compressive learning' by empirical risk minimization (ERM) (that is when the ERM classifier is learned from data that have been projected from high-dimensions onto a randomly selected low-dimensional subspace) as well as uniform upper bounds in the full high-dimensional space. A novel tool we employ in both settings is the 'flipping probability' of Durrant and Kaban (ICML 2013) which we use to capture benign geometric structures that make a classification problem 'easy' in the sense of demanding a relatively low sample size for guarantees of good generalization. Furthermore our bounds also enable us to explain or draw connections between several existing successful classification algorithms. Finally we show empirically that our bounds are informative enough in practice to serve as the objective function for learning a classifier (by using them to do so).

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

No comments: