Friday, June 26, 2015

Random Projections as Regularizers, Compressive Linear Least Squares Regression and more

We have seen a few papers from these authors recently; here are a few more from the same groups:


Random Projections as Regularizers: Learning a Linear Discriminant from Fewer Observations than Dimensions by Robert Durrant, Ata Kabán 
We prove theoretical guarantees for an averaging-ensemble of randomly projected Fisher Linear Discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The specific form and simplicity of this ensemble permits a direct and much more detailed analysis than existing generic tools in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classifier and classifier ensemble pair. Furthermore we show that the randomly projected ensemble is equivalent to implementing a sophisticated regularization scheme to the linear discriminant learned in the original data space and this prevents overfitting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high dimensional data and therefore for this approach data can be collected, stored and processed in such a compressed form. We confirm our theoretical findings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.
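To get a feel for the construction, here is a minimal sketch of an averaging ensemble of randomly projected FLD classifiers for binary data. All names and the Gaussian choice of projection matrix are illustrative assumptions, not the authors' code:

```python
import numpy as np

def rp_fld_ensemble_predict(X_train, y_train, x_test, k=5, n_members=20, seed=0):
    """Averaging ensemble of Fisher Linear Discriminants, each learned in a
    random k-dimensional projection of the d-dimensional data (classes 0/1)."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    scores = []
    for _ in range(n_members):
        R = rng.standard_normal((k, d)) / np.sqrt(k)   # Gaussian RP matrix
        Z = X_train @ R.T                               # compressed training data
        mu0 = Z[y_train == 0].mean(axis=0)
        mu1 = Z[y_train == 1].mean(axis=0)
        # pooled within-class covariance in the projected space: a k x k matrix,
        # invertible even when the full d x d covariance is singular (n < d)
        Sw = np.cov(np.vstack([Z[y_train == 0] - mu0, Z[y_train == 1] - mu1]).T)
        w = np.linalg.solve(Sw, mu1 - mu0)              # FLD direction
        z = x_test @ R.T
        scores.append(z @ w - 0.5 * (mu0 + mu1) @ w)    # signed discriminant score
    return int(np.mean(scores) > 0)                     # average, then threshold
```

Note that each member only ever sees the k-dimensional projections, which is why data can be collected and stored in compressed form.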
New Bounds on Compressive Linear Least Squares Regression by Ata Kabán

In this paper we provide a new analysis of compressive least squares regression that removes a spurious log N factor from previous bounds, where N is the number of training points. Our new bound has a clear interpretation and reveals meaningful structural properties of the linear regression problem that make it solvable effectively in a small dimensional random subspace. In addition, the main part of our analysis does not require the compressive matrix to have the Johnson-Lindenstrauss property or the RIP property. Instead, we only require its entries to be drawn i.i.d. from a 0-mean symmetric distribution with finite first four moments.
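The compressive regression scheme being analyzed is simple to state: compress the d features with a random matrix, solve ordinary least squares in the k-dimensional subspace, and lift the solution back. A minimal sketch (Gaussian entries used for simplicity; any i.i.d. zero-mean symmetric distribution with finite first four moments would do, per the abstract; names are illustrative):

```python
import numpy as np

def compressive_ols(X, y, k, seed=0):
    """Compressive least squares: solve OLS on randomly compressed features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.standard_normal((d, k)) / np.sqrt(k)   # random compression matrix
    Z = X @ R                                      # n x k compressed design
    w_k, *_ = np.linalg.lstsq(Z, y, rcond=None)    # OLS in the k-dim subspace
    return R @ w_k                                 # map back to a d-dim predictor
```

With k much smaller than d, the normal equations shrink from d x d to k x k, which is where the computational saving comes from; the paper's bounds quantify the excess risk this compression incurs.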



In this paper, we present a new variant of EDA for high dimensional continuous optimisation, which extends a recently proposed random projections (RP) ensemble based approach by employing heavy tailed random matrices. In particular, we use random matrices with i.i.d. t-distributed entries. The use of t-distributions may look surprising in the context of random projections, however we show that the resulting ensemble covariance is enlarged when the degree of freedom parameter is lowered. Based on this observation, we develop an adaptive scheme to adjust this parameter during evolution, and this results in a flexible means of balancing exploration and exploitation of the search process. A comprehensive set of experiments on high dimensional benchmark functions demonstrate the usefulness of our approach.
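Generating a projection matrix with i.i.d. Student-t entries is straightforward via the scale-mixture representation t = normal / sqrt(chi-square(dof)/dof); a quick sketch (function name is illustrative, not from the paper):

```python
import numpy as np

def t_rp_matrix(k, d, dof, rng):
    """k x d random projection matrix with i.i.d. Student-t entries, built
    via the normal/chi-square scale-mixture representation of the t."""
    g = rng.standard_normal((k, d))
    s = np.sqrt(rng.chisquare(dof, size=(k, d)) / dof)
    return g / s  # lower dof => heavier tails, larger typical entries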



Multivariate Cauchy EDA Optimisation by Momodou L. Sanyang, Ata Kabán




We consider Black-Box continuous optimization by Estimation of Distribution Algorithms (EDA). In continuous EDA, the multivariate Gaussian distribution is widely used as a search operator, and it has the well-known advantage of modelling the correlation structure of the search variables, which univariate EDA lacks. However, the Gaussian distribution as a search operator is prone to premature convergence when the population is far from the optimum. Recent work suggests that replacing the univariate Gaussian with a univariate Cauchy distribution in EDA holds promise in alleviating this problem because it is able to make larger jumps in the search space due to the Cauchy distribution's heavy tails. In this paper, we propose the use of a multivariate Cauchy distribution to blend together the advantages of multivariate modelling with the ability of escaping early convergence to efficiently explore the search space. Experiments on 16 benchmark functions demonstrate the superiority of multivariate Cauchy EDA against univariate Cauchy EDA, and its advantages against multivariate Gaussian EDA when the population lies far from the optimum.
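Sampling the multivariate Cauchy search operator is easy because the multivariate Cauchy is a multivariate Student-t with one degree of freedom, i.e. a correlated Gaussian divided by an independent chi scale. A minimal sketch of such a sampler (illustrative names, not the authors' code):

```python
import numpy as np

def sample_mv_cauchy(mean, cov, n, rng):
    """Draw n samples from a multivariate Cauchy (multivariate t with 1 d.o.f.)
    via its scale-mixture representation: x = mean + g / s, with g a correlated
    Gaussian draw and s = sqrt(chi-square(1)), one shared scale per sample."""
    d = len(mean)
    L = np.linalg.cholesky(cov)
    g = rng.standard_normal((n, d)) @ L.T        # correlated Gaussian draws
    s = np.sqrt(rng.chisquare(1, size=(n, 1)))   # shared heavy-tail scale
    return mean + g / s                          # heavy-tailed, correlated jumps
```

Because the single scale s multiplies the whole correlated vector, the occasional tiny s produces a long jump along the modelled correlation structure: exactly the blend of multivariate modelling and exploration the abstract describes.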

Two approaches of Using Heavy Tails in High dimensional EDA by Momodou L. Sanyang, Hanno Muehlbrandt, Ata Kabán

We consider the problem of high dimensional black-box optimisation via Estimation of Distribution Algorithms (EDA). The Gaussian distribution is commonly used as a search operator in most of the EDA methods. However there are indications in the literature that heavy tailed distributions may perform better due to their higher exploration capabilities. Univariate heavy tailed distributions were already proposed for high dimensional problems. In 2D problems it has been reported that a multivariate heavy tailed (such as Cauchy) search distribution is able to blend together the strengths of multivariate modelling with a high exploration power. In this paper, we study whether a similar scheme would work well in high dimensional search problems. To get around the difficulty of multivariate model building in high dimensions we employ a recently proposed random projections (RP) ensemble based approach which we modify to get samples from a multivariate Cauchy using the scale-mixture representation of the Cauchy distribution. Our experiments show that the resulting RP-based multivariate Cauchy EDA consistently improves on the performance of the univariate Cauchy search distribution. However, intriguingly, the RP-based multivariate Gaussian EDA has the best performance among these methods. It appears that the highly explorative nature of the multivariate Cauchy sampling is exacerbated in high dimensional search spaces and the population based search loses its focus and effectiveness as a result. Finally, we present an idea to increase exploration while maintaining exploitation and focus by using the RP-based multivariate Gaussian EDA in which the RP matrices are drawn with i.i.d. heavy tailed entries. This achieves improved performance and is competitive with the state of the art.
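The RP-ensemble device that makes multivariate model building tractable in high dimensions can be sketched roughly as follows: fit a cheap full-covariance Gaussian in each of several random low-dimensional projections of the selected population, sample there, lift the samples back, and combine. Everything below is a plausible sketch under stated assumptions; in particular, the d/(k * n_proj) rescaling is a heuristic stand-in for the paper's exact correction, and all names are made up for illustration:

```python
import numpy as np

def rp_ensemble_gaussian_sample(population, n_offspring, k, n_proj, rng):
    """One sampling step of an RP-ensemble Gaussian EDA (illustrative sketch):
    fit a k x k Gaussian model in each random projection, sample, lift back."""
    n, d = population.shape
    mu = population.mean(axis=0)
    Xc = population - mu
    out = np.zeros((n_offspring, d))
    for _ in range(n_proj):
        R = rng.standard_normal((k, d)) / np.sqrt(k)   # random k-dim subspace
        Z = Xc @ R.T                                   # project the population
        C = np.cov(Z.T) + 1e-8 * np.eye(k)             # k x k model: cheap to fit
        L = np.linalg.cholesky(C)
        samples_k = rng.standard_normal((n_offspring, k)) @ L.T
        out += samples_k @ R * (d / (k * n_proj))      # lift back and combine
    return mu + out
```

The key point is that only k x k covariances are ever estimated and factorized, so the cost of full multivariate modelling in d dimensions is avoided; swapping the Gaussian draws for scale-mixture Cauchy draws yields the multivariate Cauchy variant studied here.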

How effective is Cauchy-EDA in high dimensions? by Momodou L. Sanyang and Ata Kabán

We consider the problem of high dimensional black-box optimisation via Estimation of Distribution Algorithms (EDA). The Gaussian distribution is commonly used as a search operator in most of the EDA methods. However there are indications in the literature that heavy tailed distributions may perform better due to their higher exploration capabilities. Univariate heavy tailed distributions were already proposed for high dimensional problems. In 2D problems it has been reported that a multivariate heavy tailed (such as Cauchy) search distribution is able to blend together the strengths of multivariate modelling with a high exploration power. In this paper, we study whether a similar scheme would work well in high dimensional search problems. To get around the difficulty of multivariate model building in high dimensions we employ a recently proposed random projections (RP) ensemble based approach which we modify to get samples from a multivariate Cauchy using the scale-mixture representation of the Cauchy distribution. Our experiments show that the resulting RP-based multivariate Cauchy EDA consistently improves on the performance of the univariate Cauchy search distribution. However, intriguingly, the RP-based multivariate Gaussian EDA has the best performance among these methods. It appears that the highly explorative nature of the multivariate Cauchy sampling is exacerbated in high dimensional search spaces and the population based search loses its focus and effectiveness as a result. Finally, we present an idea to increase exploration while maintaining exploitation and focus by using the RP-based multivariate Gaussian EDA in which the RP matrices are drawn with i.i.d. heavy tailed entries. This achieves improved performance and is competitive with the state of the art.
 
 
Join the CompressiveSensing subreddit or the Google+ Community and post there!
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
