Thursday, September 04, 2014

Thesis: Compressed sensing and dimensionality reduction for unsupervised learning - Anthony Bourrier

Here is another thesis of interest: 

Introduction

In parallel to this evolution in data processing, compressed sensing has been developed in the signal processing community, and the tradeoff between computational time, precision of the statistical analysis and data size has also been studied. Techniques of this kind have been successfully applied for statistical analysis of voluminous data. The goal was principally to reduce the size of the considered data (similarly to the measurement step in compressed sensing), thus reducing the memory and/or computational time required for an algorithm to run, while still being able to approximately extract the desired information (similarly to the reconstruction step). Compressed sensing therefore seems to have genuine potential for statistical analysis of voluminous data. The starting point of the Ph.D. thesis presented in this manuscript is thus the study of the potential of compressed sensing for learning from data. More precisely, we consider unsupervised learning, in which the data is not divided into categories before processing.
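To make this compress-then-analyze principle concrete, here is a minimal sketch in Python. It illustrates the general idea only, not the method of the thesis: a random Gaussian projection shrinks each data vector fifty-fold while roughly preserving pairwise distances, in the spirit of the Johnson-Lindenstrauss lemma, so distance-based analyses can run on the compressed data. All sizes are illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 10000, 200                     # n vectors, compressed from d to m entries

X = rng.standard_normal((n, d))                # original data
A = rng.standard_normal((m, d)) / np.sqrt(m)   # random measurement matrix
Y = X @ A.T                                    # compressed data, 50x smaller

# Pairwise distances are approximately preserved, so clustering or
# nearest-neighbor style analyses can be run on Y instead of X.
i, j = 3, 7
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))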
The document is laid out as follows: Chapter 1 provides a more thorough description of the notions and ideas at the source of this work. In particular, it draws links between compressed sensing and some techniques which have been proposed in the learning community. Then come three chapters, each presenting a main contribution of the thesis.
Each of these chapters begins with a quick review of the state of the art relevant to the corresponding contribution, followed by a summary of the chapter's contributions; finally, the contributions themselves are described.
Chapter 2 contains the first contribution: a parameter estimation method for probability density mixtures that operates on compressed data, much as in compressed sensing, and which can be interpreted as a generalized compressed sensing problem. We propose to compress the data to a fixed-size representation called a sketch, then derive an algorithm to infer the mixture parameters from this sketch. We provide numerical experiments evaluating the performance of this compressed framework in the case of isotropic Gaussian mixture estimation.
Chapter 3 describes the second contribution: a generalization of results on the theoretical performance of reconstruction methods in compressed sensing, and more generally in inverse problems, to models beyond those covered by the previous results. More precisely, we provide necessary and sufficient conditions for the existence of an instance optimal reconstruction function, that is, one which is optimal in a certain sense with respect to the model. We study this instance optimality property in a general setting and link it to a generalized Restricted Isometry Property, a widely used property in compressed sensing.
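For reference, the two notions at play here can be stated as follows in their standard formulation, with M the measurement operator, \Sigma the model set, and d(x, \Sigma) the distance from x to the model; the exact norms and constants used in the thesis may differ. A decoder \Delta is instance optimal when

\| x - \Delta(Mx) \| \le C \, d(x, \Sigma) \quad \text{for all } x,

and a generalized Restricted Isometry Property on \Sigma requires

(1 - \delta) \, \|x - x'\|^2 \le \|M(x - x')\|^2 \le (1 + \delta) \, \|x - x'\|^2 \quad \text{for all } x, x' \in \Sigma.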
Finally, Chapter 4 presents the third main contribution: an approximate nearest neighbor search method for certain non-Euclidean distances, relying on the approximation of these distances by a Euclidean distance. We propose a simple framework combining two existing dimension-reducing techniques: explicit embedding and Product Quantization. We provide experiments showing that this framework achieves better precision than hashing techniques designed to work with kernels.
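As a rough illustration of how explicit embedding composes with Product Quantization, here is a minimal end-to-end sketch in Python. It is a toy stand-in, not the thesis implementation: random Fourier features (Rahimi and Recht) play the role of the explicit embedding for a Gaussian kernel, and a small hand-rolled quantizer stands in for a production PQ index. All sizes and helper names (rff_embed, kmeans) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Explicit embedding: random Fourier features approximating the Gaussian
# kernel k(x, y) = exp(-||x - y||^2 / 2), so Euclidean distance in the
# embedded space approximates the kernel distance.
def rff_embed(X, W, b):
    return np.sqrt(2.0 / W.shape[0]) * np.cos(X @ W.T + b)

d, D, m, k = 16, 64, 8, 32           # input dim, embedding dim, PQ blocks, codewords
W = rng.standard_normal((D, d))      # random frequencies
b = rng.uniform(0, 2 * np.pi, D)     # random phases

X = rng.standard_normal((2000, d))   # database
q = rng.standard_normal(d)           # query

Phi = rff_embed(X, W, b)
phi_q = rff_embed(q[None, :], W, b)[0]

# Product Quantization: split each embedded vector into m sub-vectors and
# learn k centroids per sub-space with a few Lloyd iterations.
def kmeans(Z, k, iters=20):
    C = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = Z[labels == j].mean(axis=0)
    labels = np.argmin(((Z[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    return C, labels

sub = D // m
codebooks, codes = [], []
for i in range(m):
    C, labels = kmeans(Phi[:, i * sub:(i + 1) * sub], k)
    codebooks.append(C)
    codes.append(labels)
codes = np.stack(codes, axis=1)      # (n, m) compact codes

# Asymmetric distance computation: exact query embedding vs. quantized
# database embeddings, via per-subspace lookup tables.
tables = np.stack([((phi_q[i * sub:(i + 1) * sub][None, :] - C) ** 2).sum(-1)
                   for i, C in enumerate(codebooks)])        # (m, k)
dists = tables[np.arange(m)[None, :], codes].sum(axis=1)     # (n,)
print("approximate nearest neighbor index:", int(np.argmin(dists)))

Searching over the short codes rather than the full vectors is what reduces memory and query cost; the quality of the result hinges on how well the Euclidean distance in the embedded space approximates the original kernel distance.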
Abstract (translated from the French): This thesis is motivated by the prospect of bringing signal processing and statistical learning closer together, and more particularly by the exploitation of compressed sensing techniques to reduce the cost of learning tasks. After recalling the basics of compressed sensing, we propose a framework for estimating the parameters of a probability density mixture in which the training data is compressed into a fixed-size representation. We instantiate this framework on an isotropic Gaussian mixture model. This proof of concept suggests the existence of theoretical guarantees for reconstructing a signal under models going beyond the usual sparse vector model. In a second step, we therefore study the generalization of stability results for linear inverse problems to fully general signal models. We propose conditions under which reconstruction guarantees can be given in a general setting. Finally, we consider a problem of approximate nearest neighbor search in which vector signatures are computed in order to reduce complexity. In the setting where the distance of interest is associated with a Mercer kernel, we propose to combine an explicit embedding of the data with a signature computation, which notably leads to a more precise approximate search.
Join the CompressiveSensing subreddit or the Google+ Community and post there!
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.
