The following three papers all involve random features: the first uses them for sketching, the second as a basis for developing faster approximations, and the third to optimize a real-world setup.

Sketching for Large-Scale Learning of Mixture Models by Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Perez

Abstract: Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we first sketch the data by computing random generalized moments of the underlying probability distribution, then estimate mixture model parameters from the sketch using an iterative algorithm analogous to greedy sparse signal recovery. We exemplify our framework with the sketched estimation of Gaussian Mixture Models (GMMs). We experimentally show that our approach yields results comparable to the classical Expectation-Maximization (EM) technique while requiring significantly less memory and fewer computations when the number of database elements is large. We report large-scale experiments in speaker verification, where our approach makes it possible to fully exploit a corpus of 1000 hours of speech signal to learn a universal background model at scales computationally inaccessible to EM.
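To make the sketching step concrete, here is a minimal sketch of what "random generalized moments" can look like: the empirical average of exp(i⟨w, x⟩) over the data, taken at a handful of random frequencies w. The function names, the Gaussian frequency distribution, and the toy data are all assumptions made for illustration; the paper's actual frequency design and its greedy recovery algorithm are not reproduced here.

```python
import cmath
import random


def draw_frequencies(num_freqs, dim, scale=1.0, seed=0):
    """Draw random frequency vectors (Gaussian law is an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)]
            for _ in range(num_freqs)]


def sketch(data, freqs):
    """Average exp(i <w, x>) over all data points, for each frequency w."""
    n = len(data)
    result = []
    for w in freqs:
        total = 0j
        for x in data:
            phase = sum(wi * xi for wi, xi in zip(w, x))
            total += cmath.exp(1j * phase)
        result.append(total / n)
    return result


def merge(sketch_a, n_a, sketch_b, n_b):
    """Combine the sketches of two disjoint chunks into one sketch."""
    n = n_a + n_b
    return [(n_a * a + n_b * b) / n for a, b in zip(sketch_a, sketch_b)]


# Toy usage: 1000 two-dimensional points drawn around two centers.
rng = random.Random(1)
data = [[rng.gauss(c, 0.5), rng.gauss(-c, 0.5)]
        for c in (0.0, 3.0) for _ in range(500)]
freqs = draw_frequencies(num_freqs=20, dim=2)
z = sketch(data, freqs)  # 20 complex numbers summarize the whole dataset
```

The sketch has a fixed size regardless of the number of points, and `merge` shows why it suits large or streamed datasets: chunks can be sketched separately and combined afterwards.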

Structured Transforms for Small-Footprint Deep Learning

Vikas Sindhwani, Tara N. Sainath, Sanjiv Kumar

We consider the task of building compact deep learning pipelines suitable for deployment on storage and power constrained mobile devices. We propose a unified framework to learn a broad family of structured parameter matrices that are characterized by the notion of low displacement rank. Our structured transforms admit fast function and gradient evaluation, and span a rich range of parameter sharing configurations whose statistical modeling capacity can be explicitly tuned along a continuum from structured to unstructured. Experimental results show that these transforms can significantly accelerate inference and forward/backward passes during training, and offer superior accuracy-compactness-speed tradeoffs in comparison to a number of existing techniques. In keyword spotting applications in mobile speech recognition, our methods are much more effective than standard linear low-rank bottleneck layers and nearly retain the performance of state of the art models, while providing more than 3.5-fold compression.

Data-driven Minimization with Random Feature Expansions for Optical Beam Forming Network Tuning by Laurens Bliek, Michel Verhaegen and Sander Wahls

This paper proposes a data-driven method to minimize objective functions which can be measured in practice but are difficult to model. In the proposed method, the objective is learned directly from training data using random feature expansions. On the theoretical side, it is shown that the learned objective does not suffer from artificial local minima far away from the minima of the true objective if the random basis expansions are fit well enough in the uniform sense. The method is also tested on a real-life application, the tuning of an optical beamforming network. It is found that, in the presence of small model errors, the proposed method outperforms the classical approach of modeling from first principles and then estimating the model parameters.
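The idea of learning an objective from measurements and then minimizing the learned surrogate can be illustrated on a one-dimensional toy problem. The following is only a rough sketch under several assumptions: random cosine features with Gaussian frequencies, a ridge-regularized least-squares fit, a plain grid search for the minimizer, and a hypothetical `measured_objective` standing in for the real measurement. None of these choices is taken from the paper.

```python
import math
import random


def make_features(num_features, scale=2.0, seed=0):
    """Random (frequency, phase) pairs defining features cos(w * x + b)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, scale), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(num_features)]


def expand(x, features):
    return [math.cos(w * x + b) for w, b in features]


def solve(a, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    sol = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][k] * sol[k] for k in range(r + 1, n))
        sol[r] = s / m[r][r]
    return sol


def fit(xs, ys, features, ridge=1e-6):
    """Ridge-regularized least squares on the random feature expansion."""
    rows = [expand(x, features) for x in xs]
    d = len(features)
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(gram, rhs)


def surrogate(x, features, coeffs):
    return sum(c * p for c, p in zip(coeffs, expand(x, features)))


def minimize_on_grid(features, coeffs, lo, hi, steps=2001):
    """Return the grid point where the learned surrogate is smallest."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda x: surrogate(x, features, coeffs))


def measured_objective(x):
    """Hypothetical stand-in for a quantity that can only be measured."""
    return (x - 0.3) ** 2


features = make_features(30)
xs = [-1.0 + 0.05 * i for i in range(41)]
ys = [measured_objective(x) for x in xs]
coeffs = fit(xs, ys, features)
x_best = minimize_on_grid(features, coeffs, -1.0, 1.0)
```

If the fit is uniformly close to the true objective, the minimizer of the surrogate cannot sit far from the true one, which is the flavor of guarantee the abstract describes.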
