Nuit Blanche: 06/01/2015

Tuesday, June 30, 2015

Vowpal Wabbit: A Machine Learning System ( Paris Machine Learning Meetup "Hors Série" #4 Season 2 )

Following up on meetup #6 Season 2, John Langford just gave us a tutorial presentation on Vowpal Wabbit this morning in Paris (Thank you Christophe and AXA for hosting us). Here are the slides:

Vowpal Wabbit: A Machine Learning System

Here is some additional relevant background material.

The wiki: https://github.com/JohnLangford/vowpal_wabbit/wiki
The mailing list: https://groups.yahoo.com/neo/groups/vowpal_wabbit/info
Learning to search framework: http://arxiv.org/abs/1406.1837
Learning to search algorithm: http://arxiv.org/abs/1502.02206
Learning reductions survey: http://arxiv.org/abs/1502.02704
Learning to interact tutorial: http://hunch.net/~jl/interact.pdf
Most recent exciting paper: http://arxiv.org/abs/1402.0555
Feature Hashing: http://arxiv.org/abs/0902.2206
Terascale learning: http://arxiv.org/abs/1110.4198
Many of the lectures in the large scale learning class with Yann @NYU: http://cilvr.cs.nyu.edu/doku.php?id=courses:bigdata:slides:start
Hash Kernels, http://jmlr.org/proceedings/papers/v5/shi09a/shi09a.pdf

Much like the example Leon Bottou gave us on counterfactual reasoning, ( see his slides: Learning to Interact ) a year ago. I very much liked the exploration bit for policies evaluation: if you don't explore you just don't know and prediction errors are not controlled exploration.

which will be the subject of John's presentation at ICML in Lille next week:

Learning to Search Better than Your Teacher by Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daume, John Langford

[supplementary material]

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, June 29, 2015

Slides and Implementation: Random forests with random projections of the output space for high dimensional multi-label classification

We mentioned this before a year ago, here are the slides: Random forests with random projections of the output space for high dimensional multi-label classification by Arnaud Joly, Pierre Geurts, Louis Wehenkel.

The implementation of the random output tree is here at: https://github.com/arjoly/random-output-trees

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Thesis: Learning in High Dimensions with Projected Linear Discriminants by Robert Durrant

Learning in High Dimensions with Projected Linear Discriminants by Robert Durrant

The enormous power of modern computers has made possible the statistical modelling of data with dimensionality that would have made this task inconceivable only decades ago. However, experience in such modelling has made researchers aware of many issues associated with working in high-dimensional domains, collectively known as `the curse of dimensionality', which can confound practitioners' desires to build good models of the world from these data. When the dimensionality is very large, low-dimensional methods and geometric intuition both break down in these high-dimensional spaces. To mitigate the dimensionality curse we can use low-dimensional representations of the original data that capture most of the information it contained. However, little is currently known about the eff ect of such dimensionality reduction on classi fier performance. In this thesis we develop theory quantifying the e ect of random projection { a recent, very promising, non-adaptive dimensionality reduction technique {on the classi cation performance of Fisher's Linear Discriminant (FLD), a successful and widely-used linear classifier. We tackle the issues associated with small sample size and high-dimensionality by using randomly projected FLD ensembles, and we develop theory explaining why our new approach performs well. Finally, we quantify the generalization error of Kernel FLD, a related non-linear projected classifier.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Sunday, June 28, 2015

Slides: Learning with Random Projections, Random Projections for Machine Learning and Data Mining: Theory & Applications

Following up on Friday's The Unreasonable Effectiveness of Random Projections in Computer Science here are two series of slides:

Learning with Random Projections by Ata Kaban. The outline of the slides features the following subject;

Introduction to compressive learning
Compressive classification
Compressive regression
Ensembles

And earlier, a tutorial for Random Projections for Machine Learning and Data Mining: Theory & Applications from Bob Durrant and Ata Kaban. (Handouts (6 Slides per page) ). This is from the website for the ECML PKDD 2012 tutorial.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

CfP: The 3rd International Workshop on High Dimensional Data Mining (HDM’15)

Here is an interesting workshop organized by Ata Kaban

The 3^rd International Workshop on High Dimensional Data Mining (HDM’15)
In conjunction with the IEEE International Conference on Data Mining (IEEE ICDM 2015)

14 November 2015, Atlantic City, NJ, USA.

Description of Workshop

Over a decade ago, Stanford statistician David Donoho predicted that the 21st century will be the century of data. "We can say with complete confidence that in the coming century, high-dimensional data analysis will be a very significant activity, and completely new methods of high-dimensional data analysis will be developed; we just don't know what they are yet." -- D. Donoho, 2000.

Unprecedented technological advances lead to increasingly high dimensional data sets in all areas of science, engineering and businesses. These include genomics and proteomics, biomedical imaging, signal processing, astrophysics, finance, web and market basket analysis, among many others. The number of features in such data is often of the order of thousands or millions - that is much larger than the available sample size.

For a number of reasons, classical data analysis methods inadequate, questionable, or inefficient at best when faced with high dimensional data spaces:

1. High dimensional geometry defeats our intuition rooted in low dimensional experiences, and this makes data presentation and visualisation particularly challenging.

2. Phenomena that occur in high dimensional probability spaces, such as the concentration of measure, are counter-intuitive for the data mining practitioner. For instance, distance concentration is the phenomenon that the contrast between pair-wise distances may vanish as the dimensionality increases.

3. Bogus correlations and misleading estimates may result when trying to fit complex models for which the effective dimensionality is too large compared to the number of data points available.

4. The accumulation of noise may confound our ability to find low dimensional intrinsic structure hidden in the high dimensional data.

5. The computation cost of processing high dimensional data or carrying out optimisation over a high dimensional parameter spaces is often prohibiting.

Topics

This workshop aims to promote new advances and research directions to address the curses and uncover and exploit the blessings of high dimensionality in data mining. Topics of interest include all aspects of high dimensional data mining, including the following:

- Systematic studies of how the curse of dimensionality affects data mining methods

- Models of low intrinsic dimension: sparse representation, manifold models, latent structure models, large margin, other?

- How to exploit intrinsic dimension in optimisation tasks for data mining?

- New data mining techniques that scale with the intrinsic dimension, or exploit some properties of high dimensional data spaces

- Dimensionality reduction

- Methods of random projections, compressed sensing, and random matrix theory applied to high dimensional data mining and high dimensional optimisation

- Theoretical underpinning of mining data whose dimensionality is larger than the sample size

- Classification, regression, clustering, visualisation of high dimensional complex data sets

- Functional data mining

- Data presentation and visualisation methods for very high dimensional data sets

- Data mining applications to real problems in science, engineering or businesses where the data is high dimensional

Paper submission

High quality original submissions are solicited for oral and poster presentation at the workshop. Papers should not exceed a maximum of 8 pages, and must follow the IEEE ICDM format requirements of the main conference. All submissions will be peer-reviewed, and all accepted workshop papers will be published in the proceedings by the IEEE Computer Society Press. Submit your paper here.

Important dates

Submission deadline: July 20, 2015.

Notifications to authors: September 1, 2015.

Workshop date: November 13, 2015.

Program committee

Arthur Zimek – Ludwig-Maximilians-Universitaet, Muenchen, Germany

Ata Kaban – University of Birmingham, UK

Barbara Hammer – University of Bielefeld, Germany

Bob Durrant – Waikato University, NZ

John A. Lee – Universite Catholique de Louvain, Belgium

Mark Last – Ben-Gurion University of the Negev, Israel

Mehmed Kantardzic – University of Louisville, USA

Michael E. Houle – National Institute of Informatics, Japan
Milos Radovanovic – University of Novi Sad, Serbia

Nenad Tomasev – Google, Mountain View, CA, USA

Peter Tino – University of Birmingham, UK

Stephan Gunnemann – Carnegie Mellon University, USA

Udo Seiffert – Fraunhofer IFF Magdeburg & University of Magdeburg, Germany

Yiming Ying – SUNY, NY, USA

Workshop organisation & contact

Dr. Ata Kaban
School of Computer Science, University of Birmingham, UK

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Friday, June 26, 2015

Random Projections as Regularizers, Compressive Linear Least Squares Regression and more

We've seen a few papers recently

here are a few more from the same authors

Random Projections as Regularizers: Learning a Linear Discriminant from Fewer Observations than Dimensions by Robert Durrant, Ata Kabán

We prove theoretical guarantees for an averaging-ensemble of randomly projected Fisher Linear Discriminant classifiers, focusing on the case when there are fewer training observations than data dimensions. The speci c form and simplicity of this ensemble permits a direct and much more detailed analysis than existing generic tools in previous works. In particular, we are able to derive the exact form of the generalization error of our ensemble, conditional on the training set, and based on this we give theoretical guarantees which directly link the performance of the ensemble to that of the corresponding linear discriminant learned in the full data space. To the best of our knowledge these are the first theoretical results to prove such an explicit link for any classi er and classi er ensemble pair. Furthermore we show that the randomly projected ensemble is equivalent to implementing a sophisticated regularization scheme to the linear discriminant learned in the original data space and this prevents over tting in conditions of small sample size where pseudo-inverse FLD learned in the data space is provably poor. Our ensemble is learned from a set of randomly projected representations of the original high dimensional data and therefore for this approach data can be collected, stored and processed in such a compressed form. We confirm our theoretical ndings with experiments, and demonstrate the utility of our approach on several datasets from the bioinformatics domain and one very high dimensional dataset from the drug discovery domain, both settings in which fewer observations than dimensions are the norm.

New Bounds on Compressive Linear Least Squares Regression by Ata Kabán

In this paper we provide a new analysis of compressive least squares regression that removes a spurious logN factor from previous bounds, where N is the number of training points. Our new bound has a clear interpretation and reveals meaningful structural properties of the linear regression problem that makes it solvable effectively in a small dimensional random subspace. In addition, the main part of our analysis does not require the compressive matrix to have the Johnson-Lindenstrauss property, or the RIP property. Instead, we only require its entries to be drawn i.i.d. from a 0-mean symmetric distribution with finite first four moments.

Heavy tails with Parameter Adaptation in Random Projection based continuous EDA by Momodou L. Sanyang and Ata Kabán

In this paper, we present a new variant of EDA for high dimensional continuous optimisation, which extends a recently proposed random projections (RP) ensemble based approach by employing heavy tailed random matrices. In particular, we use random matrices with i.i.d. t-distributed entries. The use of t-distributions may look surprising in the context of random projections, however we show that the resulting ensemble covariance is enlarged when the degree of freedom parameter is lowered. Based on this observation, we develop an adaptive scheme to adjust this parameter during evolution, and this results in a flexible means of balancing exploration and exploitation of the search process. A comprehensive set of experiments on high dimensional benchmark functions demonstrate the usefulness of our approach.

Multivariate Cauchy EDA Optimisation. by Momodou L. Sanyang, Ata Kabán

We consider Black-Box continuous optimization by Estimation of Distribution Algorithms (EDA). In continuous EDA, the multivariate Gaussian distribution is widely used as a search operator, and it has the well-known advantage of modelling the correlation structure of the search variables, which univariate EDA lacks. However, the Gaussian distribution as a search operator is prone to premature convergence when the population is far from the optimum. Recent work suggests that replacing the univariate Gaussian with a univariate Cauchy distribution in EDA holds promise in alleviating this problem because it is able to make larger jumps in the search space due to the Cauchy distribution's heavy tails. In this paper, we propose the use of a multivariate Cauchy distribution to blend together the advantages of multivariate modelling with the ability of escaping early convergence to efficiently explore the search space. Experiments on 16 benchmark functions demonstrate the superiority of multivariate Cauchy EDA against univariate Cauchy EDA, and its advantages against multivariate Gaussian EDA when the population lies far from the optimum

Two approaches of Using Heavy Tails in High dimensional EDA by Momodou L. Sanyang, Hanno Muehlbrandt, Ata Kabán

We consider the problem of high dimensional black-box optimisation via Estimation of Distribution Algorithms (EDA). The Gaussian distribution is commonly used as a search operator in most of the EDA methods. However there are indications in the literature that heavy tailed distributions may perform better due to their higher exploration capabilities. Univariate heavy tailed distributions were already proposed for high dimensional problems. In 2D problems it has been reported that a multivariate heavy tailed (such as Cauchy) search distribution is able to blend together the strengths of multivariate modelling with a high exploration power. In this paper, we study whether a similar scheme would work well in high dimensional search problems. To get around of the difficulty of multivariate model building in high dimensions we employ a recently proposed random projections (RP) ensemble based approach which we modify to get samples from a multivariate Cauchy using the scale-mixture representation of the Cauchy distribution. Our experiments show that the resulting RP-based multivariate Cauchy EDA consistently improves on the performance of the univariate Cauchy search distribution. However, intriguingly, the RP-based multivariate Gaussian EDA has the best performance among these methods. It appears that the highly explorative nature of the multivariate Cauchy sampling is exacerbated in high dimensional search spaces and the population based search loses its focus and effectiveness as a result. Finally, we present an idea to increase exploration while maintaining exploitation and focus by using the RP-based multivariate Gaussian EDA in which the RP matrices are drawn with i.i.d. heavy tailed entries. This achieves improved performance and is competitive with the state of the art.

How effective is Cauchy-EDA in high dimensions? by Momodou L. Sanyang and Ata Kabán

We consider the problem of high dimensional black-box optimisation via Estimation of Distribution Algorithms (EDA). The Gaussian distribution is commonly used as a search operator in most of the EDA methods. However there are indications in the literature that heavy tailed distributions may perform better due to their higher exploration capabilities. Univariate heavy tailed distributions were already proposed for high dimensional problems. In 2D problems it has been reported that a multivariate heavy tailed (such as Cauchy) search distribution is able to blend together the strengths of multivariate modelling with a high exploration power. In this paper, we study whether a similar scheme would work well in high dimensional search problems. To get around of the difficulty of multivariate model building in high dimensions we employ a recently proposed random projections (RP) ensemble based approach which we modify to get samples from a multivariate Cauchy using the scale-mixture representation of the Cauchy distribution. Our experiments show that the resulting RP-based multivariate Cauchy EDA consistently improves on the performance of the univariate Cauchy search distribution. However, intriguingly, the RP-based multivariate Gaussian EDA has the best performance among these methods. It appears that the highly explorative nature of the multivariate Cauchy sampling is exacerbated in high dimensional search spaces and the population based search loses its focus and e ectiveness as a result. Finally, we present an idea to increase exploration while maintaining exploitation and focus by using the RP-based multivariate Gaussian EDA in which the RP matrices are drawn with i.i.d. heavy tailed entries. This achieves improved performance and is competitive with the state of the art.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

The Unreasonable Effectiveness of Random Projections in Computer Science

In the series "The Unreasonable effectiveness of", we've had Deep Learning, Random Forests, Recurrent Neural Networks, Consensus Labeling. Today (though it was presented in December of last year) we have: The Unreasonable Effectiveness of Random Projections in Computer Science by Bob Durrant

PS: As Gabriel pointed out, the whole meme started with Wigner who also happened to have played a big part in Random Matrix Theory.

Slide from Roman Vershynin in Random matrix theory from the functional analytic perspective

References:

The Unreasonable Effectiveness of Deep Learning by Yann LeCun
The Unreasonable Effectiveness of Random Forests by Ahmed El Deeb
The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy
The Unreasonable Effectiveness of Consensus Labeling by Robert Grossman
The Unreasonable Effectiveness of Data by Alon Halevy, Peter Norvig, and Fernando Pereira.
The Unreasonable Effectiveness of Mathematics in the Natural Sciences by Eugene Wigner

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Towards Large Scale Continuous EDA: A Random Matrix Theory Perspective - implementation -

How about using random projections to explore the world with evolutionary algorithms ? Here you go with Ensemble Random Projections ( a technique that seems very close to some Random Forest approach)

Towards Large Scale Continuous EDA: A Random Matrix Theory Perspective by Ata Kabán, Jakramate Bootkrajang, Robert Durrant

Estimation of distribution algorithms (EDA) are a major branch of evolutionary algorithms (EA) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging and as a result existing EDAs may become less attractive in large scale problems due to the associated large computational requirements. Large scale continuous global optimisation is key to many modern-day real-world problems. Scaling up EAs to large scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective and efficient EDA-type algorithms for large scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections to low dimensions of the set of fittest search points as a basis for developing a new and generic divide-and-conquer methodology. Our ideas are rooted in the theory of random projections developed in theoretical computer science, and in developing and analysing our framework we exploit some recent results in non-asymptotic random matrix theory. MATLAB code is available from http://www.cs.bham.ac.uk/~axk/rpm.zip

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Thursday, June 25, 2015

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

Random projections or sign random projections are useful beyond distance preservation. We've known this for a little a while when looking at how manifolds kept some structure under random projection (see Rich Baraniuk's presentation, or Mike Wakin's thesis, or here), or how the angles between submanifolds provided some justification for reconstruction solvers (also here), but today's paper wants to look into this issue deeper as regards to the standpoint of machine learning.

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection by Ata Kabán

Dot product is a key building block in a number of data mining algorithms from classification, regression, correlation clustering, to information retrieval and many others. When data is high dimensional, the use of random projections may serve as a universal dimensionality reduction method that provides both low distortion guarantees and computational savings. Yet, contrary to the optimal guarantees that are known on the preservation of the Euclidean distance cf. the Johnson-Lindenstrauss lemma, the existing guarantees on the dot product under random projection are loose and incomplete in the current data mining and machine learning literature. Some recent literature even suggested that the dot product may not be preserved when the angle between the original vectors is obtuse. In this paper we provide improved bounds on the dot product under random projection that matches the optimal bounds on the Euclidean distance. As a corollary, we elucidate the impact of the angle between the original vectors on the relative distortion of the dot product under random projection, and we show that the obtuse vs. acute angles behave symmetrically in the same way. In a further corollary we make a link to sign random projection, where we generalise earlier results. Numerical simulations confirm our theoretical results. Finally we give an application of our results to bounding the generalisation error of compressive linear classifiers under the margin loss.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

The Small Victories

As any long distance runner/blogger will tell you, It's the small victories that matter.

Twenty years ago, I got a paper published that was at the crossroads between extreme fluid dynamics (two phase flow in a turbulent regime in zero gravity) and something that could be viewed today as a Machine Learning classification task. Because it was way far from traditional approaches in that area of engineering, it had problems with the peer review (I think it got eventually accepted because the reviewer actually died and there were too few specialists in this area to say something about what we had).

It so happens that our recent paper in Scientific Reports has just garnered more citations that than paper.

Fascinating enough, the recent citations of that 1994 paper are indicative of the move by the Chinese to do innovative things in space. Woohoo, the more, the better !

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Physics-driven inverse problems made tractable with cosparse regularization

After attending the Teratec Forum and watching how some contractors were beginning to use HPC as a way of testing full scale designs of their products - a proxy of that use shows through the exponential use of Teraflops for their internal computations in the figure below -, I wonder if some of these computations could use some of the new techniques developed below.

Teraflops increase #teratecforum pic.twitter.com/fStcjSSzS1
— Igor Carron (@IgorCarron) June 24, 2015

You may recall this Sunday Morning Insight entry on Physics Driven Sensor Design ? well it looks like some of the authors are making progress toward using the constraint of sparsity in field equations and they find that the analysis approach does improve the solver in terms of measurements, memory size and runtime: what's not to like ?

Physics-driven inverse problems made tractable with cosparse regularization by Srđan Kitić, Laurent Albera, Nancy Bertin, Rémi Gribonval

Sparse data models are powerful tools for solving ill-posed inverse problems. We present a regularization framework based on the sparse synthesis and sparse analysis models for problems governed by linear partial differential equations. Although nominally equivalent, we show that the two models differ substantially from a computational perspective: unlike the sparse synthesis model, its analysis counterpart has much better scaling capabilities and can indeed be faster when more measurement data is available. Our findings are illustrated on two examples, sound source localization and brain source localization, which also serve as showcases for the regularization framework. To address this type of inverse problems, we develop a specially tailored convex optimization algorithm based on the Alternating Direction Method of Multipliers.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Wednesday, June 24, 2015

Raking the Cocktail Party - implementation -

Following up on this morning's thesis here is an implementation:

Raking the Cocktail Party by Dokmanic, Ivan; Scheibler, Robin; Vetterli, Martin

We present the concept of an acoustic rake receiver—a microphone beamformer that uses echoes to improve the noise and interference suppression. The rake idea is well-known in wireless communications; it involves constructively combining different multipath components that arrive at the receiver antennas. Unlike spread-spectrum signals used in wireless communications, speech signals are not orthogonal to their shifts. Therefore, we focus on the spatial structure, rather than temporal. Instead of explicitly estimating the channel, we create correspondences between early echoes in time and image sources in space. These multiple sources of the desired and the interfering signal offer additional spatial diversity that we can exploit in the beamformer design. We present several “intuitive” and optimal formulations of acoustic rake receivers, and show theoretically and numerically that the rake formulation of the maximum signal-to-interference-and-noise beamformer offers significant performance boosts in terms of noise and interference suppression. Beyond signal-to-noise ratio, we observe gains in terms of the perceptual evaluation of speech quality (PESQ) metric for the speech quality. We accompany the paper by the complete simulation and processing chain written in Python. The code and the sound samples are available online at http://lcav.github.io/AcousticRakeReceiver/.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Thesis: Listening to Distances and Hearing Shapes: Inverse Problems in Room Acoustics and Beyond

Here is another illuminating thesis work !

Listening to Distances and Hearing Shapes: Inverse Problems in Room Acoustics and Beyond by Ivan Dokmanic

A central theme of this thesis is using echoes to achieve useful, interesting, and sometimes surprising results. One should have no doubts about the echoes' constructive potential; it is, after all, demonstrated masterfully by Nature. Just think about the bat's intriguing ability to navigate in unknown spaces and hunt for insects by listening to echoes of its calls, or about similar (albeit less well-known) abilities of toothed whales, some birds, shrews, and ultimately people. We show that, perhaps contrary to conventional wisdom, multipath propagation resulting from echoes is our friend. When we think about it the right way, it reveals essential geometric information about the sources{channel{receivers system. The key idea is to think of echoes as being more than just delayed and attenuated peaks in 1D impulse responses; they are actually additional sources with their corresponding 3D locations. This transformation allows us to forget about the abstract
room, and to replace it by more familiar point sets. We can then engage the powerful machinery of Euclidean distance geometry. A problem that always arises is that we do not know a priori the matching between the peaks and the points in space, and solving the inverse problem is achieved by echo sorting a tool we developed for learning correct labelings of echoes. This has applications beyond acoustics, whenever one deals with waves and reflections, or more generally, time-of- flight measurements. Equipped with this perspective, we rst address the \Can one hear the shape of a room?" question, and we answer it with a qualified \yes". Even a single impulse response uniquely describes a convex polyhedral room, whereas a more practical algorithm to reconstruct the room's geometry uses only fi rst-order echoes and a few microphones. Next, we show how di erent problems of localization bene t from echoes. The first one is multiple indoor sound source localization. Assuming the room is known, we show that discretizing the Helmholtz equation yields a system of sparse reconstruction problems linked by the common sparsity pattern. By exploiting the full bandwidth of the sources, we show that it is possible to localize multiple unknown sound sources using only a single microphone. We then look at indoor localization with known pulses from the geometric echo perspective introduced previously. Echo sorting enables localization in non-convex rooms without a line-of-sight path, and localization with a single omni-directional sensor, which is impossible without echoes. A closely related problem is microphone position calibration; we show that echoes can help even without assuming that the room is known. Using echoes, we can localize arbitrary numbers of microphones at unknown locations in an unknown room using only one source at an unknown
location|for example a fi nger snap|and get the room's geometry as a byproduct. Our study of source localization outgrew the initial form factor when we looked at source localization with spherical microphone arrays. Spherical signals appear well beyond spherical microphone arrays; for example, any signal de ned on Earth's surface lives on a sphere. This resulted in the rst slight departure from the main theme: We develop the theory and algorithms for sampling sparse signals on the sphere using nite rate-of-innovation principles and apply it to various signal processing problems on the sphere. One way our brain uses echoes to improve speech communication is by integrating them with the direct path to increase the e ective useful signal power. We take inspiration from this human ability, and propose acoustic rake receivers (ARRs) for speech|microphone beamformers that listen to echoes. We show that by beamforming towards echoes, ARRs improve not only the signal-to-interference-and-noise ratio (SINR), but also the perceptual evaluation of speech quality (PESQ). The fi nal chapter is motivated by yet another localization problem, this time a tomographic inversion that must be performed extremely fast on computation- and storage-constrained hard-
ware. We initially proposed the sparse pseudoinverse as a solution, and this led us to the second slight departure from the main theme: an investigation of the properties of various norm-minimizing generalized inverses. We categorize matrix norms according to whether or not their minimization yields the MPP, and show that norm-minimizing generalized inverses have interesting properties. For example, the worst-case and average-case`lp-norm blowup is minimized by generalized inverses minimizing certain induced and mixed matrix norms; we call this a poor man's l`p minimization.

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Tuesday, June 23, 2015

Simultaneous Orthogonal Matching Pursuit With Noise Stabilization: Theoretical Analysis - implementation -

Here is another solver for MMV type of provlems like the ones found in congnitive radio.

Simultaneous Orthogonal Matching Pursuit With Noise Stabilization: Theoretical Analysis
J. F. Determe, J. Louveaux, L. Jacques, F. Horlin

This paper studies the joint support recovery of similar sparse vectors on the basis of a limited number of noisy linear measurements, i.e., in a multiple measurement vector (MMV) model. The additive noise signals on each measurement vector are assumed to be Gaussian and to exhibit different variances. The simultaneous orthogonal matching pursuit (SOMP) algorithm is generalized to weight the impact of each measurement vector on the choice of the atoms to be picked according to their noise levels. The new algorithm is referred to as SOMP-NS where NS stands for noise stabilization. To begin with, a theoretical framework to analyze the performance of the proposed algorithm is developed. This framework is then used to build conservative lower bounds on the probability of partial or full joint support recovery. Numerical simulations show that the proposed algorithm outperforms SOMP and that the theoretical lower bound provides a great insight into how SOMP-NS behaves when the weighting strategy is modified.

The implementation is here: https://opera-wireless.ulb.ac.be/owncloud/index.php/s/lHkMNCgOu66MtEs

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Monday, June 22, 2015

MaCBetH : Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation - implementation -

I like it, from the paper:

In particular, we shall analyze the algorithm using tools from spin glass theory in statistical mechanics, and show that there exists a phase transition between a phase where it is able to detect the rank, and a phase where it is unable to do so.

Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation by Alaa Saade, Florent Krzakala, Lenka Zdeborová

The completion of low rank matrices from few entries is a task with many practical applications. We consider here two aspects of this problem: detectability, i.e. the ability to estimate the rank $r$ reliably from the fewest possible random entries, and performance in achieving small reconstruction error. We propose a spectral algorithm for these two tasks called MaCBetH (for Matrix Completion with the Bethe Hessian). The rank is estimated as the number of negative eigenvalues of the Bethe Hessian matrix, and the corresponding eigenvectors are used as initial condition for the minimization of the discrepancy between the estimated matrix and the revealed entries. We analyze the performance in a random matrix setting using results from the statistical mechanics of the Hopfield neural network, and show in particular that MaCBetH efficiently detects the rank $r$ of a large $n\times m$ matrix from $C(r)r\sqrt{nm}$ entries, where $C(r)$ is a constant close to $1$. We also evaluate the corresponding root-mean-square error empirically and show that MaCBetH compares favorably to other existing approaches.

Implementations are are available in matlab http://github.com/alaa-saade/macbeth_matlab and in Julia http://github.com/alaa-saade/macbeth_julia. They are also on Alaa's webpage http://alaasaade.wordpress.com/ and on the SPHINX group software page: http://www.lps.ens.fr/~krzakala/WASP.html

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Sunday, June 21, 2015

Sunday Morning Insight: All hard problems are just slow feedback loops in disguise.

Today's insights are made up of different observations from the past few weeks:

Company X
Third Generation Genomics and a Kaggle ?
IoT and Machine Learning
Planet Machine Learning in Paris
Of Electric Dreams

Company X

With some friends, we think there is room for some disruptive hardware in Machine Learning and other areas. More on that later but don't be surprised if I get to talk more and more about it over the next few months.

Third Generation Genomics and a Kaggle ?

Earlier this month, as I mentioned in Compressive phase transition, a million genomes and 3rd generation sequencing, I attended the Bioinformatics for Third Generation Sequencing workshop organized by Hélène Touzet and colleagues. Some of the photos taken during the workshop can be found on my twitter feed under the #b3gs hashtag.

One of the most interesting part of the current Oxford Nanopore sequencer technology is that it is very much portable thereby changing some of the dynamics behind identifying strains in outbreaks. Nick Loman provided a few examples of that. One of the most awesome project by Nick and his team was that of the current Oxford Nanopore sequencers used in West Africa to map Ebola. At http://ebola.nextflu.org/ you get to see in near real time, which strains of Ebola travel where and how some of the strains mutate. It was absolutely fascinating. I wonder how much it will take people to use additonal side information such as weather or other IoT data to gain deeper understanding of the evolution and mutation of an outbreak. A similar map exist for the MERS-CoV outbreak . For more information go read today's blog entry by Lauren on Sequencing Ebola in the field: a tale of nanopores, mosquitos and whatsapp

http://ebola.nextflu.org/

Nick provided other fascinating examples such as strain identification in a hospital setting: the idea here is to figure out if you are figthing the same strain or a different strain coming from other outside sources (hospital remain open during outbreaks). The overall identification process seemed to take about 92 minutes. I wonder how much time before this sampling chain implemented by Nick and his team become a requirement as opposed to a "nice to have" technology.

Going deeper in the sequencing realm, Clive Brown of Oxford Nanopore was given a full hour talk. Nanopore technology allows for long reads and hence reduces the complexity of DNA alignment thereby enabling genome sequencing with midly good greedy solvers. Some bioinformatics people are skeptical about the technology as the readings of the base pairs does not seem to go to the precision required for medical diagnosis. The current accuracy of the system hovers in the 90% range when most medical applications require something like 99.9999%.

If you've been reading this blog long enough, you know that DNA goes through the pores at different speeds. In fact, the reason nanopore technology has taken so much time to flourish has been because of a sampling issue. In the crudest form, the DNA piece goes through nanopores at a speed of 1 million base pair per second. This is too fast for the electronics used to sample the DNA, so they use different chemical engines (they call them "chemistry") to slow down the travel of the DNA strand through the pores. With these chemistries, 1 million base pair per second slows down to an electronically manageable 500 to 1000 base pair per second. In fact, a 1000 base pairs per second is the fastest speed that Oxford Nanopore is currently bringing to the market.

At that point, one simply wonders if there is a way to sample at the initial speed of 1 000 000 base pairs per second using a compressive sensing approach such as A2I or the Xampling approach . Why would such sampling at tis rate be possible ? for one, the signal is constrained not by sparsity but it is made up of four DC subsignals (there are roughly 4 voltages levels characteristic of 4 base pairs), hence a few approaches would do well on a signal made up of a restrained (and quite literal) alphabet. The estimation process I just mentioned is called "base calling", i.e. the electrical signal from the pore is used to estimate which four letters A, G,T or C went through it. If the base caller and the electronics were to go a thousand faster than the faster product out there, Nick told me that quite a few applications could arise (it currently takes about 4 days to sequence the human genome with this technology, what if it were to take 6 minutes instead ? it would really become commodity sequencing.)

Further down the pipeline, there is the alignement process, whereby one uses each sets of identified series of base pairs to form a larger sequence. What differentiates third generation sequencing from the first two generations is the long length of base pairs found. It makes the problem of aligning base pairs not NP-hard any more. In order to enable that process, the technology oversamples the genome (they call this coverage: a 30X coverage means that the DNA has on average been covered 30 times). In short, we have grouped readings, oversampling of said readings and a 4-letter alphabet: all these elements point to compressive sensing and related matrix factorization algorithms for resolution. At the very least one wonders if current greedy algorithms are far from the phase transition of the problem or if current base callers can provide additional information.

An action item I took out of this meeting was to explore the possibilities of exposing some of these data to a Kaggle-like contest whether at the base caller level or at the alignement level (to be defined).

Other presentations at the meetings mostly tried to refine the quality of the 3rd generation sequencing using 1st and 2nd generation high accuracy sequencing methods as side information. The last talk by Vincent Croquette introduced us to a different kind of hardware that used holographic measurements and some mechanical means to figure out DNA sequencing information.

These sequencers can be seen as IoT sensors which brings us to the next meeting.

IoT and Machine Learning

Last week, I went to the Samsung VIP meeting that focused in Samsung's investment in France on IoT and in a "Strategy and Innovation center" here in Paris. I briefly spoke with Young Sohn and Luc Julia about their on-going Machine Learning efforts.

My view: The IoT business is tough because, in part, we have not had much large datasets in that area. It is also a treacherous business, if your sensor is in direct competition with data that can be had with a camera, it will eventually lose out to imaging CMOS. Furthermore and as I mentioned in a previous Paris IoT meetup, the "making sense" of data from these streams can only be made after accumulating enough of it.

Genome sequencing for instance can become a science only when you hit a high sampling bounds. The current stage of IoT enterprises is to build the data gathering infrastructure as we speak: We can only expect the insights to build up over some time. With that in mind, techniques such as compressive sensing are key to IoT development because most of the data in that field cannot leverage the compression technology used for images and videos, it is a blessing because:

randomized algorithms are a cheap way of getting compression

random projections keep some information in a way that overoptimized compression technology like jpeg cannot.

Planet Machine Learning in Paris

Last but not least, we had our last regular Machine Learning meetup of the season ( videos and slides are at: Paris Machine Learning Meetup #10 Season 2 Finale: "And so it begins": Deep Learning, Recovering Robots, Vowpal and Hadoop, Predicsis, Matlab, Bayesian test, Experiments on #ComputationalComedy & A.I.)

We also have two upcoming "Hors Séries" coming up as planet ML is coming to France in the next few weeks with the advent of COLT in Paris and ICML in Lille. These meetups will be

If you think you'll be in Paris and want to talk Machine Learning to an adoring crowd, send me an email and will see if we can set up some other "Hors Séries" within the next few weeks. We can have access to rooms for presentations within days and have more than 2300 members, a subset of which will like your presentation. We can also simply meet for a coffee !

Of Electric Dreams

Last but not least, this past week, the interwebs went crazy about certain kind of images generated by deep neural networks. Aside from the obvious trippy images, one wonders how long it will take before we have a subfield of machine learning investigating specific types of psychiatric diseases from the DSM-V list and images generated by current Machine Learning techniques. In my view, some recommender systems also have behaviors that fit diseases listed in the DSM-V: simple re-targeting being one.

Coming back to images, this interest in producing images that are close to our experience or to artistic endeavors is not new and did not start with deep neural networks. Back in 2007, we noticed that regularization had the possibility of generating artistic images ( Can I tell you my secret now ?....I see dead reconstructions). More recently, reconstructing images from seemingly deteriorated information has produced similarly interesting patterns (Do Androids Recall Dreams of Electric Sheeps ?, From Bits to Images: Inversion of Local Binary Descriptors - implementation -)

For instance, back in season 1 of the Paris Machine Learning Meetup #3 ("We already have cats under control") Patrick Perez did a presentation From image to descriptors and back again and mentioned a series of photos reconstructed from descriptors on Herve Jegou's page. (Reconstruction of images from their local descriptors )

The title of this blog entry comes directly from Abe Stanway .

Credit photo: Terry Virts, NASA

Join the CompressiveSensing subreddit or the Google+ Community and post there !

Nuit Blanche Referenced in the Dead Tree World!

The Big Picture in Compressive Sensing was mentioned in an article of La Recherche, the french speaking equivalent/competitor to Science. October 2010 issue, page 20-21.
Wired Magazine had a piece on Compressed Sensing featuring links to this blog and the Big Picture. (March 1, 2010)
Emmanuel Candes and Terry Tao wrote about Nuit Blanche in the Dec. '08 issue of the IEEE Information Theory Society Newsletter
Xiaochuan Pan, Emil Sidky and Michael Vannier wrote about Nuit Blanche in Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction?.
Check also the acknowledgments in this Ghost Imaging paper and this one.

Nuit Blanche

Page Views on Nuit Blanche since July 2010

Tuesday, June 30, 2015

Vowpal Wabbit: A Machine Learning System ( Paris Machine Learning Meetup "Hors Série" #4 Season 2 )

Monday, June 29, 2015

Slides and Implementation: Random forests with random projections of the output space for high dimensional multi-label classification

Thesis: Learning in High Dimensions with Projected Linear Discriminants by Robert Durrant

Sunday, June 28, 2015

Slides: Learning with Random Projections, Random Projections for Machine Learning and Data Mining: Theory & Applications

CfP: The 3rd International Workshop on High Dimensional Data Mining (HDM’15)

Friday, June 26, 2015

Random Projections as Regularizers, Compressive Linear Least Squares Regression and more

The Unreasonable Effectiveness of Random Projections in Computer Science

Towards Large Scale Continuous EDA: A Random Matrix Theory Perspective - implementation -

Thursday, June 25, 2015

Improved Bounds on the Dot Product under Random Projection and Random Sign Projection

The Small Victories

Physics-driven inverse problems made tractable with cosparse regularization

Wednesday, June 24, 2015

Raking the Cocktail Party - implementation -

Thesis: Listening to Distances and Hearing Shapes: Inverse Problems in Room Acoustics and Beyond

Tuesday, June 23, 2015

Simultaneous Orthogonal Matching Pursuit With Noise Stabilization: Theoretical Analysis - implementation -

Monday, June 22, 2015

MaCBetH : Matrix Completion from Fewer Entries: Spectral Detectability and Rank Estimation - implementation -

Sunday, June 21, 2015

Sunday Morning Insight: All hard problems are just slow feedback loops in disguise.

Printfriendly