Monday, September 22, 2014

Book / Proceedings / Lecture Notes: Statistical physics, Optimization, Inference and Message-Passing algorithms



Last year, there was an "Ecole de Physique" in Cargèse on Statistical Physics, Optimization, Inference and Message-Passing Algorithms. Florent Krzakala, one of the organizers, mentioned earlier today that they have released the lecture notes of the talks. They are all on this page.

Autumn School, September 30 -- October 11, 2013


Proceedings of all lectures


Called "Statistical Physics, Optimization, Inference, and Message-Passing Algorithms" and edited by F. Krzakala, F. Ricci-Tersenghi, L. Zdeborova, R. Zecchina, E. W. Tramel, L. F. Cugliandolo. A book containing the proceeding of all lectures is in preparation at Oxford University Press. A preliminary version of some chapters can be found online on Arxiv:


 
 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Sunday, September 21, 2014

Unfiltered Access



It happened last season. When Franck and I decided to program Season 1 of the Paris Machine Learning meetup, we initially thought the whole thing would run its course pretty quickly. Indeed, our interest was in getting people to talk about algorithms, as opposed to the specific web/language technologies that are usually the focus of technical meetups these days.

Over the course of the Parisian Winter, we both realized that the people who came to the meetups were not scared off by difficult subjects. In fact, the feedback suggested attendees were sick of dumbed-down stories on important topics. While the presentations were probably less heavy than typical academic talks, quite a few academics even told us they liked the format. Deep down, it really was a moment of nostalgia. Those instances reminded people of how great it felt to be learning in their earlier years. They remembered how exceptional those moments were, and they craved those unique times. The meetups brought back that message: you are not stupid, you are decrypting the world, and you are not alone.

Over the Parisian Spring, we realized something important. Some of the best people in the world could not come to Paris. Not that they did not want to, but, much like long-distance runners, most have reached the Zone and do not want to deviate from it for some limited exposure. We realized there had to be a better way, so we started using Skype or Hangouts to have them speak to our audience remotely for 10 to 15 minutes.

Another realization crystallized at meetup #10. I had seen a video of Brenda McCowan talking about what her group was doing in Dolphin communication - a faraway concern to people whose livelihood is to predict the next click. Putting this in the context of our meetup's theme, that work is hard Unsupervised Learning, it's the stuff of exploration - think Champollion, but with animals and no Rosetta Stone: we had to get her on the meetup schedule. Because Brenda is very busy, and because this was probably an unexpected invitation from some crazy French guys, she kindly accepted a five-minute Q&A with our crowd on Skype after we had locally played the video that had been recorded earlier by the good folks at NIMBioS. The untold promise was that it would be one of the least impactful events in her professional life: answer technical questions for five minutes from folks 8,000 miles away who had just seen her earlier academic pitch. In the end, the exchange lasted longer.

I am sure I am not the only person in the audience who realized this, but for the first time in our lives, as we were asking questions of Dr. Brenda McCowan, we were becoming actors in a really technical National Geographic feature film... "sans" the post-production, "sans" the dumbing down of a traditional filtered Q&A, "sans" the delicate narrative of a show. We, the meetup audience, whose inner 9-year-old kids had given up on our own dreams of talking to Flipper when we decided to be serious and eventually have jobs, realized we were having a technically insightful conversation with one of the few people on Earth who has the interest, knowledge and means to potentially understand Dolphin communication. For a moment, more than a hundred "wiser" 9-year-olds were back in business.


Props to NIMBioS, Brenda McCowan, meetup.com, Skype, TheFamilyHopWork, and DojoCrea for making this unique moment happen.

In light of the success of the Paris Machine Learning Applications Group, I just started two new meetups in Paris: 
Credit Photo: Hans Hillewaert - The Rosetta Stone in the British Museum.
 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, September 19, 2014

The Randomized Causation Coefficient - implementation -




We are interested in learning causal relationships between pairs of random variables, purely from observational data. To effectively address this task, the state-of-the-art relies on strong assumptions regarding the mechanisms mapping causes to effects, such as invertibility or the existence of additive noise, which only hold in limited situations. On the contrary, this short paper proposes to learn how to perform causal inference directly from data, and without the need of feature engineering. In particular, we pose causality as a kernel mean embedding classification problem, where inputs are samples from arbitrary probability distributions on pairs of random variables, and labels are types of causal relationships. We validate the performance of our method on synthetic and real-world data against the state-of-the-art. Moreover, we submitted our algorithm to the ChaLearn's "Fast Causation Coefficient Challenge" competition, with which we won the fastest code prize and ranked third in the overall leaderboard.


The code is on David Lopez-Paz's page.
 
Also relevant: The Randomized Dependence Coefficient by David Lopez-Paz, Philipp Hennig and Bernhard Schölkopf. Attendant code is also on David Lopez-Paz's page.  
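
For those who want to play with the kernel mean embedding idea, here is a small sketch. It is not the authors' code (that is on David Lopez-Paz's page): each sample of pairs is embedded by averaging random Fourier features over its points, and an off-the-shelf classifier is trained to predict the causal direction. The feature count, bandwidth, toy data generator and choice of a random forest are my own arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
D, gamma = 100, 1.0                        # number of random features and bandwidth (assumptions)
W = rng.normal(scale=np.sqrt(2 * gamma), size=(2, D))
b = rng.uniform(0, 2 * np.pi, size=D)

def embed_pair(x, y):
    # kernel mean embedding of the empirical distribution of (x, y) pairs,
    # approximated by averaging RBF random Fourier features over the sample
    z = np.column_stack([x, y])
    feats = np.sqrt(2.0 / D) * np.cos(z @ W + b)
    return feats.mean(axis=0)

def make_pair(x_causes_y):
    # toy cause-effect pairs: label 1 if X -> Y, label 0 if Y -> X
    cause = rng.uniform(-2, 2, size=500)
    effect = np.tanh(2 * cause) + 0.2 * rng.normal(size=500)
    x, y = (cause, effect) if x_causes_y else (effect, cause)
    return embed_pair(x, y), int(x_causes_y)

data = [make_pair(rng.random() < 0.5) for _ in range(400)]
X = np.array([d[0] for d in data])
labels = np.array([d[1] for d in data])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[:300], labels[:300])
print("held-out accuracy:", clf.score(X[300:], labels[300:]))
```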

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Randomized Kernels, Randomized Solvers, Random features and more

Here is a slew of very interesting papers that connect to the earlier Hardware Based Stochastic Gradient Descent entry and, much earlier, to random features:

High-performance Kernel Machines with Implicit Distributed Optimization and Randomization by Vikas Sindhwani, Haim Avron
Complex machine learning tasks arising in several domains increasingly require "big models" to be trained on "big data". Such models tend to grow with the complexity and size of the training data, and do not make strong parametric assumptions upfront on the nature of the underlying statistical dependencies. Kernel methods constitute a very popular, versatile and principled statistical methodology for solving a wide range of non-parametric modelling problems. However, their storage requirements and high computational complexity poses a significant barrier to their widespread adoption in big data applications. We propose an algorithmic framework for massive-scale training of kernel-based machine learning models. Our framework combines two key technical ingredients: (i) distributed general purpose convex optimization for a class of problems involving very large but implicit datasets, and (ii) the use of randomization to significantly accelerate the training process as well as prediction speed for kernel-based models. Our approach is based on a block-splitting variant of the Alternating Directions Method of Multipliers (ADMM) which is carefully reconfigured to handle very large random feature matrices only implicitly, while exploiting hybrid parallelism in compute environments composed of loosely or tightly coupled clusters of multicore machines. Our implementation supports a variety of machine learning tasks by enabling several loss functions, regularization schemes, kernels, and layers of randomized approximations for both dense and sparse datasets, in a highly extensible framework. We study the scalability of our framework on both commodity clusters as well as on BlueGene/Q, and provide a comparison against existing sequential and parallel libraries for such problems.
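
As a reminder of what ingredient (ii) buys you, here is a bare-bones random Fourier feature sketch in the spirit of Rahimi and Recht: the Gaussian kernel is replaced by an explicit low-dimensional feature map, so that kernel ridge regression reduces to plain linear ridge regression. This is my own toy illustration, nowhere near the distributed ADMM machinery of the paper, and the feature count and bandwidth are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 2000, 10, 500          # samples, input dimension, random features (assumptions)
sigma = 1.0                      # Gaussian kernel bandwidth (assumption)

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# explicit map z(x) = sqrt(2/D) * cos(x W + b) approximating exp(-||x-y||^2 / (2 sigma^2))
W = rng.normal(scale=1.0 / sigma, size=(d, D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# kernel ridge regression now reduces to linear ridge regression in feature space
lam = 1e-3
alpha = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
print("train RMSE:", np.sqrt(np.mean((Z @ alpha - y) ** 2)))
```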

Scalable Kernel Methods via Doubly Stochastic Gradients by Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song
The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve the problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending using this noisy functional gradient. We show that a function produced by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space in rate O(1/t), and achieves a generalization performance of O(1/√t). This doubly stochasticity also allows us to avoid keeping the support vectors and to implement the algorithm in a small memory footprint, which is linear in number of iterations and independent of data dimension. Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show that our method can achieve competitive performance to neural nets in datasets such as 8 million handwritten digits from MNIST, 2.3 million energy materials from MolecularSpace, and 1 million photos from ImageNet.
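
A rough sketch of the "doubly stochastic" recipe, under my own simplifying assumptions (squared loss, Gaussian kernel, one data point and one random feature per iteration, a hand-picked step size); the real algorithm, step sizes and guarantees are in the paper. The point is that the function is represented only through the random features drawn so far.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1000
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

sigma, lam, T = 1.0, 1e-4, 500                     # bandwidth, regularization, iterations (assumptions)

def phi(x, w, b):
    # a single random Fourier feature of the Gaussian kernel
    return np.sqrt(2.0) * np.cos(x @ w + b)

omegas, offsets, alphas = [], [], []

def f(x):
    # current iterate, expanded over the random features sampled so far (naive O(t) evaluation)
    return sum(a * phi(x, w, b) for a, w, b in zip(alphas, omegas, offsets))

for t in range(1, T + 1):
    i = rng.integers(n)                            # first source of stochasticity: a random training point
    w = rng.normal(scale=1.0 / sigma, size=d)      # second source: a random feature ~ kernel spectral density
    b = rng.uniform(0, 2 * np.pi)
    eta = 1.0 / (lam * t + 10.0)                   # decaying step size (assumption)
    grad = f(X[i]) - y[i]                          # derivative of the squared loss at the sampled point
    alphas = [(1 - eta * lam) * a for a in alphas] # shrink old coefficients (regularization)
    omegas.append(w); offsets.append(b); alphas.append(-eta * grad * phi(X[i], w, b))

pred = np.array([f(x) for x in X[:200]])
print("RMSE on the first 200 points:", np.sqrt(np.mean((pred - y[:200]) ** 2)))
```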

Large-scale randomized-coordinate descent methods with non-separable linear constraints by Sashank Reddi, Ahmed Hefny, Carlton Downey, Avinava Dubey, Suvrit Sra
We develop randomized (block) coordinate descent (CD) methods for linearly constrained convex optimization. Unlike most CD methods, we do not assume the constraints to be separable, but let them be coupled linearly. To our knowledge, ours is the first CD method that allows linear coupling constraints, without making the global iteration complexity have an exponential dependence on the number of constraints. We present algorithms and analysis for four key problem scenarios: (i) smooth; (ii) smooth + nonsmooth separable; (iii) asynchronous distributed; and (iv) stochastic. We illustrate empirical behavior of our algorithms by simulation experiments.
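
To fix ideas on what a non-separable coupling constraint means, here is a toy of my own, not the authors' algorithm: with a single linear constraint sum(x) = budget, no coordinate can move alone, so the natural randomized update picks two coordinates and moves them in opposite directions, with an exact line search for this quadratic objective.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
c = rng.normal(size=n)
budget = 1.0

# minimize 0.5 * ||x - c||^2  subject to  sum(x) = budget
x = np.full(n, budget / n)                 # feasible starting point

for _ in range(20000):
    i, j = rng.choice(n, size=2, replace=False)
    # move along e_i - e_j so that sum(x) = budget is preserved;
    # exact minimization of the quadratic along that direction
    t = ((c[i] - x[i]) - (c[j] - x[j])) / 2.0
    x[i] += t
    x[j] -= t

# closed-form solution for comparison: projection of c onto the constraint
x_star = c + (budget - c.sum()) / n
print("distance to optimum:", np.linalg.norm(x - x_star))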

Compact Random Feature Maps by Raffay Hamid, Ying Xiao, Alex Gittens, Dennis DeCoste
Kernel approximation using randomized feature maps has recently gained a lot of interest. In this work, we identify that previous approaches for polynomial kernel approximation create maps that are rank deficient, and therefore do not utilize the capacity of the projected feature space effectively. To address this challenge, we propose compact random feature maps (CRAFTMaps) to approximate polynomial kernels more concisely and accurately. We prove the error bounds of CRAFTMaps demonstrating their superior kernel reconstruction performance compared to the previous approximation schemes. We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers. We present experiments on multiple standard data-sets with performance competitive with state-of-the-art results.
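
The "up-project, then down-project" idea behind CRAFTMaps is easy to mimic. Below is a sketch under my own assumptions: Rademacher product features give an unbiased approximation of the homogeneous polynomial kernel (x'y)^p, and a plain Gaussian random projection stands in for the structured matrices used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p = 10, 2                  # input dimension and polynomial degree (assumptions)
E, D = 4000, 200              # up-projection size and compact map size (assumptions)

def poly_random_features(X, Ws):
    # unbiased random features for the homogeneous polynomial kernel (x'y)^p:
    # each feature is a product of p independent Rademacher projections
    Z = np.ones((X.shape[0], Ws[0].shape[1]))
    for W in Ws:
        Z *= X @ W
    return Z / np.sqrt(Ws[0].shape[1])

Ws = [rng.choice([-1.0, 1.0], size=(d, E)) for _ in range(p)]
G = rng.normal(size=(E, D)) / np.sqrt(D)       # down-projection (a plain JL matrix here)

X = rng.random(size=(500, d))
Z_up = poly_random_features(X, Ws)             # large, rank-deficient map
Z_craft = Z_up @ G                             # compact map: up-project, then down-project

K_exact = (X @ X.T) ** p
K_approx = Z_craft @ Z_craft.T
print("relative error:", np.linalg.norm(K_exact - K_approx) / np.linalg.norm(K_exact))
```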

Random Laplace Feature Maps for Semigroup Kernels on Histograms by Jiyan Yang, Vikas Sindhwani, Quanfu Fan, Haim Avron

With the goal of accelerating the training and testing complexity of nonlinear kernel methods, several recent papers have proposed explicit embeddings of the input data into low-dimensional feature spaces, where fast linear methods can instead be used to generate approximate solutions. Analogous to random Fourier feature maps to approximate shift-invariant kernels, such as the Gaussian kernel, on R^d, we develop a new randomized technique called random Laplace features, to approximate a family of kernel functions adapted to the semigroup structure of R^d_+. This is the natural algebraic structure on the set of histograms and other non-negative data representations. We provide theoretical results on the uniform convergence of random Laplace features. Empirical analyses on image classification and surveillance event detection tasks demonstrate the attractiveness of using random Laplace features relative to several other feature maps proposed in the literature.
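
One concrete instance of the construction (my own choice of measure; the paper treats the general family): with omega_i drawn from an Exponential(1) distribution, the features exp(-omega'x) reproduce, in expectation, the semigroup kernel k(x, y) = prod_i 1/(1 + x_i + y_i) on non-negative data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 8, 2000                      # histogram dimension, number of features (assumptions)

# non-negative data (e.g. histograms) -- the natural domain for semigroup kernels
X = rng.random(size=(200, d))

# random Laplace features: phi(x) = exp(-omega' x), with omega drawn from a measure whose
# Laplace transform is the kernel. With omega_i ~ Exponential(1),
# E[phi(x) phi(y)] = prod_i 1 / (1 + x_i + y_i).
omega = rng.exponential(scale=1.0, size=(D, d))
Z = np.exp(-X @ omega.T) / np.sqrt(D)

K_exact = np.ones((len(X), len(X)))
for i in range(d):
    K_exact /= 1.0 + X[:, i][:, None] + X[:, i][None, :]

err = np.linalg.norm(K_exact - Z @ Z.T) / np.linalg.norm(K_exact)
print("relative kernel approximation error:", err)
```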


Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels by Jiyan Yang, Vikas Sindhwani, Haim Avron, Michael Mahoney

We consider the problem of improving the efficiency of randomized Fourier feature maps to accelerate training and testing speed of kernel methods on large datasets. These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e.g., Gaussian kernel). In this paper, we propose to use Quasi-Monte Carlo (QMC) approximations instead where the relevant integrands are evaluated on a low-discrepancy sequence of points as opposed to random point sets as in the Monte Carlo approach. We derive a new discrepancy measure called box discrepancy based on theoretical characterizations of the integration error with respect to a given sequence. We then propose to learn QMC sequences adapted to our setting based on explicit box discrepancy minimization. Our theoretical analyses are complemented with empirical results that demonstrate the effectiveness of classical and adaptive QMC techniques for this problem.
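
The QMC twist is simple to try at home: in a random Fourier feature map, replace the iid Gaussian frequencies by a low-discrepancy sequence pushed through the Gaussian inverse CDF. A minimal sketch with a scrambled Halton sequence follows; sizes and bandwidth are arbitrary choices, and the paper goes much further with the box discrepancy and learned sequences.

```python
import numpy as np
from scipy.stats import norm, qmc

rng = np.random.default_rng(0)
d, D, sigma = 5, 256, 1.0          # input dimension, number of frequencies, bandwidth (assumptions)
X = rng.normal(size=(300, d))

def feature_map(X, W):
    # cos/sin Fourier features for the Gaussian kernel with frequencies W
    proj = X @ W.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(W.shape[0])

# Monte Carlo frequencies: iid Gaussian draws
W_mc = rng.normal(scale=1.0 / sigma, size=(D, d))

# Quasi-Monte Carlo frequencies: Halton points mapped through the Gaussian inverse CDF
halton = qmc.Halton(d=d, scramble=True, seed=0).random(D)
W_qmc = norm.ppf(halton) / sigma

K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * sigma ** 2))
for name, W in [("MC ", W_mc), ("QMC", W_qmc)]:
    Z = feature_map(X, W)
    err = np.linalg.norm(K_exact - Z @ Z.T) / np.linalg.norm(K_exact)
    print(name, "relative kernel approximation error:", err)
```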

From Kernel Machines to Ensemble Learning by Chunhua Shen, Fayao Liu

Ensemble methods such as boosting combine multiple learners to obtain better prediction than could be obtained from any individual learner. Here we propose a principled framework for directly constructing ensemble learning methods from kernel methods. Unlike previous studies showing the equivalence between boosting and support vector machines (SVMs), which needs a translation procedure, we show that it is possible to design a boosting-like procedure to solve the SVM optimization problems. In other words, it is possible to design ensemble methods directly from SVM without any middle procedure. This finding not only enables us to design new ensemble learning methods directly from kernel methods, but also makes it possible to take advantage of those highly-optimized fast linear SVM solvers for ensemble learning. We exemplify this framework for designing binary ensemble learning as well as new multi-class ensemble learning methods. Experimental results demonstrate the flexibility and usefulness of the proposed framework.
Credits: ESA/Rosetta/MPS for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA
Date: 15 September 2014
Satellite: Rosetta
Depicts: Philae's primary landing site

Wednesday, September 17, 2014

Ce Soir: Paris Machine Learning Meetup #1, Season 2, A New Beginning (Snips, Nomo, Clustaar and more)





Once again, Franck and I have decided to run a second season of the Paris Machine Learning Meetup, after a pretty successful Season 1. We have about 1200 members and growing. The Twitter hashtag for tonight's meetup is #MLParis.
The YouTube page is here, while the Google Hangout Event page is here.

Resources for the Paris Machine Learning group other than ones on Meetup.com include the LinkedIn group, Google+ group and the Archives of the meetup.
A big thank you to DojoEvents for hosting us.

Here is the tentative list of talks and abstracts for tonight's meetup:

+ Franck Bardol, Igor Carron, Introduction, What's New....

Lightning talk (5 minutes)

+ Jean-Baptiste Tien, Criteo, Update on the Kaggle Criteo contest ( remote lightning talk from Palo Alto)

Full Talks (15-20 minutes)

+ Maël Primet, Snips,

Machine Learning for Context-Awareness

Abstract: Your mobile phone is becoming your trusted companion & entry point for most of your activities: meeting with friends, finding a coffee shop, buying clothes, photographing your memories, or writing down your thoughts. In order to serve you best, your devices will now have to learn your every preference, make sense of your context wherever you are, and predict with accuracy what you are going to do next and with whom. At Snips, we use machine-learning & beautiful design to complete our mission: turning your mobile devices into a brain, in your pocket!

+ Cédric Coussinet:  

The Nomo Language
The nomoSeed project promotes the nomo language for designing complex real-time systems with incremental learning capabilities. The nomo language makes it possible to define a rule base that is always ready to react to any event, and that can adapt and create new rules through constant interaction with its environment. http://nomoseed.org

+ Philippe Duhamel & Nicolas Chollet (www.clustaar.com)

Extract Consumer Insights from Search Engine Queries
Abstract: 150 billion searches are made on Google each month worldwide. As an increasing number of cases indicates, analyzing the terms people search for on Google offers a new understanding of how people and consumers behave and express their needs and concerns. In this presentation, we show how algorithmic, semantic search-term analysis can bring powerful and relevant market insights to brands and organizations.
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Multiresolution Matrix Factorization

You probably recall the idea expressed in the Sunday Morning Insight on Matrix Factorizations and the Grammar of Life, where one would iterate on several matrix factorizations in order to provide a convincing decomposition of a data matrix. The idea is similar to diffusion wavelets and to more recent signal processing on graphs. Here is a new paper on the matter:




Multiresolution Matrix Factorization by Risi Kondor, Nedelina Teneva and Vikas Garg

Abstract: The types of large matrices that appear in modern Machine Learning problems often have complex hierarchical structures that go beyond what can be found by traditional linear algebra tools, such as eigendecompositions. Inspired by ideas from multiresolution analysis, this paper introduces a new notion of matrix factorization that can capture structure in matrices at multiple different scales. The resulting Multiresolution Matrix Factorizations (MMFs) not only provide a wavelet basis for sparse approximation, but can also be used for matrix compression (similar to Nystrom approximations) and as a prior for matrix completion.
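
To give a flavor of the construction, here is a toy reading of it, mine and not the authors' algorithm: repeatedly pick the pair of active rows with the largest off-diagonal entry, apply the Givens rotation that zeroes it, and retire one of the two rotated indices as a "wavelet" that is never touched again. What remains is a small core block plus a diagonal, with A ≈ Q^T H Q. On a generic dense matrix the compression is of course lossy; the interesting regime is matrices with genuine multiscale structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, levels = 64, 48                      # matrix size and number of rotations (assumptions)

# a symmetric positive semidefinite test matrix
B = rng.normal(size=(n, n))
A = B @ B.T / n

Q = np.eye(n)
H = A.copy()
active = list(range(n))

for _ in range(levels):
    # pick the pair of active indices with the largest off-diagonal magnitude
    sub = np.abs(H[np.ix_(active, active)])
    np.fill_diagonal(sub, 0.0)
    k, l = np.unravel_index(np.argmax(sub), sub.shape)
    i, j = active[k], active[l]

    # Jacobi/Givens rotation chosen to zero H[i, j]
    theta = 0.5 * np.arctan2(2 * H[i, j], H[i, i] - H[j, j])
    c, s = np.cos(theta), np.sin(theta)
    G = np.eye(n); G[i, i] = c; G[j, j] = c; G[i, j] = s; G[j, i] = -s

    H = G @ H @ G.T
    Q = G @ Q
    active.remove(j)                    # retire index j as a "wavelet" row

# keep only the core block and the diagonal, as a crude compression
H_comp = np.diag(np.diag(H))
H_comp[np.ix_(active, active)] = H[np.ix_(active, active)]
A_hat = Q.T @ H_comp @ Q
print("relative reconstruction error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```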


Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Tuesday, September 16, 2014

Hardware Based Stochastic Gradient Descent


 

Starting at 52 minutes and 10 seconds into this video, Ali Rahimi explains how one can use a noisy evaluation of the gradient in a gradient descent algorithm in order to minimize a convex function (most relaxations featured in current Advanced Matrix Factorization techniques, and some instances of Compressive Sensing, fall into that category). The interesting part is that the noise comes from the errors made by the now-stochastic Floating Point Unit. From the paper below:

 Unlike the traditional setting for stochastic gradient descent, where stochasticity arises because the gradient direction is computed from a random subset of a dataset, here the processor itself is the source of stochasticity. We call our approach application robustification.

There have been several attempts at correcting process variation induced errors by identifying and masking these errors at the circuit and architecture level. These approaches take up valuable die area and power on the chip. As an alternative, we explore the feasibility of an approach that allows these errors to occur freely, and handle them in software, at the algorithmic level. In this paper, we present a general approach to converting applications into an error tolerant form by recasting these applications as numerical optimization problems, which can then be solved reliably via stochastic optimization. We evaluate the potential robustness and energy benefits of the proposed approach using an FPGA-based framework that emulates timing errors in the floating point unit (FPU) of a Leon3 processor. We show that stochastic versions of applications have the potential to produce good quality outputs in the face of timing errors under certain assumptions. We also show that good quality results are possible for both intrinsically robust algorithms as well as fragile applications under these assumptions.
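
The underlying point is easy to reproduce in software: gradient descent on a convex problem keeps converging when every gradient evaluation is corrupted by roughly zero-mean errors, as long as the step size decays. Here is a toy sketch with additive Gaussian noise standing in for the FPU timing errors, which is a strong simplification of the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

# least squares: f(x) = 0.5 * ||A x - b||^2 / n, a convex problem
x = np.zeros(d)
for t in range(1, 5001):
    grad = A.T @ (A @ x - b) / n
    # corrupt the gradient, standing in for an unreliable floating point unit
    noisy_grad = grad + 0.5 * rng.normal(size=d)
    x -= (1.0 / t) * noisy_grad          # decaying step size absorbs the noise

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
print("distance to the least squares solution:", np.linalg.norm(x - x_star))
```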

 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, September 15, 2014

Accelerating Random Forests in Scikit-Learn


Accelerating Random Forests in Scikit-Learn by Gilles Louppe



 Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

CSjob: Associate professor in signal processing with special emphasis on compressive sensing, Aalborg, Denmark


Found on the interweb:

Associate professor in signal processing with special emphasis on compressive sensing (42102)

At the Faculty of Engineering and Science, Department of Electronic Systems, Signal and Information Processing section (SIP), a position as associate professor in signal processing with special emphasis on compressive sensing is open for appointment from 1 October 2014 or soon thereafter. The position is for a period of four years. The Department of Electronic Systems is one of the largest departments at Aalborg University, with a total of more than 300 employees. The department is internationally recognized in particular for its contributions within Information and Communication Technology (ICT). The research and teaching of the Department of Electronic Systems focus on electronic engineering, and the activity areas are organized in five sections (http://www.es.aau.dk/research/sections/). The department also hosts some larger research centers (http://www.es.aau.dk/research/centres/) with great international scope. The department focuses on maintaining a close interplay with the university's surroundings - locally, nationally and internationally - as well as producing unique basic research and educating talented and creative engineers. The department collaborates with leading ICT researchers all over the world.

Job description

4 years position as associate professor in signal processing with special emphasis on compressive sensing

For a project supported by the Danish Research Councils and Aalborg University, we need an associate professor for compressive sensing applied to Atomic Force Microscopy. Significant teaching (up to around 50% on average over the four years) must be expected. This can be at bachelor, master and PhD level. Documented pedagogical skills at university level are mandatory.

Competences:
  - Must have master's and PhD degrees in signal processing or a similar field.
  - Experience in developing high quality computational software for open source projects    (also willing to make software publicly available as demanded by the research project).
  - Must be able to document research experience in compressive sensing – including non-ideal effects.
  - Experience in Python programming for computational science.
  - Documented pedagogical competences at university level (preferably in mathematical modeling, Python programming, compressive sensing, scientific computing and related areas).
  - Experience in developing new courses – preferably both at bachelor, master and PhD level.
  - Knowledge in Atomic Force Microscopy (AFM) is an advantage.

You may obtain further professional information from Professor, Dr.Techn. Torben Larsen, phone: +45 20206856 or e-mail: tl@es.aau.dk.

Qualification requirements:

The level of qualification for Associate Professors shall correspond to the level which can be achieved on the basis of an appointment as Assistant Professor, but may be achievable in other ways. The appointment presupposes that the applicant can demonstrate original scientific production at an international level as well as documented teaching qualifications.

Appointment to the position requires that both research and teaching qualifications are at the requested level. The two qualifications will be given equal and principal priority in the overall assessment.

The application must contain the following:
• A motivated text wherein the reasons for applying, qualifications in relation to the position, and intentions and visions for the position are stated.
• A current curriculum vitae.
• Copies of relevant diplomas (Master of Science and PhD).
• Scientific qualifications. A complete list of publications must be attached with an indication of the works the applicant wishes to be considered. You may attach up to 10 publications.
• Teaching qualifications described in the teaching portfolio.  If this is not enclosed the applicant must include an explanation for its absence.
• Dissemination qualifications, including participation on committees or boards, participation in organisations and the like.
• Additional qualifications in relation to the position.
• References/recommendations.
• Personal data.
The applications are only to be submitted online by using the "Apply online" button below.

An assessment committee will assess all candidates.
For further information concerning the application procedure please contact Mads Brask Andersen by mail man@adm.aau.dk or phone (+45) 9940 9680.

Information regarding guidelines, ministerial circular in force, teaching portfolio and procedures can be seen here.

Agreement

Employment is in accordance with the Ministerial Order on the Appointment of Academic Staff at Universities (the Appointment Order) and the Ministry of Finance’s current Job Structure for Academic Staff at Universities. Employment and salary are in accordance with the collective agreement for state-employed academics or the collective agreement for academics under the Danish Society of Engineers’ (IDA) and the Danish Association of Chartered Surveyors’ (DDL) negotiation areas.   

Vacancy number

42102

Deadline

15/09/2014
 
 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.