Sunday, August 31, 2014

Nuit Blanche in Review ( August 2014 )

Since the last Nuit Blanche in Review (July 2014), Rosetta arrived at 67P, and many other awesome things happened as well:

We featured an instance of Data Driven or Zero Knowledge Sensor Design: a Depth Camera for Close-Range Human Capture and Interaction

This panel video, entitled Is Deep Learning the Final Frontier and the End of Signal Processing ?, produced a strong reaction from one of the leaders in Deep Learning. His response: Yoshua Bengio's view on Deep Learning. I added a few items in Parallel Paths for Deep Learning and Signal Processing ? Yoshua Bengio's tutorial slides from KDD, which also took place this week, are at: Scaling up Deep Learning. Other KDD slides and more can be found here.

In a different direction, we had several genomics items of interest and how compressive sensing and advanced matrix factorization could play a role there:

We also featured quite a few Sunday Morning Insights: 

and had at least two entries on reproducible research
We had several very interesting preprint related entries: 

The blogs
Videos and slides:
CfP


  • Copyright ESA/Rosetta/NAVCAM
  • Title Comet on 23 August 2014 - NavCam
  • Released 27/08/2014 10:00 am

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Saturday, August 30, 2014

Saturday Morning Video: Life at the Speed of Light - Craig Venter ( and some remarks )

We mentioned Craig Venter before ( How Can Compressive Sensing and Advanced Matrix Factorizations enable Synthetic Biology ? ); he is a very inspiring speaker and this talk is no exception. I believe there are a great many issues that need to be addressed in this exploration, and I have decided to create the Paris BioSciences Meetup group as a result. We'll talk about the scientific and technical aspects of some of the issues mentioned in this video and more; come join us if you are in the Paris area. Without further ado:



A fascinating aspect of their work is how Craig's team devised a way to figure out which genes are essential for life and which are not: mostly by knocking them out one by one and finding out which ones allow the cell to survive. This looks like a combinatorial approach; maybe a group testing approach ( connected to compressive sensing ) might help. At about 38 minutes, he also makes the case that much of the genome is oversampled, so that a specific capability can be replaced by a different set of genes. 
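To illustrate the group testing idea (my own sketch, not anything from Venter's lab): instead of knocking genes out one at a time, one could knock out random pools of genes and observe which cultures survive; any gene appearing in a surviving pool cannot be essential. The pool density, number of tests and COMP-style decoder below are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_essential = 100, 3            # 3 "essential" genes among 100
essential = rng.choice(n_genes, n_essential, replace=False)

# Non-adaptive pooling: each test knocks out a random subset of genes;
# a test "fails" (the cell dies) iff the pool contains an essential gene.
n_tests = 30
pools = rng.random((n_tests, n_genes)) < 0.1
outcomes = pools[:, essential].any(axis=1)

# COMP decoding: any gene appearing in a surviving (negative) pool
# is declared non-essential; the remainder are candidate essentials.
candidates = np.ones(n_genes, dtype=bool)
for pool, died in zip(pools, outcomes):
    if not died:
        candidates[pool] = False

print(sorted(np.flatnonzero(candidates)))  # a small superset of the true essentials
```

With 30 pooled tests instead of 100 individual knockouts, the decoder is guaranteed (in this noiseless toy model) to retain all essential genes while eliminating most of the others.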

I note that his team uses a mix of previous technologies and the new PacBio long-read technology. 

It was mentioned in the video we featured back in 2012 that sometimes a single base pair means the difference between life and death for the cell. There may be redundancy at the gene level, but a one-base-pair difference between life and death points to other issues. 

From an engineering standpoint, I note that the optics of the Illumina had to be reframed after the road trip (at 58 minutes).


Relevant blog entries:
  1. Videos and Slides: Next-Generation Sequencing Technologies - Elaine Mardis (2014)
  2. Improving Pacific Biosciences' Single Molecule Real Time Sequencing Technology through Advanced Matrix Factorization ?
  3. DNA Sequencing, Information Theory, Advanced Matrix Factorization and all that...
  4. It's quite simply, the stuff of Life...



Friday, August 29, 2014

Slides: Deep Learning - Russ Salakhutdinov, KDD2014, Accelerating Random Forests on Scikit-Learn - Gilles Louppe



After the slides of Yoshua Bengio ( Scaling up Deep Learning - Yoshua Bengio and the attendant Yoshua Bengio's view on Deep Learning ) at KDD, here are the slides of Russ Salakhutdinov. As a side note, let us mention that Deep Boltzmann Machines seem to share some similarities with Approximate Message Passing algorithms ( as mentioned earlier ). 


All the other tutorial slides at KDD2014 can be found here.


Unrelated, here are the slides of Gilles Louppe on Accelerating Random Forests on Scikit-Learn



Universal Streaming


Given a stream of data, a typical approach in streaming algorithms is to design a sophisticated algorithm with small memory that computes a specific statistic over the streaming data. Usually, if one wants to compute a different statistic after the stream is gone, it is impossible. But what if we want to compute a different statistic after the fact? In this paper, we consider the following fascinating possibility: can we collect some small amount of specific data during the stream that is "universal," i.e., where we do not know anything about the statistics we will want to later compute, other than the guarantee that had we known the statistic ahead of time, it would have been possible to do so with small memory? In other words, is it possible to collect some data in small space during the stream, such that any other statistic that can be computed with comparable space can be computed after the fact? This is indeed what we introduce (and show) in this paper with matching upper and lower bounds: we show that it is possible to collect universal statistics of polylogarithmic size, and prove that these universal statistics allow us after the fact to compute all other statistics that are computable with similar amounts of memory. We show that this is indeed possible, both for the standard unbounded streaming model and the sliding window streaming model.
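For contrast with the "universal" statistics introduced in the paper, here is a sketch of the classical single-purpose approach: a Count-Min sketch uses small memory but answers only one kind of query (point frequencies), fixed before the stream is seen. The width, depth and hashing scheme below are illustrative choices, not anything from the paper:

```python
import numpy as np

class CountMin:
    """Classic small-memory stream summary for point-frequency queries.
    A single-purpose sketch, unlike the paper's universal statistics."""
    def __init__(self, width=256, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.salts = rng.integers(1, 2**31, size=depth)
        self.width = width

    def _cols(self, item):
        # One hashed column per row, salted so the rows are independent.
        return [hash((int(s), item)) % self.width for s in self.salts]

    def update(self, item, count=1):
        for row, col in enumerate(self._cols(item)):
            self.table[row, col] += count

    def query(self, item):
        # Collisions only inflate counts, so the minimum over rows
        # is an overestimate of the true frequency.
        return min(self.table[row, col] for row, col in enumerate(self._cols(item)))

cm = CountMin()
for x in ["a"] * 50 + ["b"] * 7 + ["c"]:
    cm.update(x)
print(cm.query("a"))  # >= 50, and close to 50 with high probability
```

The paper's result says one can instead store a polylogarithmic-size summary without committing to the query in advance, and still answer, after the fact, any statistic computable in comparable space.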


Thursday, August 28, 2014

Compressive Sampling of Polynomial Chaos Expansions: Convergence Analysis and Sampling Strategies

The sensing matrices in Uncertainty Quantification are very structured, so it is sometimes difficult to use solvers that rely on Gaussian measurement matrices, and one wonders whether solvers other than L1 could be contemplated in light of the recent SwAMP thing. Anyway, today's paper investigates this subject in depth. Let us note the use of the phase transition as a comparison tool, great !



Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with high-dimensional random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as {\it coherence}, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ1-minimization problem. Utilizing asymptotic results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under the respective natural sampling distribution. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose the {\it coherence-optimal} sampling: a Markov Chain Monte Carlo sampling, which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases the coherence-optimal sampling scheme leads to similar or considerably improved accuracy.
Relevant previous entry:




Wednesday, August 27, 2014

Parallel Paths for Deep Learning and Signal Processing ?

This is a follow-up to this thread.

Deep learning in Machine Learning stands for neural networks with many layers for the purpose of producing features that can eventually be used for classification. In image/signal processing, the generic idea is to acquire and decompose a signal along a family of (generally) known signals (bases in known/identified spaces). Simple machine learning techniques can then rely on the elements of that decomposition (the features in ML) to achieve the last step of a classification task. Is there a convergence between the two approaches ?

As Stephane Mallat [1] pointed out in this panel, an FFT/IFFT is already deep; in fact, a process that is decomposed through several matrix factorizations is also deep, as it requires several iterations of different factorizations ( see Sparse Matrix Factorization: Simple rules for growing neural nets and Provable Bounds for Learning Some Deep Representations ), with each factored matrix representing a layer.
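Mallat's remark can be made concrete: an N-point FFT is a composition of log2(N) cheap butterfly stages, i.e., the dense DFT matrix factors into a product of sparse matrices, one per "layer". A minimal sketch of the standard radix-2 Cooley-Tukey recursion (nothing specific to the papers above):

```python
import numpy as np

def fft_radix2(x):
    """Radix-2 Cooley-Tukey FFT: log2(N) levels of cheap butterfly
    combinations, each level playing the role of a layer."""
    n = len(x)
    if n == 1:
        return x
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    # One butterfly stage: a sparse linear map combining the halves.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.default_rng(1).standard_normal(16).astype(complex)
assert np.allclose(fft_radix2(x), np.fft.fft(x))
print("16-point FFT = 4 butterfly 'layers'")  # log2(16) = 4
```

Each recursion level applies an O(N) sparse operation, so the O(N log N) cost is exactly the cost of a depth-log2(N) linear network.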

But if we explore recent developments in Compressive Sensing, we know that most reconstruction solvers used to take a long time to converge, with convergence measured in hundreds if not thousands of iterations. If an iteration could be construed as a layer (as is the case in autoencoders), a very deep network composed of a thousand layers would clearly be a non-publishable offence. Aside from the extreme "depth", some of these solvers rely on linear operations, whereas current neural networks implicitly use nonlinear functionals.  

Recently, much has changed in Compressive Sensing with the appearance of Approximate Message Passing algorithms. They are theoretically sound and require only a few iterations (5 to 20) to converge. Each of these iterations can be expressed as a nonlinear functional of a matrix-vector multiply, akin to each layer's computation in neural networks:




From: The SwAMP Thing! Sparse Estimation with the Swept Approximated Message-Passing Algorithm -implementation -
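To make the analogy explicit, here is a minimal sketch of AMP with soft thresholding for noiseless sparse recovery (an illustrative toy, not the SwAMP code; the threshold schedule is a common heuristic choice): each iteration is a matrix-vector multiply followed by an elementwise nonlinearity, plus the Onsager correction term.

```python
import numpy as np

def soft(x, t):
    """Elementwise soft threshold -- the layer's nonlinearity."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
n, m, k = 400, 200, 15
A = rng.standard_normal((m, n)) / np.sqrt(m)   # i.i.d. Gaussian sensing matrix
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x0

x, z = np.zeros(n), y.copy()
for _ in range(20):                            # a handful of "layers"
    tau = 2.0 * np.linalg.norm(z) / np.sqrt(m)  # threshold ~ residual noise level
    x_new = soft(x + A.T @ z, tau)              # nonlinearity(matrix-vector product)
    onsager = z * (np.abs(x_new) > 0).sum() / m # Onsager correction term
    z = y - A @ x_new + onsager
    x = x_new

print(np.linalg.norm(x - x0) / np.linalg.norm(x0))  # small relative error
```

Unroll those 20 iterations and you have, structurally, a 20-layer network: alternating linear maps (A and its transpose) and pointwise nonlinearities, which is exactly the parallel drawn in the post.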

One could argue that the first layer in Compressive Sensing does not parallel that in Neural Networks. It turns out that a few people [6] are working on what is called one-bit sensing [5], which is very close in spirit to neural networks. 

In all, each of these approaches is building deep networks in its own way... using different vocabularies.






#itwist14 and CfP: Biomedical and Astronomical Signal Processing (BASP) Frontiers 2015

The Twitter handle you currently want to follow is that of iTWIST ( international Traveling Workshop on Interactions between Sparse models and Technology ), which is taking place right now:


Also Yves Wiaux sent me the following:

Hi Igor
Could I ask you to post this conference announcement on Nuit Blanche?
Thanks a lot in advance
Cheers
Yves
Here is the announcement:

***
Re: Biomedical and Astronomical Signal Processing (BASP) Frontiers 2015 at baspfrontiers.org

Dear Colleagues

Abstract submission is now open until September 26 for deluxe poster contributions by young and senior researchers: http://www.baspfrontiers.org/submission.php?page=Author

See also
Committed invited participants: http://www.baspfrontiers.org/participants.php

Best contribution awards: http://www.baspfrontiers.org/award.php

Do not hesitate to disseminate this announcement.

On behalf of the chairs

Regards

Yves
***
___________________________________
Dr Yves Wiaux, Assoc. Prof., BASP Director
Institute of Sensors, Signals & Systems
School of Engineering & Physical Sciences
Heriot-Watt University, Edinburgh



Yoshua Bengio's view on Deep Learning


Following this entry, Yoshua Bengio just wrote the following on his Google+ stream (I have added a few links and reshaped the text for clearer reading):

There was an interesting debate on deep learning at Technion a couple of days ago: http://nuit-blanche.blogspot.fr/2014/08/is-deep-learning-final-frontier-and-end.html 
I wasn't there but I wished I had been to clarify a number of points. So I wrote this message to the members of the debate panel (sorry, it's pretty long):
-------- 
I just watched your recent debate on deep learning that took place at Technion. It was very interesting to hear you but made me wish very much that I had been there to inform the debate. So I would like to throw in a few answers to some of the questions and comments about some statements that were made. 
(1) Lack of theory and understanding of why deep learning works
I would be much more nuanced than what was said: there is already a lot of theory, but a lot more remains mysterious and needs to be worked out.  
1.1 Regarding depth, there are several theoretical results showing that deep circuits (including deep feedforward nets with rectifiers) can represent some functions exponentially (wrt depth) more efficiently than shallower ones (or ones of depth 2, which is the minimum to get universal approximation). The latest in this series (with references to previous ones) is this one: http://arxiv.org/abs/1402.1869 
1.2 Regarding the effect of learning a distributed representation, the same paper (following on many papers where I discuss this non-mathematically, including my 2009 book, and also following Montufar's similar result on RBMs, http://arxiv.org/pdf/1206.0387v4) shows that even a single layer distributed representation can represent an exponentially richer function (or distribution, for RBMs) than a "non-distributed" learner, e.g., kernel SVM, clustering, k-nearest-neighbors, decision trees, and other such non-parametric machine learning methods. This follows on previous negative results on such non-distributed local learners in my NIPS'2005 paper with Delalleau "The curse of highly variable functions for local kernel machines" and a related paper on the limitations of decision trees ("Decision trees do not generalize to new variations"). 
1.3 Insight behind the theory. The basic reason we get these potentially exponential gains is that we have compositionality of the parameters, i.e., the same parameters can be re-used in many contexts, so O(N) parameters can allow us to distinguish O(2^N) regions in input space, whereas with nearest-neighbor-like things, you need O(N) parameters (i.e. O(N) examples) to characterize a function that can distinguish between O(N) regions.
Of course, this "gain" is only for some target functions, or more generally, we can think of it like a prior. If the prior is applicable to our target distribution, we can gain a lot. As usual in ML, there is no real free lunch. The good news is that this prior is very broad and seems to cover most of the things that humans learn about. What it basically says is that the data we observe are explained by a bunch of underlying factors, and that you can learn about each of these factors WITHOUT requiring to see all of the configurations of the other factors. This is how you are able to generalize to new configurations and get this exponential statistical gain. 
(2) Generative & unsupervised models. 
At the end there was a question about unsupervised and generative deep learning, and I would have really liked to be there to say that there is a lot of it, and in fact it is one of the main differences between the neural net research of the previous wave and the current wave, i.e., that we have made a lot of progress in designing unsupervised learning algorithms, including the generative type, and including deep ones. I am sure you have already heard about RBMs? and maybe about denoising auto-encoders? They are shallow but you can use them to pre-train deep generative models such as DBMs and DBNs (although these days we are moving in a territory where you don't need pre-training anymore). To see how powerful these are I invite you to look at some of the images generated in the papers of the last few years, e.g., http://www.icml-2011.org/papers/591_icmlpaper.pdf, or pretty much all the papers from Ruslan Salakhutdinov or http://arxiv.org/pdf/1312.6114v10. Some of that literature is reviewed in our 2013 TPAMI review paper.
(3) How deep is deep enough?  
The above theoretical results say that the answer is data-dependent. For some tasks you may need a deeper (more non-linear, more complex) function. Also the actual "number" of layers is somewhat immaterial because it depends on the richness of the non-linearities that we put in each layer (i.e. you may simulate a depth-d net with a depth 2d depending on what operations each level is allowed to perform). However, with the usual suspects (neuron-like things and RBF-like things and gate-like things and sums and products), two levels give you universal approximation, so what we usually call "shallow" is depth 1 or 2. Depth 1 corresponds to linear systems, which are not even universal approximators, but are still very often used because they are so convenient. Regarding FFTs and such, they are indeed deep but that is not deep learning. Deep learning is when you learn multiple levels of representation, and the number of levels is something you can learn as well, so you have a kind of generic recipe throughout. 
(4) Successes of deep learning, beyond object recognition in images. 
Of course, besides object recognition in images, the big success has been in speech recognition. What most people don't know is that in the language modeling part of speech recognition, neural networks have been the SOTA for a number of years (starting with the work of Holger Schwenk), and neural nets (especially learning neural word embeddings, which I started with my NIPS'2000 paper) are quickly becoming a standard and a secret sauce in modern NLP. In particular, we are seeing this year a number of groups reaching and passing the SOTA in machine translation using these ideas (Google, BBN, Oxford, my lab, others). What is interesting is that we have moved beyond "object classification" into "structured output" tasks where the "output" is a high-dimensional object (e.g. a sentence, or a parse tree) which we are representing typically by its joint distribution. What is also interesting is that a lot of that work relies on UNSUPERVISED learning from pure text. See below for more on the supervised vs unsupervised divide. 
(5) Unsupervised learning is not bullshit and we can generalize with very few labeled examples thanks to transfer learning. Unsupervised learning is crucial to approach AI for a number of fundamental reasons, including the abundance of unlabeled data and the observed fact that representation learning (whether supervised or unsupervised) allows transfer learning, allowing to learn from VERY FEW LABELED EXAMPLES some new categories. Of course this is only possible if the learner has previously learned good representations from related categories, but with the AlexNet, it has clearly been shown that from 1000 object categories you can generalize to new categories with just a few examples. This has been demonstrated in many papers, starting with the two transfer learning competitions we won in 2011 (ICML 2011 and NIPS 2011), using UNSUPERVISED transfer learning. More recently, Socher showed that you can even get some decent generalization from ZERO examples simply because you know things from multiple modalities (e.g., that 'dog' and 'cat' are semantically similar in sentences, so that you can guess that something in an image could be a dog even if you have only seen images of cats). We had shown a similar result earlier (AAAI-2008, Larochelle et al) in the case of learning a new task for which you have some kind of representation (and these representations are learned across tasks).
So you can't use deep learning on a new field for which there is very little data if there is no relationship with what the learner has learned previously, but that is also true of humans. 
(6) Deep learning is not magic. 
I am horrified at how people react to deep learning as if 
  • (a) it was something magical, or 
  • (b) blame it for not being magical and solving every problem. 
Come on, let's not feed the blind hype and stay in the realm of science... which concentrates on the question of UNDERSTANDING. However, the kind of understanding I and others are seeking is not about the physical world directly but about the LEARNING mechanisms. That is very different. That is the thing on which I seek insight, and that is what my papers seek to provide. 
(7) About performance bounds: 
Because we are doing machine learning, you won't be able to achieve performance bounds without making assumptions on your data generating distribution. No magic. The BIG difference between deep learning and classical non-parametric statistical machine learning is that we go beyond the SMOOTHNESS assumption and add other priors such as
  • the existence of these underlying generative factors (= distributed representations)
  • assuming that they are organized hierarchically by composition (= depth)
  • assuming that they are causes of the observed data (allows semi-supervised learning to work)
  • assuming that different tasks share different subsets of factors (allows multi-task learning and transfer learning to tasks with very few labeled examples)
  • assuming that the top-level factors are related in simple ways to each other (makes it possible to stick a simple classifier on top of unsupervised learning and get decent results, for example)
See more of a discussion of that in my 2013 PAMI review paper on representation learning.
I am also posting this answer on Nuit Blanche (where I saw the video), google+ and facebook. The new book (called 'Deep Learning') that I am writing should help make these answers even clearer. 
Thanks again for helping me clarify things that need to be made clearer.
-- Yoshua




Image Credit: NASA/JPL-Caltech 
This image was taken by Navcam: Right B (NAV_RIGHT_B) onboard NASA's Mars rover Curiosity on Sol 729 (2014-08-25 05:32:20 UTC). 
Full Resolution


Tuesday, August 26, 2014

Advice to a Freshly Graduated Graduate Student

Yesterday, I asked Junjie where his code would eventually be available. Initially, Junjie wanted me to post it on my Google site since he did not have an internet presence. During our back-and-forth conversation, it became clear that letting others be the owners of one's intellectual work and accomplishments is not a good signal fresh graduates want to send to the rest of the world. 

If you are a graduate student who has put in the work, do not let your research output remain in the hands of others by:

  • having only a personal website located at your current university (the next department head or IT support could switch off your visibility from the web in a single blink)
  • not sharing your preprint or published article because you think that signing off a copyright statement to a journal means that you have to hide your work production from your peers forever (check what you can do here: http://www.sherpa.ac.uk/romeo/ )
  • not sharing your code implementation: once the implementation is out, others will use it and you will become a giant; see Nobody Cares About You and Your Algorithm. At the very least, you will be able to go back to your graduate work easily six months from now.
  • not pitching your work on Nuit Blanche (provided it is relevant - please tell me how it is relevant -). I specifically try to avoid deeplinking to implementations so that people can land on the researcher's or his/her publication page.
Junjie has just set up an independent web presence and has released his code for yesterday's implementation here:


I'll update yesterday's entry to link back to that page.

Monday, August 25, 2014

Turbo Compressed Sensing with Partial DFT Sensing Matrix - implementation -

Junjie just let me know of the following:



Dear Igor,
I'm a reader of your Nuit Blanche blog, and I benefit a lot from it.
Recently we wrote a paper called "Turbo compressed sensing with partial DFT sensing matrix", available at http://arxiv.org/abs/1408.3904 .
In that paper, we developed a simple iterative signal recovery algorithm for partial DFT sensing matrix (or more generally sub-sampled row orthogonal matrix). As we know, the state evolution of AMP (approximate message passing) is developed for i.i.d matrix and may not be accurate for a partial DFT sensing matrix, especially when the sparsity level is high. In our paper, we developed a signal recovery algorithm and analyzed its state evolution. Our numerical experiments show that the state evolution matches well with simulation. What is exciting is that the state evolution of our approach is consistent with the theoretical prediction for partial DFT sensing matrix in the following paper:
A. Tulino, G. Caire, S. Verdu, and S. Shamai, “Support recovery with sparsely sampled free random matrices,” IEEE Trans. Inf. Theory, vol. 59, no. 7, pp. 4243–4271, July 2013.
I therefore view our method as an extension of AMP from i.i.d matrix to partial DFT matrix.

I hope you find our paper interesting. Any comment is welcome.
Best regards,

Junjie

In this letter, we propose a turbo compressed sensing algorithm with partial discrete Fourier transform (DFT) sensing matrices. Interestingly, the state evolution of the proposed algorithm is shown to be consistent with that derived using the replica method. Numerical results demonstrate that the proposed algorithm outperforms the well-known approximate message passing (AMP) algorithm when a partial DFT sensing matrix is involved.
The implementation is on Junjie's page:



Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.