Nuit Blanche: 01/01/2018

Monday, January 29, 2018

Intriguing Properties of Randomly Weighted Networks: Generalizing while Learning Next to Nothing

Intriguing indeed !

Intriguing Properties of Randomly Weighted Networks: Generalizing while Learning Next to Nothing by Amir Rosenfeld, John K. Tsotsos

Training deep neural networks results in strong learned representations that show good generalization capabilities. In most cases, training involves iterative modification of all weights inside the network via back-propagation. In this paper, we propose to take an extreme approach and fix almost all weights of a deep convolutional neural network in their randomly initialized values, allowing only a small portion to be learned. As our experiments show, this often results in performance which is on par with the performance of learning all weights. The implications of this intriguing property of deep neural networks are discussed and we suggest ways to harness it to create more robust representations.
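To get a feel for the idea, here is a minimal sketch of mine (not the authors' setup, which uses deep CNNs and learns a small fraction of the filters). It assumes a toy two-blob dataset, a single random ReLU hidden layer that is never updated, and a linear readout fitted by ridge regression as the only learned part. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data: two Gaussian blobs in 2D (a hypothetical stand-in for a real dataset).
n = 200
X = np.vstack([rng.normal(-1.0, 0.7, size=(n, 2)), rng.normal(1.0, 0.7, size=(n, 2))])
y = np.concatenate([-np.ones(n), np.ones(n)])

# Hidden layer: random weights, frozen at their initial values.
hidden = 64
W = rng.normal(size=(2, hidden))
b = rng.normal(size=hidden)
H = np.maximum(X @ W + b, 0.0)  # ReLU features, never updated

# The only learned parameters: a linear readout fitted by ridge-regularised least squares.
lam = 1e-3
readout = np.linalg.solve(H.T @ H + lam * np.eye(hidden), H.T @ y)

train_accuracy = np.mean(np.sign(H @ readout) == y)
learned_fraction = readout.size / (W.size + b.size + readout.size)
print(f"train accuracy: {train_accuracy:.3f}, learned fraction: {learned_fraction:.2f}")
```

Here only a quarter of the parameters are learned. The paper goes much further on much harder tasks, so this only shows the mechanics of freezing most weights.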

h/t Iacopo and Miles

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Job: Machine Learning and Computer Vision Engineer, Scortex, Paris, France

Christophe let me know of a job opportunity at Scortex.io, the announcement is here:

About

Scortex deploys artificial intelligence in the heart of factories.
We help our customers take the next big leap in smart automation thanks to our Quality Intelligence Solution.
Our platform enables manufacturing companies to take control of their quality:

Automate visual inspection tasks

Monitor key quality data in real time through our intuitive platform

Improve production process by consolidating production knowledge.

Thanks to our proprietary deep learning platform, we provide a state-of-the-art performance and robust vision solution for quality intelligence.

What you will do

As a proactive member of the machine learning and computer vision team, your work will include a varied range of challenges:

explore various state of the art techniques to help solve tasks currently unbeaten by computers;

stay on the bleeding edge of research and participate actively in the community;

design, develop and implement supervised and unsupervised models with extremely constraining requirements not only on accuracy, but also on real-time execution, fast and scalable training processes and minimal annotation levels;

help improve our pipelines of data acquisition, training and inference.

What we are looking for

In-depth knowledge of deep learning techniques applied to computer vision: deep convolutional networks, autoencoders, image (pre)processing, regularization;

Proficient knowledge of both supervised and unsupervised machine learning techniques : clustering, object detection, generative models, dimensionality reduction;

Understanding of standard computer vision techniques : filtering, transformations, descriptors and detectors;

Knowledge and understanding of the mathematics underlying all of the above : probability and statistics, optimization, linear algebra, numerical computation;

Proven experience with at least one machine learning framework (bonus points for Keras or Tensorflow);

Good programming and software engineering skills;

Experience with the unix environment.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Job: Research engineer position in the DREAM project at ISIR, Sorbonne-Université, Paris, France.

Natalia let me know of the following job position

Research engineer position in the DREAM project at ISIR, Sorbonne-Université, Paris, France.

Job position available immediately and for 1 year (may be extended).

The DREAM European project (http://www.robotsthatdream.eu/) is focused on the bootstrap of a developmental process allowing a robot to learn about its environment and the objects it contains.

We are looking for highly motivated candidates with a strong experience in developing software for robotics, in particular on the ROS middleware. The recruited engineer will be in charge of the development and deployment of the ROS modules supporting the DREAM cognitive architecture. He/she will also help the partners to integrate their work into the cognitive architecture and will work on the validation experiments of the project. Programming skills in modern C++ and python are expected. The position involves robotics experiments that will be done on Baxter, PR2 and Pepper robots. The position may be extended later on to more than one year.

The position is located in the Institute of Intelligent Systems and Robotics (ISIR, http://www.isir.upmc.fr), Paris, France. ISIR belongs to Sorbonne Université which is among the top ranked French universities (http://sorbonne-universite.fr/en).

Speaking or understanding French is not required.

To apply, please send a CV, letter of motivation (max 2 pages), and a list of three references via e-mail to stephane.doncieux@upmc.fr. Please put [DREAM engineer application] in the subject of the mail. Review of applicants will begin immediately, and will continue until the position is filled.

Best regards,

Stephane Doncieux


Saturday, January 27, 2018

Saturday Morning Videos: Theories of Deep Learning: videos and slides

Laurent did a wonderful job of listing slides and videos in one place. You should go to his page to check out the videos and slides of STAT385 at Stanford, taught in part by Dave Donoho and featuring some illustrious speakers. The link is here.

All the slides are here:


Friday, January 26, 2018

Video and Slides: Understanding and Improving Deep Learning With Random Matrix Theory, Jeffrey Pennington

From the slides of Jeffrey Pennington entitled: Understanding and Improving Deep Learning With Random Matrix Theory and given in STAT385 at Stanford under the guidance of Dave Donoho:

Why Random Matrices
The initial weight configuration is random

Training may induce only low-rank perturbations around the random configuration

Ah, it's one of the first slides and I already like it !

Here is some of Jeffrey's recent work:

A Correspondence Between Random Neural Networks and Statistical Field Theory by Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

A number of recent papers have provided evidence that practical design questions about neural networks may be tackled theoretically by studying the behavior of random networks. However, until now the tools available for analyzing random neural networks have been relatively ad-hoc. In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto lattice models in statistical physics. We argue that several previous investigations of stochastic networks actually studied a particular factorial approximation to the full lattice model. For random linear networks and random rectified linear networks we show that the corresponding lattice models in the wide network limit may be systematically approximated by a Gaussian distribution with covariance between the layers of the network. In each case, the approximate distribution can be diagonalized by Fourier transformation. We show that this approximation accurately describes the results of numerical simulations of wide random neural networks. Finally, we demonstrate that in each case the large scale behavior of the random networks can be approximated by an effective field theory.

Nonlinear random matrix theory for deep learning by Jeffrey Pennington, Pratik Worah

Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix Y^TY, Y=f(WX), where W is a random weight matrix, X is a random data matrix, and f is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.
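The object studied in this abstract is easy to simulate. The sketch below is my own, with arbitrary sizes, tanh as the nonlinearity and a 1/n1 normalisation (all assumptions, not taken from the paper). It builds Y = f(WX) from Gaussian W and X and computes the eigenvalues of the Gram matrix Y^TY numerically.

```python
import numpy as np

rng = np.random.default_rng(0)

n0, n1, m = 300, 300, 600   # input dim, number of random features, number of samples (arbitrary)
X = rng.normal(size=(n0, m))                    # random data matrix
W = rng.normal(size=(n1, n0)) / np.sqrt(n0)     # random weights, scaled so entries of WX are O(1)
Y = np.tanh(W @ X)                              # pointwise nonlinearity f = tanh (an arbitrary choice)

M = (Y.T @ Y) / n1                              # m x m Gram matrix, normalised by the feature count
eigvals = np.linalg.eigvalsh(M)                 # real eigenvalues in ascending order

# A crude look at the empirical spectral density.
hist, edges = np.histogram(eigvals, bins=20, density=True)
print("smallest / largest eigenvalue:", eigvals[0], eigvals[-1])
```

Since Y has only n1 rows, the m x m Gram matrix has rank at most n1, so with these sizes at least half of the eigenvalues are numerically zero. The paper's contribution is an analytical description of the limiting distribution, which a histogram like this one can only approximate.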

Geometry of neural network loss surfaces via random matrix theory by Jeffrey Pennington, Yasaman Bahri

Understanding the geometry of neural network loss surfaces is important for the development of improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and another key parameter, $\phi$, which measures the ratio of parameters to data points. Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function $\frac{1}{2}(1-\phi)^2$.

Not strictly related to random matrix theory:

Deep Neural Networks as Gaussian Processes by Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

A deep fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP) in the limit of infinite network width. This correspondence enables exact Bayesian inference for neural networks on regression tasks by means of straightforward matrix computations. For single hidden-layer networks, the covariance function of this GP has long been known. Recently, kernel functions for multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified the correspondence between using these kernels as the covariance function for a GP and performing fully Bayesian prediction with a deep neural network. In this work, we derive this correspondence and develop a computationally efficient pipeline to compute the covariance functions. We then use the resulting GP to perform Bayesian inference for deep neural networks on MNIST and CIFAR-10. We find that the GP-based predictions are competitive and can outperform neural networks trained with stochastic gradient descent. We observe that the trained neural network accuracy approaches that of the corresponding GP-based computation with increasing layer width, and that the GP uncertainty is strongly correlated with prediction error. We connect our observations to the recent development of signal propagation in random neural networks.
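Here is a rough sketch of what "Bayesian inference by straightforward matrix computations" can look like for ReLU networks. It uses the standard arc-cosine recursion for the layer-to-layer covariance and then the usual GP posterior mean. The function name, the hyperparameters (depth, sigma_w2, sigma_b2, noise) and the toy regression data are my own assumptions, and none of this reproduces the authors' pipeline.

```python
import numpy as np

def nngp_relu_kernel(X1, X2, depth=3, sigma_w2=1.6, sigma_b2=0.1):
    """Covariance of an infinitely wide ReLU network between rows of X1 and rows of X2."""
    d = X1.shape[1]
    # Input-layer covariances (cross terms and the two sets of diagonal terms).
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 * X1, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 * X2, axis=1) / d
    for _ in range(depth):
        norm = np.sqrt(np.outer(k11, k22))
        theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
        k12 = sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)
        )
        # On the diagonal theta = 0, so the update reduces to sigma_b2 + sigma_w2 * k / 2.
        k11 = sigma_b2 + sigma_w2 * k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2
    return k12

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 5))
y_train = np.sin(X_train[:, 0])          # toy regression target
X_test = rng.normal(size=(10, 5))

noise = 1e-3                             # small observation-noise term for numerical stability
K = nngp_relu_kernel(X_train, X_train)
K_star = nngp_relu_kernel(X_test, X_train)
alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)
mean = K_star @ alpha                    # GP posterior mean at the test points
print(mean.shape)
```

No network is trained at any point: the depth only enters through the number of times the kernel recursion is applied.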

h/t Laurent


Thursday, January 25, 2018

Size-Independent Sample Complexity of Neural Networks

Ali (who recently wrote about the parallel between neural networks and optics) mentioned the following work entitled MythBusters: A Deep Learning Edition by Sasha Rakhlin on his twitter:

Here is the attendant work from Myth 3: Size-Independent Sample Complexity of Neural Networks by Noah Golowich, Alexander Rakhlin, Ohad Shamir

We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. Compared to previous work, these complexity bounds have improved dependence on the network depth, and under some additional assumptions, are fully independent of the network size (both depth and width). These results are derived using some novel techniques, which may be of independent interest.
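For readers who have not met Rademacher complexity before, the simplest norm-constrained case (a single linear layer, not the deep networks treated in the paper) can be estimated by Monte Carlo. This is a sketch under my own assumptions: Gaussian data, an L2 ball of radius B for the weights, and the classical Jensen-type bound for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

m, d, B = 200, 5, 1.0                  # sample size, input dimension, norm bound (arbitrary)
X = rng.normal(size=(m, d))

# For the class {x -> <w, x> : ||w||_2 <= B}, the supremum over w has a closed form:
# sup_w (1/m) sum_i s_i <w, x_i> = (B / m) * || sum_i s_i x_i ||_2.
trials = 2000
signs = rng.choice([-1.0, 1.0], size=(trials, m))   # Rademacher sign vectors
estimate = B / m * np.mean(np.linalg.norm(signs @ X, axis=1))

# Classical upper bound via Jensen's inequality; it depends on the weights only through B.
bound = B * np.sqrt(np.sum(X ** 2)) / m
print(f"Monte Carlo estimate: {estimate:.4f}, bound: {bound:.4f}")
```

The bound depends on the weights only through their norm, not through the number of parameters. That is the flavour of result the paper extends to deep networks.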


Tuesday, January 23, 2018

CSjob: Postdoc for research in reservoir computing, University of St.Gallen, Switzerland

Stéphane just let me know about this Postdoc opportunity:

Dear Igor,

...

I am also contacting you because my friend Juan-Pablo Ortega is offering a postdoc position at St Gallen University (Switzerland) on very exciting new trends in Machine Learning, based on Reservoir Computing. Do you think it could be appropriate to post a link to this job announcement on your blog Nuit Blanche (see pdf attached) ? Juan Pablo’s web page is here: http://juan-pablo-ortega.com/

All the best,

Stephane

National Physical Laboratory

Data Science Division

Sure Stephane, here is the announcement:

At the Faculty of Mathematics and Statistics of the University of St. Gallen a new ASSISTANT / POSTDOCTORAL POSITION for research in reservoir computing will be filled starting April 1st, 2018 or later.
This position is funded by the Swiss National Science Foundation project entitled “Novel Architectures for Photonic Reservoir Computing”. Research will be conducted under the supervision of Prof. Juan-Pablo Ortega (see http://juan-pablo-ortega.com for an overview of his current research agenda) and in collaboration with the photonics group of the IBM Research labs in Zurich. The tentative duration of the contract is 30 months. The successful candidate will become part of a young research team, with a strong interest for the interplay between dynamical systems, machine learning, and statistical modeling, as well as for applications of those techniques to financial econometrics and physiological signal treatment. The group is located at the Faculty of Mathematics and Statistics of the University of St.Gallen (http://www.mathstat.unisg.ch).
Applications for the position should be sent by e-mail to Mrs. Margit Albers (math- stat@unisg.ch) no later than March 1st, 2018.
Candidate Profile:

- Ph.D. in a strongly quantitative subject: Mathematics, Computer Science, Statistics, Physics, Engineering...
- Strong background in dynamical systems and in both deterministic and statistical modeling
- Interest for machine learning and optimization
- Knowledge in financial econometrics and/or signal treatment is a plus
- Good programming skills (Matlab, R, Python, C,...) are required

The application package must contain:

- Motivation letter
- Complete Curriculum vitae
- Two recommendation letters

Duties:

- Research activity in the context of the project “Novel Architectures for Photonic Reservoir Computing”.
- Some teaching support at the Faculty of Mathematics and Statistics at various levels.


Monday, January 22, 2018

Quantized Compressive Sensing with RIP Matrices: The Benefit of Dithering

Quantized Compressive Sensing with RIP Matrices: The Benefit of Dithering by Chunlei Xu, Laurent Jacques

In Compressive Sensing theory and its applications, quantization of signal measurements, as integrated into any realistic sensing model, impacts the quality of signal reconstruction. In fact, there even exist incompatible combinations of quantization functions (e.g., the 1-bit sign function) and sensing matrices (e.g., Bernoulli) that cannot lead to an arbitrarily low reconstruction error when the number of observations increases.
This work shows that, for a scalar and uniform quantization, provided that a uniform random vector, or "random dithering", is added to the compressive measurements of a low-complexity signal (e.g., a sparse or compressible signal, or a low-rank matrix) before quantization, a large class of random matrix constructions known to respect the restricted isometry property (RIP) are made "compatible" with this quantizer. This compatibility is demonstrated by the existence of (at least) one signal reconstruction method, the "projected back projection" (PBP), whose reconstruction error is proved to decay when the number of quantized measurements increases.
Despite the simplicity of PBP, which amounts to projecting the back projection of the compressive observations (obtained from their multiplication by the adjoint sensing matrix) onto the low-complexity set containing the observed signal, we also prove that given a RIP matrix and for a single realization of the dithering, this reconstruction error decay is also achievable uniformly for the sensing of all signals in the considered low-complexity set.
We finally confirm empirically these observations in several sensing contexts involving sparse signals, low-rank matrices, and compressible signals, with various RIP matrix constructions such as sub-Gaussian random matrices and random partial Discrete Cosine Transform (DCT) matrices.
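The projected back projection described above is short enough to try out. The following is a minimal sketch for sparse signals, assuming a Gaussian sensing matrix, a floor-type uniform quantiser of bin width delta, a dither drawn uniformly from [0, delta), and a 1/m normalisation of the back projection. The sizes and helper names (quantize, hard_threshold) are my own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, k = 256, 4000, 5      # ambient dimension, number of measurements, sparsity (arbitrary)
delta = 1.0                 # quantisation bin width

# k-sparse, unit-norm signal.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.normal(size=k)
x /= np.linalg.norm(x)

A = rng.normal(size=(m, n))                 # Gaussian sensing matrix
dither = rng.uniform(0.0, delta, size=m)    # uniform random dithering, one draw per measurement

def quantize(t, delta):
    """Uniform scalar quantiser with bin width delta."""
    return delta * np.floor(t / delta)

q = quantize(A @ x + dither, delta)         # dithered, quantised measurements

def hard_threshold(v, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x_hat = hard_threshold(A.T @ q / m, k)      # back projection, then projection onto the sparse set
error = np.linalg.norm(x - x_hat)
print("reconstruction error:", error)
```

With this quantiser and dither, the quantised measurement is an unbiased estimate of the unquantised one, so the back projection averages towards x as m grows. Rerunning with larger m is a quick way to watch the error decay.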


Call for participants: Workshop and Advanced school "Statistical physics and machine learning back together" in Cargese, Corsica, France, August 20-31, 2018

Lenka sent me the following the other day:

Dear Colleagues and Friends,

This is the announcement and call for participants of the workshop and advanced school "Statistical physics and machine learning back together" that will take place in Cargese, Corsica, France during August 20-31, 2018. Please forward this to your colleagues/students that may be interested.

Researchers, students and postdocs interested in participating in the event are invited to apply on the website http://cargese.krzakala.org
(or http://www.lps.ens.fr/~krzakala/WEBSITE_Cargese2018/home.htm ) by February 28th, 2018.

The capacity of the Cargese amphitheatre is limited; due to this constraint, participants will be selected from among the applicants.

The main goal of this event is to gather the community of researchers working on questions that relate in some way statistical physics and high dimensional statistical inference and learning. The format will be several (~10) 3h introductory lectures, and about thrice as many invited talks.

The topics include:

Energy/loss landscapes in disordered systems, machine learning and inference problems

Computational and statistical thresholds and trade-offs

Theory of artificial multilayer neural networks

Rigorous approaches to spin glasses and related models of statistical inference

Parallels between optimisation algorithms and dynamics in physics

Vindicating the replica and cavity method rigorously

Current trends in variational Bayes inference

Developments in message passing algorithms

Applications on machine learning in physics

Information processing in biological systems

Lecturers:

Gerard Ben Arous (Courant Institute)

Giulio Biroli (CEA Saclay, France)

Nicolas Brunel (Duke University)

Yann LeCun (Courant Institute and Facebook)

Michael Jordan (UC Berkeley)

Stephane Mallat (ENS et college de France)

Andrea Montanari (Stanford)

Dmitry Panchenko (University of Toronto, Canada)

Sundeep Rangan (New York University)

Riccardo Zecchina (Politecnico Turin, Italy)

Speakers:

Antonio C Auffinger (Northwestern University)

Afonso Bandeira (Courant Institute, NYU)

Jean Barbier (Queens Mary, London)

Quentin Berthet (Cambridge UK)

Jean-Philippe Bouchaud (CFM, Paris)

Joan Bruna (Courant Institute, NYU)

Patrick Charbonneau (Duke)

Amir Dembo (Stanford)

Allie Fletcher (UCLA)

Silvio Franz (Paris-Orsay)

Surya Ganguli (Stanford)

Alice Guionnet (ENS Lyon)

Aukosh Jagganath (Harvard)

Yoshiyuki Kabashima (Tokyo Tech)

Christina Lee (MIT)

Marc Lelarge (ENS, Paris)

Tengyu Ma (Princeton)

Marc Mezard (ENS, Paris)

Leo Miolane (ENS, Paris)

Remi Monasson (ENS, Paris)

Cristopher Moore (Santa Fe Institute)

Giorgio Parisi (Roma La Sapienza)

Will Perkins (Birmingham)

Federico Ricci-Tersenghi (Roma La Sapienza)

Cindy Rush (Columbia Univ.)

Levent Sagun (CEA Saclay)

S. S. Schoenholz (Google Brain)

Phil Schniter (Ohio State University)

David Jason Schwab (Northwestern University)

Guilhem Semerjian (ENS, Paris)

Alexandre Tkatchenko (University of Luxembourg)

Naftali Tishby (Hebrew University)

Pierfrancesco Urbani (CNRS, Paris)

Francesco Zamponi (ENS, Paris)

With best regards the organizers

Florent Krzakala and Lenka Zdeborova



Friday, January 19, 2018

On Random Weights for Texture Generation in One Layer Neural Networks

Following up on the use of random projections (which in the context of DNNs is really about NNs with random weights), today we have:

On Random Weights for Texture Generation in One Layer Neural Networks by Mihir Mongia, Kundan Kumar, Akram Erraqabi, Yoshua Bengio

Recent work in the literature has shown experimentally that one can use the lower layers of a trained convolutional neural network (CNN) to model natural textures. More interestingly, it has also been experimentally shown that only one layer with random filters can also model textures although with less variability. In this paper we ask the question as to why one layer CNNs with random filters are so effective in generating textures? We theoretically show that one layer convolutional architectures (without a non-linearity) paired with an energy function used in previous literature, can in fact preserve and modulate frequency coefficients in a manner so that random weights and pretrained weights will generate the same type of images. Based on the results of this analysis we question whether similar properties hold in the case where one uses one convolution layer with a non-linearity. We show that in the case of ReLu non-linearity there are situations where only one input will give the minimum possible energy whereas in the case of no nonlinearity, there are always infinite solutions that will give the minimum possible energy. Thus we can show that in certain situations adding a ReLu non-linearity generates less variable images.


Thursday, January 18, 2018

Towards Understanding the Invertibility of Convolutional Neural Networks

Ok, here is a "connection to a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs". This is great! I would have expected to see the LISTA paper from Gregor and Lecun in there somewhere. Irrespective, this type of analysis brings us closer to figuring out the sort of layer that keeps or doesn't keep information (see Sunday Morning Insight: Sharp Phase Transitions in Machine Learning ? ). Enjoy !

Towards Understanding the Invertibility of Convolutional Neural Networks by Anna C. Gilbert, Yi Zhang, Kibok Lee, Yuting Zhang, Honglak Lee

Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection to a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable reconstruction results on real images. We also discuss gaps between our model assumptions and the CNN trained for classification in practical scenarios.


Wednesday, January 17, 2018

Deep Complex Networks - implementation -

Here is some interesting work on complex DNNs.

Deep Complex Networks by Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J Pal

At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.

Implementation is here: https://github.com/ChihebTrabelsi/deep_complex_networks
The reviews are here. Also, Carlos Perez had a blog post on the matter a while ago: Should Deep Learning use Complex Numbers?
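The basic building block mentioned in the abstract, a complex-valued linear map carried out with real-valued operations, can be checked in a few lines. This sketch is mine and uses a dense matrix product rather than a convolution to keep it short. It is not taken from the authors' repository, and the array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Complex weights W = A + iB and complex input h = x + iy, stored as separate real arrays.
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))
x = rng.normal(size=3)
y = rng.normal(size=3)

# (A + iB)(x + iy) = (Ax - By) + i(Bx + Ay), using only real matrix products.
real_part = A @ x - B @ y
imag_part = B @ x + A @ y

# Check against NumPy's native complex arithmetic.
reference = (A + 1j * B) @ (x + 1j * y)
print(np.allclose(real_part + 1j * imag_part, reference))
```

The same identity holds with the matrix products replaced by convolutions, since convolution is linear. That is why a complex layer can be assembled from four real-valued operations.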


Tuesday, January 16, 2018

Understanding Deep Representations through Random Weights

When it comes to random projections and Deep Neural Networks, the following paper is intriguing:

In summary, applying random weights in the whole CNN-DeCNN architecture, we can still capture the geometric positions and contours of the image. The shape reduction of feature maps takes responsibility for the randomness on the reconstructed images for higher layer representation due to the representation compression. And random weight DeCNN can reconstruct robust images if we have enough number of feature maps.

Understanding Deep Representations through Random Weights by Yao Shu, Man Zhu, Kun He, John Hopcroft, Pan Zhou

We systematically study the deep representation of random weight CNN (convolutional neural network) using the DeCNN (deconvolutional neural network) architecture. We first fix the weights of an untrained CNN, and for each layer of its feature representation, we train a corresponding DeCNN to reconstruct the input image. As compared with the pre-trained CNN, the DeCNN trained on a random weight CNN can reconstruct images more quickly and accurately, no matter which type of random distribution for the CNN's weights. It reveals that every layer of the random CNN can retain photographically accurate information about the image. We then let the DeCNN be untrained, i.e. the overall CNN-DeCNN architecture uses only random weights. Strikingly, we can reconstruct all position information of the image for low layer representations but the colors change. For high layer representations, we can still capture the rough contours of the image. We also change the number of feature maps and the shape of the feature maps and gain more insight on the random function of the CNN-DeCNN structure. Our work reveals that the purely random CNN-DeCNN architecture substantially contributes to the geometric and photometric invariance due to the intrinsic symmetry and invertible structure, but it discards the colormetric information due to the random projection.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !

Monday, January 15, 2018

The stochastic interpolation method. The layered Structure of Tensor Estimation and Phase Transitions, Optimal Errors and Optimality of Message-Passing in GLMs

Jean just sent me the following:

Dear Igor,

We have a bunch of recent rigorous results that might be of interest for the community. In order to obtain them, I developed together with Nicolas Macris a new adaptive interpolation method well designed for treating high-dimensional Bayesian inference problems.

- In this article we present our method with application to random linear estimation/compressive sensing as well as to symmetric low-rank matrix factorization and tensor factorization.

https://arxiv.org/pdf/1705.02780.pdf

- In this one, presented at Allerton this year, we present a nice application of the method to the non-symmetric tensor factorization problem that was resisting until now. Moreover we exploit the structure in « layers » of the model, which might be an idea of independent interest.

https://arxiv.org/pdf/1709.10368.pdf

- Finally, our main recent result is the application of the method for proving the statistical physics conjecture for the single-letter « replica formula » for the mutual information of generalized linear models. There we also rigorously derive the inference and generalization errors of a large class of single layer neural networks such as the perceptron.

https://arxiv.org/pdf/1708.03395.pdf

All the best

Thanks Jean, here are the papers you mentioned. I had mentioned one before and I like the layer approach of the second paper !

The stochastic interpolation method: A simple scheme to prove replica formulas in Bayesian inference by Jean Barbier, Nicolas Macris

In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward with respect to the previous ones. We call it the stochastic interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with a trial "parameter" which becomes a stochastic process. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. Then we generalize to symmetric tensor estimation and random linear estimation. In addition, we show how to prove a tight lower bound for the mutual information of non-symmetric tensor estimation. We believe that the present method has a much wider range of applicability and also sheds new insights on the reasons for the validity of replica formulas in Bayesian inference.
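For readers who want something concrete to play with, symmetric rank-one matrix estimation can be simulated in a few lines. The sketch below assumes the commonly used spiked Wigner form, Y = sqrt(snr/n) x xᵀ + Z with a symmetric Gaussian noise matrix Z and a ±1 signal; the abstract does not spell out the model, so this form, the ±1 prior, and the names `sample_spiked_wigner` and `spectral_overlap` are assumptions made for illustration. The top-eigenvector estimator used here is a simple baseline, not the Bayes-optimal estimator whose mutual information the paper characterizes.

```python
import numpy as np


def sample_spiked_wigner(n, snr, rng):
    """Draw a +/-1 signal and a noisy symmetric rank-one observation (assumed model)."""
    signal = rng.choice([-1.0, 1.0], size=n)
    gaussian = rng.standard_normal((n, n))
    noise = (gaussian + gaussian.T) / np.sqrt(2.0)  # symmetric, unit off-diagonal variance
    observation = np.sqrt(snr / n) * np.outer(signal, signal) + noise
    return signal, observation


def spectral_overlap(signal, observation):
    """Absolute cosine between the signal and the top eigenvector of the observation."""
    _, eigenvectors = np.linalg.eigh(observation)  # eigenvalues in ascending order
    top = eigenvectors[:, -1]                      # unit-norm leading eigenvector
    return abs(top @ signal) / np.linalg.norm(signal)


rng = np.random.default_rng(0)
signal, observation = sample_spiked_wigner(n=400, snr=4.0, rng=rng)
overlap = spectral_overlap(signal, observation)
print(f"overlap between signal and top eigenvector: {overlap:.3f}")
```

Sweeping `snr` from small to large values gives a feel for how the quality of the estimate changes with the signal-to-noise ratio.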

The Layered Structure of Tensor Estimation and its Mutual Information by Jean Barbier, Nicolas Macris, Léo Miolane

We consider rank-one non-symmetric tensor estimation and derive simple formulas for the mutual information. We start by the order 2 problem, namely matrix factorization. We treat it completely in a simpler fashion than previous proofs using a new type of interpolation method developed in [1]. We then show how to harness the structure in "layers" of tensor estimation in order to obtain a formula for the mutual information for the order 3 problem from the knowledge of the formula for the order 2 problem, still using the same kind of interpolation. Our proof technique straightforwardly generalizes and allows to rigorously obtain the mutual information at any order in a recursive way.

Phase Transitions, Optimal Errors and Optimality of Message-Passing in Generalized Linear Models by Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová

We consider generalized linear models (GLMs) where an unknown n-dimensional signal vector is observed through the application of a random matrix and a non-linear (possibly probabilistic) componentwise output function. We consider the models in the high-dimensional limit, where the observation consists of m points, and m/n→α where α stays finite in the limit m,n→∞. This situation is ubiquitous in applications ranging from supervised machine learning to signal processing. A substantial amount of theoretical work analyzed the model-case when the observation matrix has i.i.d. elements and the components of the ground-truth signal are taken independently from some known distribution. While statistical physics provided a number of explicit conjectures for special cases of this model, results existing for non-linear output functions were so far non-rigorous. At the same time GLMs with non-linear output functions are used as a basic building block of powerful multilayer feedforward neural networks. Therefore rigorously establishing the formulas conjectured for the mutual information is a key open problem that we solve in this paper. We also provide an explicit asymptotic formula for the optimal generalization error, and confirm the prediction of phase transitions in GLMs. Analyzing the resulting formulas for several non-linear output functions, including the rectified linear unit or modulus functions, we obtain quantitative descriptions of information-theoretic limitations of high-dimensional inference. Our proof technique relies on a new version of the interpolation method with an adaptive interpolation path and is of independent interest. Furthermore we show that a polynomial-time algorithm referred to as generalized approximate message-passing reaches the optimal generalization error for a large set of parameters.
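The observation model described in the abstract (a random matrix applied to an unknown signal, followed by a componentwise output function, with m/n close to α) is easy to sample from. The following is a minimal sketch under stated assumptions: a standard Gaussian signal prior, a Gaussian i.i.d. matrix, a 1/sqrt(n) scaling of the projection, and a deterministic output function. The helper name `sample_glm` and the sizes used are hypothetical.

```python
import numpy as np


def sample_glm(n, alpha, output_fn, rng):
    """Sample (matrix, signal, observations) from a GLM with m = round(alpha * n) rows."""
    m = int(round(alpha * n))
    signal = rng.standard_normal(n)        # assumption: i.i.d. standard Gaussian prior
    matrix = rng.standard_normal((m, n))   # i.i.d. Gaussian observation matrix
    # Assumption: 1/sqrt(n) scaling keeps each projection of order one as n grows.
    pre_activation = matrix @ signal / np.sqrt(n)
    return matrix, signal, output_fn(pre_activation)


def relu(z):
    """Rectified linear unit, one of the output functions named in the abstract."""
    return np.maximum(z, 0.0)


rng = np.random.default_rng(0)
matrix, signal, relu_obs = sample_glm(n=200, alpha=1.5, output_fn=relu, rng=rng)
_, _, modulus_obs = sample_glm(n=200, alpha=1.5, output_fn=np.abs, rng=rng)
print(matrix.shape, relu_obs.shape, modulus_obs.shape)
```

Passing a different `output_fn` (for example `np.sign` for a perceptron-like channel) changes the inference problem without touching the rest of the sampling code.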


Friday, January 12, 2018

OpenMined Hackathon in Paris (Saturday, January 13th)

I heard about the OpenMined project during the Paris Machine Learning meetup that we organized back in December from a presentation by Morten Dahl. OpenMined is a community focused on building open-source technology for the decentralized ownership of data and intelligence.

Their mission

"It is commonly believed that individuals must provide a copy of their personal information in order for AI to train or predict over it. This belief creates a tension between developers and consumers. Developers want the ability to create innovative products and services, while consumers want to avoid sending developers a copy of their data.

With OpenMined, AI can be trained on data that it never has access to.

The mission of the OpenMined community is to make privacy-preserving deep learning technology accessible to consumers, who supply data, and machine learning practitioners, who train models on that data. Given recent developments in cryptography, AI-based products and services do not need a copy of a dataset in order to create value from it."

The community is organizing a hackathon

HACKATHON
On Saturday, January 13th, the OpenMined community will be gathering in-person in over 20 cities around the world to collaborate on various coding projects and challenges. We’ll have a worldwide video hangout for all who cannot make it to a physical location. The hackathon will include three coding projects, each with a live tutorial from a member of the OpenMined community.

Here are the general details:

OpenMined Hackathon Details
Date: January 13, 2018

While hackathons will start at the discretion of each city’s organizer (slack them for details), code tutorials will be live broadcasted at 3 different times: 12:00 noon London time, 12:00 noon Eastern time, and 12:00 noon Pacific Time.

Coding Projects
Beginner: Build a Neural Network in OpenMined
- Presentation: How to use the OpenMined Keras Interface
- Project: Find a new dataset and train a new neural network using the Keras interface!

Intermediate: Building the Guts of a Deep Learning Framework
- Presentation: How OpenMined Tensors Work - The Magic Under the Hood
- Project: Add a feature to Float Tensors

Advanced: Performance Improvements - GPUs and Networking
- Presentation: Optimizing the key bottlenecks of the system
- Project: The Need for Speed - Picking a Neural Network and Making it Faster

Physical Locations
Participants in this hackathon will meet in person at the following locations. If your city says “venue tbd”, reach out to the Slack Point of Contact for specific details and directions. Starbucks is the suggested backup venue of choice, since it usually has fast wifi and big tables available. (If you aren’t on our Slack, click here for an invite.)
Before you come...
you need to do the following

Join our Slack

- and join the #hackathon channel

Reach out to your city’s organizer on Slack!

Download and Install Unity

- https://unity3d.com/

Follow the Readme to set up OpenMined and PySyft, available here: https://github.com/OpenMined/PySyft & https://github.com/OpenMined/OpenMined

OpenMined is a community focused on building technology for the decentralized ownership of data and intelligence.
Join our Slack channel to get involved at https://openmined.org/
Follow: https://twitter.com/openminedorg
Contribute: https://github.com/OpenMined/OpenMined

In the OpenMined Slack, there is a #paris channel. The hackathon will be hosted at La Paillasse thanks to support from LightOn. You can find all the details in the #paris channel on the OpenMined Slack.