Training deep neural networks results in strong learned representations that show good generalization capabilities. In most cases, training involves iterative modification of all weights inside the network via back-propagation. In this paper, we propose to take an extreme approach and fix \emph{almost all weights} of a deep convolutional neural network at their randomly initialized values, allowing only a small portion to be learned. As our experiments show, this often results in performance which is on par with the performance of learning all weights. The implications of this intriguing property of deep neural networks are discussed and we suggest ways to harness it to create more robust representations.
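To make the idea of freezing most weights concrete, here is a minimal sketch. It is not the paper's method (the abstract concerns deep convolutional networks in which a small fraction of weights is trained): it assumes a toy two-dimensional dataset, a single hidden layer of frozen random ReLU features, and a readout fitted by ridge regression. All sizes and the regularization constant are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data (assumption): the label is 1 when x1 * x2 > 0 (XOR-like).
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)

# Hidden layer: weights and biases are drawn once and never updated.
W = rng.normal(size=(2, 200))
b = rng.normal(size=200)
H = np.maximum(X @ W + b, 0.0)  # frozen random ReLU features

# Only the readout vector is learned, here by ridge-regularized least squares
# on +/-1 targets (a stand-in for gradient training of a small weight subset).
lam = 1e-3
A = H.T @ H + lam * np.eye(H.shape[1])
beta = np.linalg.solve(A, H.T @ (2 * y - 1))

train_acc = np.mean((H @ beta > 0) == (y > 0.5))
print(f"training accuracy with frozen hidden layer: {train_acc:.2f}")
```

Here the 200 readout weights are the only learned parameters; the 600 hidden-layer weights and biases stay at their random initial values.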
Scortex deploys artificial intelligence in the heart of factories. We help our customers take the next big leap in smart automation thanks to our Quality Intelligence Solution. Our platform enables manufacturing companies to take control of their quality:
Automate visual inspection tasks
Monitor key quality data in real time through our intuitive platform
Improve production process by consolidating production knowledge.
Thanks to our proprietary deep learning platform, we provide a state-of-the-art performance and robust vision solution for quality intelligence.
What you will do
As a proactive member of the machine learning and computer vision team, your work will include a varied range of challenges:
explore various state-of-the-art techniques to help solve tasks currently unbeaten by computers;
stay on the bleeding edge of research and participate actively in the community;
design, develop and implement supervised and unsupervised models with extremely constraining requirements not only on accuracy, but also on real-time execution, fast and scalable training processes and minimal annotation levels;
help improve our pipelines of data acquisition, training and inference.
What we are looking for
In-depth knowledge of deep learning techniques applied to computer vision: deep convolutional networks, autoencoders, image (pre)processing, regularization;
Proficient knowledge of both supervised and unsupervised machine learning techniques: clustering, object detection, generative models, dimensionality reduction;
Understanding of standard computer vision techniques: filtering, transformations, descriptors and detectors;
Knowledge and understanding of the mathematics underlying all of the above: probability and statistics, optimization, linear algebra, numerical computation;
Proven experience with at least one machine learning framework (bonus points for Keras or Tensorflow);
Research engineer position in the DREAM project at ISIR, Sorbonne-Université, Paris, France.
Job position available immediately and for 1 year (may be extended).
The DREAM European project (http://www.robotsthatdream.eu/) is focused on the bootstrap of a developmental process allowing a robot to learn about its environment and the objects it contains.
We are looking for highly motivated candidates with a strong experience in developing software for robotics, in particular on the ROS middleware. The recruited engineer will be in charge of the development and deployment of the ROS modules supporting the DREAM cognitive architecture. He/she will also help the partners to integrate their work into the cognitive architecture and will work on the validation experiments of the project. Programming skills in modern C++ and python are expected. The position involves robotics experiments that will be done on Baxter, PR2 and Pepper robots. The position may be extended later on to more than one year.
The position is located in the Institute of Intelligent Systems and Robotics (ISIR, http://www.isir.upmc.fr), Paris, France. ISIR belongs to Sorbonne Université which is among the top ranked French universities (http://sorbonne-universite.fr/en).
Speaking or understanding French is not required.
To apply, please send a CV, letter of motivation (max 2 pages), and a list of three references via e-mail to stephane.doncieux@upmc.fr. Please put [DREAM engineer application] in the subject of the mail. Review of applicants will begin immediately, and will continue until the position is filled.
A number of recent papers have provided evidence that practical design questions about neural networks may be tackled theoretically by studying the behavior of random networks. However, until now the tools available for analyzing random neural networks have been relatively ad-hoc. In this work, we show that the distribution of pre-activations in random neural networks can be exactly mapped onto lattice models in statistical physics. We argue that several previous investigations of stochastic networks actually studied a particular factorial approximation to the full lattice model. For random linear networks and random rectified linear networks we show that the corresponding lattice models in the wide network limit may be systematically approximated by a Gaussian distribution with covariance between the layers of the network. In each case, the approximate distribution can be diagonalized by Fourier transformation. We show that this approximation accurately describes the results of numerical simulations of wide random neural networks. Finally, we demonstrate that in each case the large scale behavior of the random networks can be approximated by an effective field theory.
Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix Y^TY, Y=f(WX), where W is a random weight matrix, X is a random data matrix, and f is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.
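The objects named in this abstract can be simulated directly. The sketch below builds Y = f(WX) for Gaussian W and X, forms the Gram matrix Y^TY, and evaluates the normalized trace of its resolvent from the eigenvalues. The choice f = tanh, the matrix sizes, and the 1/sqrt(n0) and 1/n1 normalizations are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, m = 300, 400, 500  # input dim, number of features, number of samples (illustrative)

X = rng.normal(size=(n0, m))                 # random data matrix
W = rng.normal(size=(n1, n0)) / np.sqrt(n0)  # random weights, scaled so W @ X is O(1)
Y = np.tanh(W @ X)                           # pointwise nonlinearity (tanh is an assumption)

M = Y.T @ Y / n1                             # m x m Gram matrix (normalization is an assumption)
eigs = np.linalg.eigvalsh(M)                 # empirical spectrum

def resolvent_trace(z):
    """Normalized trace of (M - z I)^{-1}, computed from the eigenvalues of M."""
    return np.mean(1.0 / (eigs - z))

print(resolvent_trace(-1.0))
```

A histogram of `eigs` gives the empirical spectral distribution that the limiting formula in the paper is meant to describe; since n1 < m in this example, M is rank-deficient and part of the spectrum sits at zero.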
Understanding the geometry of neural network loss surfaces is important for the development of improved optimization algorithms and for building a theoretical understanding of why deep learning works. In this paper, we study the geometry in terms of the distribution of eigenvalues of the Hessian matrix at critical points of varying energy. We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions. The shape of the spectrum depends strongly on the energy and another key parameter, $\phi$, which measures the ratio of parameters to data points. Our analysis predicts and numerical simulations support that for critical points of small index, the number of negative eigenvalues scales like the 3/2 power of the energy. We leave as an open problem an explanation for our observation that, in the context of a certain memorization task, the energy of minimizers is well-approximated by the function $\frac{1}{2}(1-\phi)^2$.
A deep fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP) in the limit of infinite network width. This correspondence enables exact Bayesian inference for neural networks on regression tasks by means of straightforward matrix computations. For single hidden-layer networks, the covariance function of this GP has long been known. Recently, kernel functions for multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified the correspondence between using these kernels as the covariance function for a GP and performing fully Bayesian prediction with a deep neural network. In this work, we derive this correspondence and develop a computationally efficient pipeline to compute the covariance functions. We then use the resulting GP to perform Bayesian inference for deep neural networks on MNIST and CIFAR-10. We find that the GP-based predictions are competitive and can outperform neural networks trained with stochastic gradient descent. We observe that the trained neural network accuracy approaches that of the corresponding GP-based computation with increasing layer width, and that the GP uncertainty is strongly correlated with prediction error. We connect our observations to the recent development of signal propagation in random neural networks.
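As an illustration of the "straightforward matrix computations" mentioned above, the following sketch implements the well-known closed-form kernel recursion for an infinitely wide ReLU network (the arc-cosine kernel) and the standard GP posterior mean for regression. The function names, the depth, and the variances `sigma_w2`, `sigma_b2` and `noise` are hypothetical choices for this example and not the authors' pipeline.

```python
import numpy as np

def relu_nngp_kernel(X, depth=3, sigma_w2=2.0, sigma_b2=0.0):
    """Kernel of an infinitely wide ReLU network with `depth` hidden layers.

    X has shape (num_points, input_dim). Hyperparameters are illustrative.
    """
    d = X.shape[1]
    K = sigma_b2 + sigma_w2 * (X @ X.T) / d  # covariance of first-layer pre-activations
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag)
        cos_t = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # E[relu(u) relu(v)] for zero-mean jointly Gaussian (u, v) with covariance K
        expect = norm * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)
        K = sigma_b2 + sigma_w2 * expect
    return K

def gp_posterior_mean(K_train, K_test_train, y_train, noise=1e-6):
    """Posterior mean of GP regression with observation-noise variance `noise`."""
    A = K_train + noise * np.eye(len(K_train))
    return K_test_train @ np.linalg.solve(A, y_train)

# Usage on random data: compute the kernel on train and test points jointly, then split.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(25, 5))
y_train = rng.normal(size=20)
K_all = relu_nngp_kernel(X_all)
mean_test = gp_posterior_mean(K_all[:20, :20], K_all[20:, :20], y_train)
```

With `sigma_w2 = 2` and `sigma_b2 = 0` the diagonal of the kernel is unchanged from layer to layer, which is one simple instance of the signal-propagation picture the abstract refers to.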
We study the sample complexity of learning neural networks, by providing new bounds on their Rademacher complexity assuming norm constraints on the parameter matrix of each layer. Compared to previous work, these complexity bounds have improved dependence on the network depth, and under some additional assumptions, are fully independent of the network size (both depth and width). These results are derived using some novel techniques, which may be of independent interest.
Stéphane just let me know about this Postdoc opportunity:
Dear Igor,
...
I am also contacting you because my friend Juan-Pablo Ortega is offering a postdoc position at St Gallen University (Switzerland) on very exciting new trends in Machine Learning, based on Reservoir Computing. Do you think it could be appropriate to post a link to this job announcement on your blog Nuit Blanche (see pdf attached)? Juan Pablo’s web page is here: http://juan-pablo-ortega.com/
In Compressive Sensing theory and its applications, quantization of signal measurements, as integrated into any realistic sensing model, impacts the quality of signal reconstruction. In fact, there even exist incompatible combinations of quantization functions (e.g., the 1-bit sign function) and sensing matrices (e.g., Bernoulli) that cannot lead to an arbitrarily low reconstruction error when the number of observations increases. This work shows that, for a scalar and uniform quantization, provided that a uniform random vector, or "random dithering", is added to the compressive measurements of a low-complexity signal (e.g., a sparse or compressible signal, or a low-rank matrix) before quantization, a large class of random matrix constructions known to respect the restricted isometry property (RIP) are made "compatible" with this quantizer. This compatibility is demonstrated by the existence of (at least) one signal reconstruction method, the "projected back projection" (PBP), whose reconstruction error is proved to decay when the number of quantized measurements increases. Despite the simplicity of PBP, which amounts to projecting the back projection of the compressive observations (obtained from their multiplication by the adjoint sensing matrix) onto the low-complexity set containing the observed signal, we also prove that given a RIP matrix and for a single realization of the dithering, this reconstruction error decay is also achievable uniformly for the sensing of all signals in the considered low-complexity set. We finally confirm empirically these observations in several sensing contexts involving sparse signals, low-rank matrices, and compressible signals, with various RIP matrix constructions such as sub-Gaussian random matrices and random partial Discrete Cosine Transform (DCT) matrices.
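The pipeline described in this abstract (dither, quantize, back-project, project onto the low-complexity set) can be sketched for the sparse case as follows. The floor-based uniform quantizer of resolution `delta`, the Gaussian sensing matrix, the 1/m scaling of the back projection, and all dimensions are assumptions chosen for illustration; for sparse signals the projection onto the low-complexity set is taken to be hard thresholding to the k largest entries.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m, delta = 256, 5, 4000, 0.5  # ambient dim, sparsity, measurements, quantizer resolution

# k-sparse unit-norm signal
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)

# Gaussian sensing matrix (assumed), uniform dither, uniform scalar quantization
A = rng.normal(size=(m, n))
dither = rng.uniform(0.0, delta, size=m)
y = delta * np.floor((A @ x + dither) / delta)

def hard_threshold(v, k):
    """Projection onto k-sparse vectors: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Projected back projection: apply the adjoint, rescale, project
x_hat = hard_threshold(A.T @ y / m, k)
err = np.linalg.norm(x - x_hat)
print(f"reconstruction error: {err:.3f}")
```

Rerunning with a larger `m` should show the error shrinking, which is the qualitative behavior the abstract claims; the dither is what makes the quantized measurements unbiased on average for this quantizer.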
This is the announcement and call for participants of the workshop and advanced school "Statistical physics and machine learning back together" that will take place in Cargese, Corsica, France during August 20-31, 2018. Please forward this to your colleagues/students that may be interested.
The capacity of the Cargese amphitheatre is limited; due to this constraint, participants will be selected from the applicants.
The main goal of this event is to gather the community of researchers working on questions that relate in some way statistical physics and high dimensional statistical inference and learning. The format will be several (~10) 3h introductory lectures, and about thrice as many invited talks.
The topics include:
Energy/loss landscapes in disordered systems, machine learning and inference problems
Computational and statistical thresholds and trade-offs
Theory of artificial multilayer neural networks
Rigorous approaches to spin glasses and related models of statistical inference
Parallels between optimisation algorithms and dynamics in physics
Vindicating the replica and cavity method rigorously
Recent work in the literature has shown experimentally that one can use the lower layers of a trained convolutional neural network (CNN) to model natural textures. More interestingly, it has also been experimentally shown that only one layer with random filters can also model textures although with less variability. In this paper we ask the question as to why one layer CNNs with random filters are so effective in generating textures? We theoretically show that one layer convolutional architectures (without a non-linearity) paired with an energy function used in previous literature, can in fact preserve and modulate frequency coefficients in a manner so that random weights and pretrained weights will generate the same type of images. Based on the results of this analysis we question whether similar properties hold in the case where one uses one convolution layer with a non-linearity. We show that in the case of ReLu non-linearity there are situations where only one input will give the minimum possible energy whereas in the case of no nonlinearity, there are always infinite solutions that will give the minimum possible energy. Thus we can show that in certain situations adding a ReLu non-linearity generates less variable images.
Several recent works have empirically observed that Convolutional Neural Nets (CNNs) are (approximately) invertible. To understand this approximate invertibility phenomenon and how to leverage it more effectively, we focus on a theoretical explanation and develop a mathematical model of sparse signal recovery that is consistent with CNNs with random weights. We give an exact connection to a particular model of model-based compressive sensing (and its recovery algorithms) and random-weight CNNs. We show empirically that several learned networks are consistent with our mathematical analysis and then demonstrate that with such a simple theoretical framework, we can obtain reasonable reconstruction results on real images. We also discuss gaps between our model assumptions and the CNN trained for classification in practical scenarios.
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
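A complex convolution can be expressed with real-valued operations only, which is what makes it straightforward to build on real-valued tooling. The sketch below shows this for one-dimensional signals: writing the filter as a + ib and the input as x + iy, the result is (a*x - b*y) + i(b*x + a*y), where * denotes real convolution. The function name and the 1-D setting are illustrative assumptions; the paper applies the same algebra inside convolutional layers.

```python
import numpy as np

def complex_conv1d(h, w):
    """Convolve complex signal h with complex filter w using four real convolutions."""
    x, y = h.real, h.imag
    a, b = w.real, w.imag
    real = np.convolve(x, a) - np.convolve(y, b)
    imag = np.convolve(x, b) + np.convolve(y, a)
    return real + 1j * imag

rng = np.random.default_rng(0)
h = rng.normal(size=16) + 1j * rng.normal(size=16)
w = rng.normal(size=3) + 1j * rng.normal(size=3)
out = complex_conv1d(h, w)
```

The result agrees with NumPy's native complex convolution, `np.convolve(h, w)`, which is a convenient sanity check for the decomposition.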
When it comes to random projections and Deep Neural Networks, the following paper is intriguing:
In summary, applying random weights in the whole CNN-DeCNN architecture, we can still capture the geometric positions and contours of the image. The shape reduction of feature maps takes responsibility for the randomness on the reconstructed images for higher layer representation due to the representation compression. And random weight DeCNN can reconstruct robust images if we have a large enough number of feature maps.
We systematically study the deep representation of random weight CNN (convolutional neural network) using the DeCNN (deconvolutional neural network) architecture. We first fix the weights of an untrained CNN, and for each layer of its feature representation, we train a corresponding DeCNN to reconstruct the input image. As compared with the pre-trained CNN, the DeCNN trained on a random weight CNN can reconstruct images more quickly and accurately, no matter which type of random distribution for the CNN's weights. It reveals that every layer of the random CNN can retain photographically accurate information about the image. We then let the DeCNN be untrained, i.e. the overall CNN-DeCNN architecture uses only random weights. Strikingly, we can reconstruct all position information of the image for low layer representations but the colors change. For high layer representations, we can still capture the rough contours of the image. We also change the number of feature maps and the shape of the feature maps and gain more insight on the random function of the CNN-DeCNN structure. Our work reveals that the purely random CNN-DeCNN architecture substantially contributes to the geometric and photometric invariance due to the intrinsic symmetry and invertible structure, but it discards the colormetric information due to the random projection.
We have a bunch of recent rigorous results that might be of interest for the community. In order to obtain them, I developed together with Nicolas Macris a new adaptive interpolation method well designed for treating high-dimensional Bayesian inference problems.
In this article we present our method with application to random linear estimation/compressive sensing as well as to symmetric low-rank matrix factorization and tensor factorization.
In this one, presented at Allerton this year, we present a nice application of the method to the non-symmetric tensor factorization problem that was resisting until now. Moreover we exploit the structure in « layers » of the model, which might be an idea of independent interest.
Finally, our main recent result is the application of the method for proving the statistical physics conjecture for the single-letter « replica formula » for the mutual information of generalized linear models. There we also rigorously derive the inference and generalization errors of a large class of single layer neural networks such as the perceptron.
In recent years important progress has been achieved towards proving the validity of the replica predictions for the (asymptotic) mutual information (or "free energy") in Bayesian inference problems. The proof techniques that have emerged appear to be quite general, even though they have been worked out on a case-by-case basis. Unfortunately, a common point between all these schemes is their relatively high level of technicality. We present a new proof scheme that is quite straightforward with respect to the previous ones. We call it the stochastic interpolation method because it can be seen as an extension of the interpolation method developed by Guerra and Toninelli in the context of spin glasses, with a trial "parameter" which becomes a stochastic process. In order to illustrate our method we show how to prove the replica formula for three non-trivial inference problems. The first one is symmetric rank-one matrix estimation (or factorisation), which is the simplest problem considered here and the one for which the method is presented in full detail. Then we generalize to symmetric tensor estimation and random linear estimation. In addition, we show how to prove a tight lower bound for the mutual information of non-symmetric tensor estimation. We believe that the present method has a much wider range of applicability and also sheds new insights on the reasons for the validity of replica formulas in Bayesian inference.
We consider rank-one non-symmetric tensor estimation and derive simple formulas for the mutual information. We start by the order 2 problem, namely matrix factorization. We treat it completely in a simpler fashion than previous proofs using a new type of interpolation method developed in [1]. We then show how to harness the structure in "layers" of tensor estimation in order to obtain a formula for the mutual information for the order 3 problem from the knowledge of the formula for the order 2 problem, still using the same kind of interpolation. Our proof technique straightforwardly generalizes and allows to rigorously obtain the mutual information at any order in a recursive way.
We consider generalized linear models (GLMs) where an unknown n-dimensional signal vector is observed through the application of a random matrix and a non-linear (possibly probabilistic) componentwise output function. We consider the models in the high-dimensional limit, where the observation consists of m points, and m/n→α where α stays finite in the limit m,n→∞. This situation is ubiquitous in applications ranging from supervised machine learning to signal processing. A substantial amount of theoretical work analyzed the model-case when the observation matrix has i.i.d. elements and the components of the ground-truth signal are taken independently from some known distribution. While statistical physics provided a number of explicit conjectures for special cases of this model, results existing for non-linear output functions were so far non-rigorous. At the same time GLMs with non-linear output functions are used as a basic building block of powerful multilayer feedforward neural networks. Therefore rigorously establishing the formulas conjectured for the mutual information is a key open problem that we solve in this paper. We also provide an explicit asymptotic formula for the optimal generalization error, and confirm the prediction of phase transitions in GLMs. Analyzing the resulting formulas for several non-linear output functions, including the rectified linear unit or modulus functions, we obtain quantitative descriptions of information-theoretic limitations of high-dimensional inference. Our proof technique relies on a new version of the interpolation method with an adaptive interpolation path and is of independent interest. Furthermore we show that a polynomial-time algorithm referred to as generalized approximate message-passing reaches the optimal generalization error for a large set of parameters.
"It is commonly believed that individuals must provide a copy of their personal information in order for AI to train or predict over it. This belief creates a tension between developers and consumers. Developers want the ability to create innovative products and services, while consumers want to avoid sending developers a copy of their data.
With OpenMined, AI can be trained on data that it never has access to.
The mission of the OpenMined community is to make privacy-preserving deep learning technology accessible to consumers, who supply data, and machine learning practitioners, who train models on that data. Given recent developments in cryptography, AI-based products and services do not need a copy of a dataset in order to create value from it."
HACKATHON On Saturday, January 13th, the OpenMined community will be gathering in-person in over 20 cities around the world to collaborate on various coding projects and challenges. We’ll have a worldwide video hangout for all who cannot make it to a physical location. The hackathon will include three coding projects, each with a live tutorial from a member of the OpenMined community.
Here are the general details:
OpenMined Hackathon Details
Date: January 13, 2018
While hackathons will start at the discretion of each city’s organizer (Slack them for details), code tutorials will be broadcast live at 3 different times: 12:00 noon London time, 12:00 noon Eastern time, and 12:00 noon Pacific time.
Coding Projects
Beginner: Build a Neural Network in OpenMined
Presentation: How to use the OpenMined Keras Interface
Project: Find a new dataset and train a new neural network using the Keras interface!
Intermediate: Building the Guts of a Deep Learning Framework
Presentation: How OpenMined Tensors Work - The Magic Under the Hood
Project: Add a feature to Float Tensors
Advanced: Performance Improvements - GPUs and Networking
Presentation: Optimizing the key bottlenecks of the system
Project: The Need for Speed - Picking a Neural Network and Making it Faster
Physical Locations
Participants in this hackathon will meet in person at the following locations. If your city says “venue tbd”, reach out to the Slack Point of Contact for specific details and directions. Starbucks is the suggested backup venue of choice - usually has fast wifi and big tables available. (If you aren’t on our Slack, click here for an invite.)
Before you come... you need to do the following
In Paris, the hackathon will be hosted at La Paillasse thanks to support from LightOn. You can find all the details in the #paris channel on the OpenMined Slack.