The Deep Learning Summer School (2016 edition) just ended. Here are some of the slides from the plenary speakers:
Doina Precup from McGill University
We provide a general introduction to machine learning, aimed to put all participants on the same page in terms of definitions and basic background. After a brief overview of different machine learning problems, we discuss linear regression, its objective function and closed-form solution. We discuss the bias-variance trade-off and the issue of overfitting (and the proper use of cross-validation to measure performance objectively). We discuss the probabilistic view of the sum-squared error as maximizing likelihood under specific assumptions on the data generation process, and present L2 and L1 regularization methods as priors from a Bayesian perspective. We briefly discuss Bayesian methodology for learning. Finally, we present logistic regression, the cross-entropy optimization criterion and its solution through first- and second-order methods.
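As a small illustration of the closed-form solution and of L2 regularization mentioned in this abstract, here is a minimal sketch in plain Python. It is not taken from the lecture slides: the function name `fit_ridge_1d`, the toy data and the choice to leave the bias unpenalized are all assumptions made for the example.

```python
def fit_ridge_1d(xs, ys, lam=0.0):
    """Closed-form fit of y ~ w*x + b, minimizing sum of squared errors + lam * w**2.

    Hypothetical helper for illustration; the bias b is not penalized.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centering the data lets the unpenalized bias be solved for separately.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / (sxx + lam)      # lam = 0 recovers ordinary least squares
    b = y_mean - w * x_mean
    return w, b


# Toy data generated exactly by y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_ols, b_ols = fit_ridge_1d(xs, ys)               # (2.0, 1.0)
w_ridge, b_ridge = fit_ridge_1d(xs, ys, lam=5.0)  # (1.0, 2.5)
```

With `lam=5.0` the slope is shrunk from 2.0 toward zero, which is the effect of the Gaussian prior on the weight in the Bayesian reading of L2 regularization.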
Hugo Larochelle from Twitter and Université de Sherbrooke
In this lecture, I will cover the basic concepts behind feedforward neural networks. The talk will be split into 2 parts. In the first part, I'll cover forward propagation and backpropagation in neural networks. Specifically, I'll discuss the parameterization of feedforward nets, the most common types of units, the capacity of neural networks and how to compute the gradients of the training loss for classification with neural networks. In the second part, I'll discuss the final components necessary to train neural networks by gradient descent and then discuss the more recent ideas that are now commonly used for training deep neural networks. I will thus present different variants of gradient descent algorithms, dropout, batch normalization and unsupervised pretraining.
Pascal Lamblin from Université de Montréal
Introduction to Theano (Theano I & Practical Session)
Rob Fergus from New York University
Convolutional Neural Networks and Computer Vision
This talk will review Convolutional Neural Network models and the tremendous impact they have made on Computer Vision problems in the last few years.
Antonio Torralba from Massachusetts Institute of Technology
Learning to See
It is an exciting time for computer vision. With the success of new computational architectures for visual processing, such as deep neural networks (e.g., convNets) and access to image databases with millions of labeled examples (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly. Computer vision is now present among many commercial products, such as digital cameras, web applications, security applications, etc.
The performances achieved by convNets are remarkable and constitute the state of the art on many recognition tasks. But why does it work so well? What is the nature of the internal representation learned by the network? I will show that the internal representation can be interpretable. In particular, object detectors emerge in a scene classification task. Then, I will show that an ambient audio signal can be used as a supervisory signal for learning visual representations. We do this by taking advantage of the fact that vision and hearing often tell us about similar structures in the world, such as when we see an object and simultaneously hear it make a sound. We train a convNet to predict ambient sound from video frames, and we show that, through this process, the model learns a visual representation that conveys significant information about objects and scenes.
Alex Wiltschko from Twitter
Introduction to Torch (Torch I & Practical Session)
Torch is an open platform for scientific computing in the Lua language, with a focus on machine learning, in particular deep learning. Torch is distinguished from other array libraries by having first-class support for GPU computation, and a clear, interactive and imperative style. Further, through the "NN" library, Torch has broad support for building and training neural networks by composing primitive blocks or layers together in compute graphs. Torch, although benefitting from extensive industry support, is a community-owned and community-developed ecosystem.
All neural net libraries, including Torch NN, rely on automatic differentiation (AD) to manage the computation of gradients of complex compositions of functions. I will also present some general background on automatic differentiation (AD), which is the fundamental abstraction of gradient-based optimization, and demonstrate Twitter's flexible implementation of AD in the library torch-autograd.
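To make the idea of automatic differentiation a little more concrete, here is a toy reverse-mode AD sketch in plain Python. It is not the Torch or torch-autograd API (those are Lua libraries); the names `Var`, `tanh` and `backward` are hypothetical and only support scalar addition, multiplication and tanh.

```python
import math


class Var:
    """A scalar node in a computation graph (toy class, not a library API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def tanh(x):
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # inputs are appended before the nodes that use them

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


# y = tanh(w * x + b), evaluated at a point where w * x + b = 0.
w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = tanh(w * x + b)
backward(y)
# Here dy/dw = x = 2.0, dy/dx = w = 0.5 and dy/db = 1.0, since tanh'(0) = 1.
```

The forward pass records each operation and its local derivative; the backward pass then reuses those records, which is the same bookkeeping that backpropagation performs layer by layer.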
Yoshua Bengio from Université de Montréal
Recurrent Neural Networks
This lecture will cover recurrent neural networks, the key ingredient in the deep learning toolbox for handling sequential computation and modelling sequences. It will start by explaining how gradients can be computed (by considering the time-unfolded graph) and how different architectures can be designed to summarize a sequence, generate a sequence by ancestral sampling in a fully-observed directed model, or learn to map a vector to a sequence, a sequence to a sequence (of the same or different length) or a sequence to a vector. The issue of long-term dependencies, why it arises, and what has been proposed to alleviate it will be a core subject of the discussion in this lecture. This includes changes in the architecture and initialization, as well as how to properly characterize the architecture in terms of recurrent or feedforward depth and its ability to create shortcuts or fast propagation of gradients in the unfolded graph. Open questions regarding the limitations of training by maximum likelihood (teacher forcing) and ideas towards making learning online (not requiring backprop through time) will also be discussed.
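The long-term dependency issue can be seen on a one-unit example. The sketch below, in plain Python, unfolds the scalar recurrence h_t = tanh(w * h_{t-1} + x_t) and multiplies the per-step derivatives to get dh_T/dh_0. The recurrence, the function name `unrolled_gradient` and the numbers are assumptions chosen for illustration, not material from the lecture.

```python
import math


def unrolled_gradient(w, inputs, h0=0.0):
    """Return (h_T, dh_T/dh_0) for the scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    Hypothetical helper: the gradient is the product, over the unfolded
    graph, of the per-step derivatives w * (1 - h_t**2).
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)
    return h, grad


# With zero inputs the state stays at 0, so each step multiplies the gradient by w.
_, grad_short = unrolled_gradient(0.5, [0.0] * 5)   # 0.5 ** 5
_, grad_long = unrolled_gradient(0.5, [0.0] * 50)   # 0.5 ** 50, vanishingly small
```

Since |tanh'| is at most 1, a recurrent weight with |w| < 1 makes this product shrink geometrically with the sequence length, which is one way to see why gradients vanish over long time spans.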
Sumit Chopra from Facebook
Reasoning, Attention and Memory
The machine learning community has had great success in the last decades at solving basic prediction tasks such as text classification, image annotation and speech recognition. However, solutions to deeper reasoning tasks have remained elusive. A key component towards achieving deeper reasoning is the use of long term dependencies as well as short term context during inference. Until recently, most existing machine learning models have lacked an easy way to read and write to part of a (potentially very large) long-term memory component, and to combine this seamlessly with inference. To combine memory with reasoning, a model must learn how to access it, i.e. to perform *attention* over its memory.
Within the last year or so, however, there has been some notable progress in this area. Models developing notions of attention have shown positive results on a number of real-world tasks such as machine translation and image captioning. There has also been a surge in building models of computation which explore differing forms of explicit storage. Towards that end, I’ll shed some light on a set of models that fall in this category. In particular, I’ll discuss Memory Networks and their application to a wide variety of tasks, such as question answering based on simulated stories, cloze-style question answering, and dialog modeling. I’ll also talk about their subsequently proposed variants, including End2End Memory Networks and Key Value Memory Networks. In addition, I will talk about Neural Turing Machines and Stack Augmented Recurrent Neural Networks. Throughout the talk I’ll discuss the advantages and disadvantages of each of these models and their variants. I will conclude with a discussion on what is still lacking among these models and potential open problems.
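For readers new to the idea of performing attention over a memory, here is a minimal sketch of a soft read in plain Python: score each memory slot against a query, turn the scores into weights with a softmax, and return the weighted average of the slots. It is a generic dot-product attention, not the exact formulation of Memory Networks or of any model in the talk; the names `softmax` and `attend` and the toy vectors are assumptions.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Soft read: weight each memory slot by its softmax-normalized dot product with the query."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    read = [sum(w * slot[i] for w, slot in zip(weights, memory))
            for i in range(dim)]
    return weights, read


# Two memory slots and a query that matches the first one (toy values).
memory = [[1.0, 0.0], [0.0, 1.0]]
query = [2.0, 0.0]
weights, read = attend(query, memory)
```

Because every step is differentiable, the addressing itself can be trained by gradient descent along with the rest of the model.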
Jeff Dean from Google
Large Scale Deep Learning with TensorFlow
The last few years have seen deep learning make significant advances in fields as diverse as speech recognition, image understanding, natural language understanding, translation, robotics, and healthcare. In this talk I'll describe some of the machine learning research done by the Google Brain team (often in collaboration with others at Google). As part of our research, we have built two systems, DistBelief and TensorFlow, for training large-scale deep learning models on large datasets. I'll describe some of the distributed system techniques we use to scale training of such models beyond single devices, as well as some of the design decisions and implementation of the TensorFlow system, which was open sourced in November 2015.
Video that accompanies slide 218 is here
Kyunghyun Cho from New York University
Deep Natural Language Understanding
In this lecture, I start with a claim that natural language understanding can largely be approached as building a better language model and explain three widely-adopted approaches to language modelling. They are n-gram language modelling, feedforward neural language modelling and recurrent language modelling. As I develop from the traditional n-gram language model toward recurrent language model, I discuss the concepts of data sparsity and generalization via continuous space representations. I then continue on to the recent development of a novel paradigm in machine translation based on recurrent language modelling, often called neural machine translation. The lecture concludes with three new opportunities in natural language processing/understanding made possible by the introduction of continuous space representations in deep neural networks.
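As a reminder of what the simplest of these three approaches looks like, here is a minimal bigram language model in plain Python with add-alpha smoothing. The smoothing scheme, the function names `train_bigram` and `bigram_prob`, and the toy sentence are assumptions made for illustration and are not taken from the lecture.

```python
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams from a token list (toy helper)."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    vocab = set(tokens)
    return contexts, bigrams, vocab


def bigram_prob(prev, word, model, alpha=1.0):
    """P(word | prev) with add-alpha smoothing over the vocabulary."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))


tokens = "the cat sat on the mat".split()
model = train_bigram(tokens)

p_seen = bigram_prob("the", "cat", model)    # 2/7: the bigram was observed once
p_unseen = bigram_prob("the", "sat", model)  # 1/7: never observed, only smoothing mass
```

Without smoothing, the unseen bigram would get probability zero. That is the data sparsity problem; count-based models have no notion that "sat" and "mat" might behave similarly, which is where continuous space representations come in.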
Edward Grefenstette from Google DeepMind
Beyond Seq2Seq with Augmented RNNs
Sequence to sequence models in their most basic form, following an encoder-decoder paradigm, compressively encode source sequence representations into a single vector representation and decode this representation into a target sequence. This lecture will discuss the problems with this compressive approach, some solutions involving attention and external differentiable memory, and issues faced by these extensions. Motivating examples from the field of natural language understanding will be provided throughout.
Julie Bernauer from NVIDIA
GPU programming for Deep Learning
Joelle Pineau from McGill University
Introduction to Reinforcement Learning
Pieter Abbeel from UC Berkeley
Deep Reinforcement Learning
Ruslan Salakhutdinov from Carnegie Mellon University
Learning Deep Generative Models
In this tutorial I will discuss mathematical basics of many popular deep generative models, including Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs), Helmholtz Machines, Variational Autoencoders (VAE) and Importance Weighted Autoencoders (IWAE). I will further demonstrate that these models are capable of extracting meaningful representations from high-dimensional data with applications in visual object recognition, information retrieval, and natural language processing.
Shakir Mohamed from Google DeepMind
Building Machines that Imagine and Reason: Principles and Applications of Deep Generative Models
Deep generative models provide a solution to the problem of unsupervised learning, in which a machine learning system is required to discover the structure hidden within unlabelled data streams. Because they are generative, such models can form a rich imagery of the world in which they are used: an imagination that can be harnessed to explore variations in data, to reason about the structure and behaviour of the world, and ultimately, for decision-making. This tutorial looks at how we can build machine learning systems with a capacity for imagination using deep generative models, the types of probabilistic reasoning that they make possible, and the ways in which they can be used for decision making and acting.
Deep generative models have widespread applications including those in density estimation, image denoising and in-painting, data compression, scene understanding, representation learning, 3D scene construction, semi-supervised classification, and hierarchical control, amongst many others. After exploring these applications, we'll sketch a landscape of generative models, drawing out three groups of models: fully-observed models, transformation models, and latent variable models. Different models require different principles for inference and we'll explore the different options available. Different combinations of model and inference give rise to different algorithms, including auto-regressive distribution estimators, variational auto-encoders, and generative adversarial networks. Although we will emphasise deep generative models, and the latent-variable class in particular, the intention of the tutorial will be to explore the general principles, tools and tricks that can be used throughout machine learning. These reusable topics include Bayesian deep learning, variational approximations, memoryless and amortised inference, and stochastic gradient estimation. We'll end by highlighting the topics that were not discussed, and imagine the future of generative models.
Bruno Olshausen from UC Berkeley
Beyond inspiration: Five lessons from biology on building intelligent machines
The only known systems that exhibit truly intelligent, autonomous behavior are biological. If we wish to build machines that are capable of such behavior, then it makes sense to learn as much as we can about how these systems work. Inspiration is a good starting point, but real progress will require gaining a more solid understanding of the principles of information processing at work in nervous systems. Here I will focus on five areas of investigation that I believe will be especially fruitful: 1) the study of perception and cognition in tiny nervous systems such as wasps and jumping spiders, 2) developing good computational models of nonlinear signal integration in dendritic trees, 3) the use of sparse, overcomplete representations of sensory input, 4) understanding the computational role of feedback in neural systems, and 5) the use of active sensing systems for acquiring information about the world.
Surya Ganguli from Stanford University
Computational Neuroscience II and Deep Learning Theory