All the papers at
ICML are here. You can also follow
Hugo Larochelle, who streams some of the talks through Periscope.
All the ICML
tutorials can be found here. They include
Causal Inference for Policy Evaluation by
Susan Athey, as well as the following:
A major goal of artificial
intelligence is to create general-purpose agents that can perform
effectively in a wide range of challenging tasks. To achieve this goal,
it is necessary to combine reinforcement learning (RL) agents with
powerful and flexible representations. The key idea of deep RL is to use
neural networks to provide this representational power. In this
tutorial we will present a family of algorithms in which deep neural
networks are used for value functions, policies, or environment models.
State-of-the-art results will be presented in a variety of domains,
including Atari games, 3D navigation tasks, continuous control domains
and the game of Go.
[slides1] [slides2]
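As a small taste of the value-function side of deep RL described above, here is a minimal sketch (my own illustration, not the tutorial's code) of Q-learning with a linear function approximator on a made-up 5-state chain MDP; in a deep RL agent the one-hot features would be replaced by a neural network over raw observations.

```python
import numpy as np

# Toy illustration (not from the tutorial): Q-learning with a linear
# value-function approximator on a hypothetical 5-state chain MDP.
# Actions: 0 = left, 1 = right; reward 1 only upon reaching the last state.
n_states, n_actions, gamma, alpha = 5, 2, 0.95, 0.1
rng = np.random.default_rng(0)

def features(s, a):
    """One-hot feature vector over (state, action) pairs; a deep RL agent
    would replace this with a neural network over raw observations."""
    phi = np.zeros(n_states * n_actions)
    phi[s * n_actions + a] = 1.0
    return phi

w = np.zeros(n_states * n_actions)              # Q(s, a) = w . phi(s, a)
q = lambda s, a: w @ features(s, a)

for episode in range(300):
    s = 0
    while s != n_states - 1:
        a = int(rng.integers(n_actions))        # random exploration (Q-learning is off-policy)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # TD target and semi-gradient update of the value-function weights
        target = r + gamma * max(q(s_next, b) for b in range(n_actions))
        w += alpha * (target - q(s, a)) * features(s, a)
        s = s_next

print(np.round(w.reshape(n_states, n_actions), 2))   # greedy policy: go right
```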
There has been a recent resurgence in
interest in the use of the combination of reasoning, attention
and memory for solving tasks, particularly in the field of language
understanding. I will review some of these recent efforts, as well as
focusing on one of my own group’s contributions, memory networks,
an architecture that we have applied to question answering, language
modeling and general dialog. As we try to move towards the goal of true
language understanding, I will also discuss recent datasets and tests
that have been built to assess these models' abilities to see how far we
have come.
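The core mechanism behind such memory-augmented models is soft attention over stored memory vectors. Below is a minimal, generic sketch of a single attention "hop" in numpy (an assumed illustration of the principle, not code from the memory networks work).

```python
import numpy as np

def attention_read(query, memory):
    """One soft-attention 'hop': weight each memory slot by how well it
    matches the query, then return the weighted sum (the 'read' vector).
    query: (d,) vector; memory: (n_slots, d) matrix."""
    scores = memory @ query                      # dot-product match scores
    weights = np.exp(scores - scores.max())      # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory                      # convex combination of slots

# Hypothetical toy example: 4 memory slots, 3-dimensional embeddings.
rng = np.random.default_rng(1)
memory = rng.normal(size=(4, 3))
query = rng.normal(size=3)
print(attention_read(query, memory))
```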
Kaiming He (Facebook, starting July 2016)
Deeper neural
networks are more difficult to train. Beyond a certain depth,
traditional deeper networks start to show severe underfitting caused by
optimization difficulties. This tutorial will describe the recently
developed residual learning framework, which eases the training of
networks that are substantially deeper than those used previously. These
residual networks converge more easily and can gain accuracy from
considerably increased depth. On the ImageNet dataset we evaluate
residual nets with depth of up to 152 layers—8x deeper than VGG nets but
still having lower complexity. These deep residual networks are the
foundations of our 1st-place winning entries in all five main tracks in
ImageNet and COCO 2015 competitions, which cover image classification,
object detection, and semantic segmentation.
In
this tutorial we will further look into the propagation formulations of
residual networks. Our latest work reveals that when the residual
networks have identity mappings as skip connections and inter-block
activations, the forward and backward signals can be directly propagated
from one block to any other block. This leads us to promising results
of 1001-layer residual networks. Our work suggests that there is much
room to exploit the dimension of network depth, a key to the success of
modern deep learning.
[slides]
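To make the residual idea concrete, here is a small numpy sketch (my own illustration, not the tutorial's code) of a two-layer residual block with an identity skip connection: the block learns a residual F(x) and outputs x + F(x), so signals propagate directly through the shortcuts even when many blocks are stacked.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, b1, W2, b2):
    """Two-layer residual block: output = x + F(x), where
    F(x) = W2 @ relu(W1 @ x + b1) + b2 and the skip connection
    is the identity (no extra parameters)."""
    residual = W2 @ relu(W1 @ x + b1) + b2
    return x + residual              # identity shortcut eases optimization

# Hypothetical toy dimensions: the block preserves the input width.
d = 8
rng = np.random.default_rng(0)
x = rng.normal(size=d)
W1, b1 = rng.normal(scale=0.1, size=(d, d)), np.zeros(d)
W2, b2 = rng.normal(scale=0.1, size=(d, d)), np.zeros(d)

# Stacking blocks: because each block adds to its input, a deep stack
# starts out close to the identity mapping and signals pass straight through.
y = x
for _ in range(10):
    y = residual_block(y, W1, b1, W2, b2)
print(np.round(y, 3))
```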
Anima Anandkumar (University of California Irvine)
Most machine learning tasks require solving non-convex optimization. The number of critical points in a non-convex problem grows exponentially with the data dimension. Local search methods such as gradient descent can get stuck in one of these critical points, and therefore, finding the globally optimal solution is computationally hard. Despite this hardness barrier, we have seen many advances in guaranteed non-convex optimization. The focus has shifted to characterizing transparent conditions under which the global solution can be found efficiently. In many instances, these conditions turn out to be mild and natural for machine learning applications. This tutorial will provide an overview of the recent theoretical success stories in non-convex optimization. This includes learning latent variable models, dictionary learning, robust principal component analysis, and so on. Simple iterative methods such as spectral methods, alternating projections, and so on, are proven to learn consistent models with polynomial sample and computational complexity. This tutorial will present the main ingredients towards establishing these results. The tutorial will conclude with open challenges and possible paths towards tackling them.
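As the simplest instance of the spectral methods mentioned in the abstract, here is a short numpy sketch (a generic illustration, not a method from the tutorial) of power iteration: maximizing x'Ax over the unit sphere is a non-convex problem, yet this iteration reaches the global optimum whenever the top eigenvalue is strictly dominant.

```python
import numpy as np

def power_iteration(A, n_iter=200):
    """Find the top eigenvector of a symmetric matrix A.
    Maximizing x'Ax over the unit sphere is non-convex, but this simple
    iteration converges to the global optimum when the top eigenvalue
    strictly dominates in magnitude."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=A.shape[0])
    for _ in range(n_iter):
        x = A @ x
        x /= np.linalg.norm(x)
    return x, x @ A @ x              # eigenvector and its Rayleigh quotient

# Hypothetical toy example: a random positive semidefinite matrix,
# so the largest eigenvalue also has the largest magnitude.
M = np.random.default_rng(1).normal(size=(5, 5))
A = M @ M.T
v, lam = power_iteration(A)
print(round(lam, 4), round(np.linalg.eigvalsh(A)[-1], 4))   # the two should agree
```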
This tutorial provides an accessible
introduction to the mathematical properties of stochastic gradient
methods and their consequences for large scale machine learning. After
reviewing the computational needs for solving optimization problems in
two typical examples of large scale machine learning, namely, the
training of sparse linear classifiers and deep neural networks, we
present the theory of the simple, yet versatile stochastic gradient
algorithm, explain its theoretical and practical behavior, and expose
the opportunities available for designing improved algorithms. We then
provide specific examples of advanced algorithms to illustrate the two
essential directions for improving stochastic gradient methods, namely,
managing the noise and making use of second order information.
[slides1] [slides2] [slides3]
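For readers new to the topic, the following short numpy sketch (my own, assuming a simple least-squares objective) shows the basic stochastic gradient iteration together with the two levers the abstract mentions: a decreasing step size to manage the gradient noise, and mini-batching to reduce it.

```python
import numpy as np

# Hypothetical least-squares problem: minimize (1/n) * sum_i (x_i'w - y_i)^2.
rng = np.random.default_rng(0)
n, d = 10_000, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def sgd(batch_size=1, n_steps=5_000, eta0=0.01):
    """Plain stochastic gradient with a 1/sqrt(t) step size.
    Larger mini-batches reduce the variance ('noise') of each step."""
    w = np.zeros(d)
    for t in range(1, n_steps + 1):
        idx = rng.integers(n, size=batch_size)            # sample a mini-batch
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= eta0 / np.sqrt(t) * grad                     # decreasing step size
    return w

for b in (1, 32):
    w_hat = sgd(batch_size=b)
    print(b, np.linalg.norm(w_hat - w_true))              # error shrinks with larger batches
```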
Elad Hazan (Princeton University) and Satyen Kale (Yahoo Research)
In recent
years convex optimization and the notion of regret minimization in games
have been combined and applied to machine learning in a general
framework called online convex optimization. We will survey the basics
of this framework, its applications, main algorithmic techniques and
future research directions.
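The workhorse algorithm of this framework is online (projected) gradient descent. Here is a brief sketch (a generic illustration, not the tutorial's code) for online linear prediction with squared loss, where the learner updates after each example is revealed.

```python
import numpy as np

def online_gradient_descent(stream, d, radius=1.0):
    """Online gradient descent over the Euclidean ball of given radius.
    Each round t: predict with w_t, suffer loss (x_t'w_t - y_t)^2, take a
    gradient step with step size ~ 1/sqrt(t), and project back onto the ball.
    With a suitable step size this achieves O(sqrt(T)) regret against the
    best fixed w in hindsight."""
    w = np.zeros(d)
    losses = []
    for t, (x, y) in enumerate(stream, start=1):
        losses.append((x @ w - y) ** 2)
        grad = 2 * (x @ w - y) * x
        w -= grad / np.sqrt(t)                   # step size eta_t = 1/sqrt(t)
        norm = np.linalg.norm(w)
        if norm > radius:                        # projection onto the ball
            w *= radius / norm
    return w, losses

# Hypothetical data stream generated by a fixed linear predictor.
rng = np.random.default_rng(0)
w_star = np.array([0.5, -0.3, 0.2])
stream = [(x, x @ w_star) for x in rng.normal(size=(1000, 3))]
w_hat, losses = online_gradient_descent(stream, d=3)
print(np.round(w_hat, 2), round(sum(losses), 3))
```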
Reliable tools for inference and
model selection are necessary in all applications of machine learning
and statistics. Much of the existing theory breaks down in the now
common situation where the data analyst works interactively with the
data, adaptively choosing which methods to use by probing the same data
many times. We illustrate the problem through the lens of machine
learning benchmarks, which currently all rely on the standard holdout
method. After understanding why and when the standard holdout method
fails, we will see practical alternatives to the holdout method that can
be used many times without losing the guarantees of fresh data. We then
transition into the emerging theory on this topic touching on deep
connections to differential privacy, compression schemes, and hypothesis
testing (although no prior knowledge will be assumed).
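One concrete alternative studied in this line of work is the "reusable holdout" (Thresholdout) of Dwork et al. Below is a simplified sketch of the idea with purely illustrative parameter choices: a query is answered from the training set unless it disagrees with the holdout by more than a noisy threshold, in which case a noised holdout answer is returned instead.

```python
import numpy as np

class Thresholdout:
    """Simplified sketch of the reusable holdout idea (Dwork et al.).
    Mean queries phi are answered from the training set unless they drift
    too far from the holdout, in which case a noisy holdout answer is
    returned. Threshold and noise scale here are illustrative only."""
    def __init__(self, train, holdout, threshold=0.04, sigma=0.01, seed=0):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma = threshold, sigma
        self.rng = np.random.default_rng(seed)

    def query(self, phi):
        a_train = np.mean([phi(z) for z in self.train])
        a_hold = np.mean([phi(z) for z in self.holdout])
        noisy_gap = self.threshold + self.rng.laplace(scale=self.sigma)
        if abs(a_train - a_hold) > noisy_gap:
            return a_hold + self.rng.laplace(scale=self.sigma)
        return a_train

# Hypothetical usage: estimate a bounded per-example statistic many times
# without overfitting to the holdout set.
rng = np.random.default_rng(1)
train = rng.normal(size=(1000, 5))
holdout = rng.normal(size=(1000, 5))
ro = Thresholdout(train, holdout)
print(ro.query(lambda z: float(z[0] > 0)))
```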
Sudipto Guha (University of Pennsylvania) and Andrew McGregor (University of Massachusetts Amherst)
Graphs are one of the most commonly
used data representation tools but existing algorithmic approaches are
typically not appropriate when the graphs of interest are dynamic,
stochastic, or do not fit into the memory of a single machine. Such graphs
are often encountered as machine learning techniques are increasingly
deployed to manage graph data and large-scale graph optimization
problems. Graph sketching is a form of dimensionality reduction for
graph data that is based on using random linear projections and
exploiting connections between linear algebra and combinatorial
structure. The technique has been studied extensively over the last five
years and can be applied in many computational settings. It enables
small-space online and data stream computation where we are permitted
only a few passes (ideally only one) over an input sequence of updates to
a large underlying graph. The technique parallelizes easily and can
naturally be applied in various distributed settings. It can also be used
in the context of convex programming to enable more efficient algorithms
for combinatorial optimization problems such as correlation clustering.
One of the main goals of the research on graph sketching is
understanding and characterizing the types of graph structure and
features that can be inferred from compressed representations of the
relevant graphs.
[slides1] [slides2]
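To give a flavor of the linearity that makes these sketches work, here is a toy numpy sketch (an illustration of the general principle, not one of the specific sketches from the tutorial): a random ±1 projection of the edge-indicator vector of a graph stream, which handles both edge insertions and deletions with single linear updates and lets us estimate the number of surviving edges.

```python
import numpy as np
from itertools import combinations

# Toy illustration of graph sketching: maintain k random +/-1 projections of
# the edge-indicator vector of a graph on n nodes. The sketch is linear, so
# edge insertions and deletions are single updates, and the mean squared
# projection estimates the number of edges (an AMS-style second-moment estimate).
n, k = 30, 200
edge_index = {e: i for i, e in enumerate(combinations(range(n), 2))}
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=(k, len(edge_index)))   # projection matrix
sketch = np.zeros(k)

def update(u, v, delta):
    """Apply a stream update: delta = +1 inserts edge {u, v}, -1 deletes it."""
    global sketch
    sketch += delta * signs[:, edge_index[(min(u, v), max(u, v))]]

# Hypothetical stream: insert 100 random edges, then delete 20 of them.
all_edges = list(edge_index)
chosen = rng.choice(len(all_edges), size=100, replace=False)
for i in chosen:
    update(*all_edges[i], +1)
for i in chosen[:20]:
    update(*all_edges[i], -1)

print(np.mean(sketch ** 2))   # estimates the 80 edges that remain
```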
In many fields such as healthcare,
education, and economics, policy makers have increasing amounts of data
at their disposal. Making policy decisions based on this data often
involves causal questions: Does medication X lead to lower blood sugar,
compared with medication Y? Does longer maternity leave lead to better
child social and cognitive skills? These questions have to be addressed
in practice, every day, by scientists working across many different
disciplines.
The goal of this tutorial is to bring
machine learning practitioners closer to the vast field of causal
inference as practiced by statisticians, epidemiologists and economists.
We believe that machine learning has much to contribute in helping
answer such questions, especially given the massive growth in the
available data and its complexity. We also believe the machine learning
community could and should be highly interested in engaging with such
problems, considering the great impact they have on society in general.
We hope that participants in the
tutorial will: a) learn the basic language of causal inference as
exemplified by the two most dominant paradigms today: the potential
outcomes framework, and causal graphs; b) understand the similarities
and the differences between problems machine learning practitioners
usually face and problems of causal inference; c) become familiar with
the basic tools employed by practicing scientists performing causal
inference, and d) be informed about the latest research efforts in
bringing machine learning techniques to address problems of causal
inference.
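As a small taste of the potential outcomes framework mentioned above, here is a minimal numpy sketch (my own illustration, using simulated data) of estimating an average treatment effect by inverse propensity weighting, one of the basic tools such work builds on.

```python
import numpy as np

# Illustrative simulation in the potential outcomes framework: each unit has
# two potential outcomes y0, y1, but only one is observed. Treatment depends
# on a covariate x (confounding), so the naive difference in means is biased
# while inverse propensity weighting (IPW) is not.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                      # confounder
propensity = 1 / (1 + np.exp(-x))           # P(treated | x), assumed known here
t = rng.random(n) < propensity              # treatment indicator
y0 = x + rng.normal(size=n)                 # potential outcome without treatment
y1 = y0 + 2.0                               # true treatment effect is 2
y = np.where(t, y1, y0)                     # only one outcome is observed

naive = y[t].mean() - y[~t].mean()
ipw = np.mean(t * y / propensity) - np.mean((~t) * y / (1 - propensity))
print(round(naive, 2), round(ipw, 2))       # naive is biased; IPW is close to 2
```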
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.