As Machine Learning becomes central to our daily lives, it faces a kind of question that previous areas of engineering used to leave to a few specialists: is there a way to interpret the models we use? Can we understand their biases? Can we understand when they fail? Are the rules they implement in line with the way we think these models should behave? The Interpretable Machine Learning for Complex Systems workshop at NIPS 2016 gives a glimpse of our collective attempts at answering these very pressing issues. Very few engineering professions have had to face this societal issue so directly. Thanks to the organizers, Andrew Gordon Wilson, Been Kim and William Herlands, for putting this workshop together and making all the papers available. Here is a list of the papers presented, mostly as posters (I have covered two before, here is one). I also note the lack of authors from French entities.
- arXiv:1611.07634
- arXiv:1611.04887
- arXiv:1611.07270
- arXiv:1611.07663
- arXiv:1611.06996
Honglak Lee -- University of Michigan || Website
Title: Interpretable Deep Learning with Disentangled Representations
@9:00am
In recent years, deep learning has emerged as a powerful method for learning feature representations from complex input data, and it has been greatly successful in computer vision, speech recognition, and language modeling. While many deep learning algorithms focus on a discriminative task and extract only task-relevant features that are invariant to other factors, complex sensory data is often generated from intricate interactions between underlying factors of variation (for example, pose, morphology and viewpoints for 3D object images). In this work, we tackle the problem of learning deep representations that disentangle underlying factors of variation and allow for complex visual reasoning and inference, as well as better interpretability. We present several successful instances of deep architectures and their learning methods in supervised and weakly-supervised settings. Further, I will talk about visual analogy making with disentangled representations, as well as a connection between disentangling and unsupervised learning. In the second part of the talk, I will describe my work on learning deep representations from multiple heterogeneous input modalities, which provides connections between disentangling, multimodal learning, joint embedding, and conditional generation (e.g., generating images from text descriptions). Finally, I will show how disentangled representations can be useful for predicting the future in temporal sequence data (e.g., videos) and some reinforcement learning tasks.
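As a hedged illustration of the visual analogy making mentioned in the abstract (a toy, not the architecture from the talk): with a disentangled latent space, an analogy a : b :: c : ? can be answered by applying the latent-space transformation between a and b directly to c. The encoder and decoder below are placeholders.

```python
# Toy sketch of latent-space analogy making with disentangled factors.
# encode/decode are stand-ins; real systems learn them end to end.
import numpy as np

def encode(img):
    """Placeholder encoder: maps an 'image' to a latent code whose
    dimensions separate factors such as identity and pose."""
    return np.asarray(img, dtype=float)

def decode(z):
    """Placeholder decoder: maps a latent code back to image space."""
    return z

# toy convention: first two dims ~ identity, last dim ~ pose
z_a = encode([1.0, 0.0, 0.0])   # object A, frontal pose
z_b = encode([1.0, 0.0, 0.5])   # object A, rotated
z_c = encode([0.0, 1.0, 0.0])   # object B, frontal pose

# additive analogy: apply A's pose change to B; identity dims are untouched
z_d = z_c + (z_b - z_a)
print(decode(z_d))              # object B, rotated
```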
Finale Doshi-Velez -- Harvard || Website
@9:30am
With a growing interest in interpretability, there is an increasing need to characterize what exactly we mean by it and how to sensibly compare the interpretability of different approaches. In this talk, I suggest that our current desire for "interpretability" is as vague as asking for "good predictions" -- a desire that, while entirely reasonable, must be formalized into concrete needs such as high average test performance (perhaps held-out likelihood is a good metric) or some kind of robust performance (perhaps sensitivity or specificity are more appropriate metrics). The objective of this talk is to start a conversation to do the same for interpretability: I will describe distinct, concrete objectives that all fall under the umbrella term of interpretability and how each objective suggests natural evaluation procedures. I will also highlight important open questions in the evaluation of interpretable models.
Joint work with Been Kim, and the product of discussions with countless collaborators and colleagues.
In machine learning, a tradeoff must often be made between accuracy and intelligibility: the most accurate models usually are not very intelligible (e.g., deep neural nets, boosted trees, and random forests), and the most intelligible models usually are less accurate (e.g., linear/logistic regression). This tradeoff often limits the accuracy of models that can be applied in mission-critical applications such as healthcare, where being able to understand, validate, edit, and ultimately trust a learned model is important. We have developed a learning method based on generalized additive models (GAMs) that is often as accurate as full complexity models, but remains as intelligible as linear/logistic regression models. In the talk I'll present two case studies where these high-performance generalized additive models (GA2Ms) are applied to healthcare problems, yielding intelligible models with state-of-the-art accuracy. In the pneumonia risk prediction case study, the intelligible model uncovers surprising patterns in the data that previously prevented complex learned models from being deployed but, because it is intelligible and modular, allows these patterns to be easily recognized and removed. In the 30-day hospital readmission case study, we show that the same methods scale to large datasets containing hundreds of thousands of patients and thousands of attributes while remaining intelligible and providing accuracy comparable to the best (unintelligible) machine learning methods.
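As a rough, hedged illustration of the additive-model idea (not the GA2M implementation from the talk), here is a minimal GAM-style sketch: each feature gets its own spline basis and a logistic model combines them additively, so each feature's learned contribution is a 1-D shape function that can be plotted and inspected. The dataset is just a stand-in.

```python
# Minimal GAM-style model: per-feature spline expansion + additive logistic fit.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for a clinical dataset

gam = make_pipeline(
    StandardScaler(),
    # each original feature becomes its own block of spline basis functions,
    # so the linear model below stays additive across the original features
    SplineTransformer(n_knots=5, degree=3),
    LogisticRegression(max_iter=5000),
)
gam.fit(X, y)
print("training accuracy:", gam.score(X, y))

# To inspect feature j's shape function, evaluate its spline block over a grid
# and weight it by the matching logistic coefficients: the result is a curve
# a domain expert can read, rather than a single opaque weight.
```

For context, the "2" in GA2M refers to adding a small number of selected pairwise interaction terms on top of the purely additive model; the sketch above stays purely additive.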
Maya Gupta -- Google || Website
Title: The Power of Monotonicity for Practical Machine Learning
@2:30pm
What prior knowledge do humans have about machine learning problems that we can take advantage of as regularizers? One common intuition is that certain inputs should have a positive (only) effect on the output, for example, the price of a house should only increase as its size goes up, if all else is the same. Incorporating such monotonic priors into our machine learning algorithms can dramatically increase their interpretability and debuggability. We'll discuss state-of-the-art algorithms to learn flexible monotonic functions, and share some stories about why monotonicity is such an important regularizer for practical problems where train and test samples are not IID, especially when learning from clicks.
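A minimal, hedged sketch of the house-price intuition, using scikit-learn's isotonic regression for the one-dimensional case (the talk covers far more flexible multi-dimensional monotonic models):

```python
# Fit a function constrained to be non-decreasing in house size.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
size = np.sort(rng.uniform(40, 250, 200))           # house size in m^2
price = 1000 * size + rng.normal(0, 20_000, 200)    # noisy, but increasing on average

mono = IsotonicRegression(increasing=True, out_of_bounds="clip")
mono.fit(size, price)

# Predictions are guaranteed non-decreasing in size: a larger house is never
# predicted to be cheaper than a smaller one, all else being equal.
pred = mono.predict(np.array([50.0, 100.0, 200.0]))
print(pred)
assert np.all(np.diff(pred) >= 0)
```

The monotonicity constraint acts as the regularizer described in the abstract: even if the training clicks or prices are noisy or shifted at test time, the model cannot learn a locally decreasing, hard-to-debug relationship.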
Title: Finding interpretable sparse structure in fMRI data with dependent relevance determination priors
@3:30pm
In many problem settings, parameters are not merely sparse, but dependent in such a way that non-zero coefficients tend to cluster together. We refer to this form of dependency as "region sparsity". Classical sparse regression methods, such as the lasso and automatic relevance determination (ARD), model parameters as independent a priori and therefore do not exploit such dependencies. Here we introduce a hierarchical model for smooth, region-sparse weight vectors and tensors in a linear regression setting. Our approach represents a hierarchical extension of the relevance determination framework, where we add a transformed Gaussian process to model the dependencies between the prior variances of regression weights. We combine this with a structured model of the prior variances of Fourier coefficients, which eliminates unnecessary high frequencies. The resulting prior encourages weights to be region-sparse in two different bases simultaneously. We develop Laplace approximation and Markov chain Monte Carlo (MCMC) sampling to provide efficient inference for the posterior, and show substantial improvements over existing methods for both simulated and real fMRI datasets.
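To make the prior concrete, here is a hedged generative toy in numpy (an illustration of the idea, not the paper's model or inference): log prior variances are drawn from a Gaussian process with a smooth kernel, so coefficients with large prior variance, and hence non-zero weights, cluster into contiguous regions.

```python
# Toy "region sparsity" prior: a GP over log prior variances of regression weights.
import numpy as np

rng = np.random.default_rng(1)
d = 200                                   # number of regression weights
x = np.arange(d)

# squared-exponential kernel over coefficient positions (nearby weights covary)
lengthscale, amplitude = 10.0, 3.0
K = amplitude**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / lengthscale**2)

# latent GP controlling the log prior variances of the weights
u = rng.multivariate_normal(mean=-6.0 * np.ones(d), cov=K + 1e-6 * np.eye(d))
prior_var = np.exp(u)                     # smooth, mostly tiny, with a few broad bumps

# weights drawn from the resulting prior: non-zeros appear in contiguous regions
w = rng.normal(0.0, np.sqrt(prior_var))
print("fraction of |w| > 0.1:", np.mean(np.abs(w) > 0.1))
```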
Saleema Amershi -- Microsoft Research || Website
Title: Better Machine Learning Through Data
@4:30pm
Machine learning is the product of both an algorithm and data. While machine learning research tends to focus on algorithmic advances, taking the data as given, machine learning practice is quite the opposite. Most of the influence practitioners have in using machine learning to build predictive models comes through interacting with data, including crafting the data used for training and examining results on new data to inform future iterations. In this talk, I will present tools and techniques we have been developing in the Machine Teaching Group at Microsoft Research to support the model building process. I will then discuss some of the open challenges and opportunities in improving the practice of machine learning.
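A small, hedged sketch of the data-centric loop described above (generic scikit-learn, not the Machine Teaching Group's tools): train a model, surface the examples it is least confident about or gets wrong, and let those drive the next round of data work.

```python
# One iteration of a data-centric model-building loop.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

proba = model.predict_proba(X_te)
confidence = proba.max(axis=1)
wrong = model.predict(X_te) != y_te

# candidates to inspect, relabel, or collect more data for in the next iteration
to_review = np.argsort(confidence)[:20]
print("misclassified in this round:", int(wrong.sum()))
print("lowest-confidence examples to review:", to_review)
```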
In this work, we propose anchor-LIME (aLIME), a model-agnostic technique that produces high-precision rule-based explanations for which the coverage boundaries are very clear. We compare aLIME to linear LIME with simulated experiments, and demonstrate the flexibility of aLIME with qualitative examples from a variety of domains and tasks.
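As a hedged toy version of the idea (simplified to single-feature anchors, with invented function names, and not the authors' algorithm): an anchor fixes part of an instance, and we estimate how often the model's prediction stays the same when the unfixed features are perturbed.

```python
# Toy single-feature "anchor" search around one instance of a black-box model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

def anchor_precision(clf, X_pool, instance, feature, n_samples=500):
    """Estimate how often the prediction is unchanged when every feature
    except `feature` is resampled from the data pool (the anchor holds
    `feature` fixed at the instance's value)."""
    target = clf.predict(instance.reshape(1, -1))[0]
    idx = rng.integers(0, len(X_pool), size=n_samples)
    perturbed = X_pool[idx].copy()
    perturbed[:, feature] = instance[feature]      # enforce the anchor
    return np.mean(clf.predict(perturbed) == target)

instance = X[0]
precisions = [anchor_precision(clf, X, instance, f) for f in range(X.shape[1])]
best = int(np.argmax(precisions))
print(f"best single-feature anchor: feature {best} "
      f"(precision ~ {precisions[best]:.2f})")
```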
Liked this entry? Subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on LinkedIn.