Thursday, September 12, 2013

Videos and Slides: IMA's course Applied Statistics and Machine Learning

Following up on Why does the L_1-norm induce sparsity?, I was reminded that this past June IMA organized a course on Applied Statistics and Machine Learning. We now have the program, with slides, R programs, and the attendant videos of the course. The videos can be watched by clicking the video links in the program below. I very much liked the presentations, in particular Alex Smola's talk on randomized algorithms. Enjoy!

Bayesian Linear Model
June 20, 2013, 9:00 am - 10:30 am
David Madigan (Columbia University)
This lecture will review basic principles of Bayesian inference, Bayesian hierarchical models, and the BUGS tool for conducting Bayesian analyses.
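As a quick illustration of the Bayesian linear model (BUGS itself is not shown here), here is a minimal conjugate-prior sketch in Python/NumPy; the simulated data, the known noise variance, and the prior scale are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data y = X w + noise (w_true and sigma2 are assumptions for this toy example)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
sigma2 = 0.25                                   # known noise variance (simplifying assumption)
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Gaussian prior w ~ N(0, tau2 * I); the posterior is then also Gaussian:
#   Sigma_post = (X'X / sigma2 + I / tau2)^{-1},  mu_post = Sigma_post X'y / sigma2
tau2 = 1.0
Sigma_post = np.linalg.inv(X.T @ X / sigma2 + np.eye(d) / tau2)
mu_post = Sigma_post @ X.T @ y / sigma2

print("posterior mean:", np.round(mu_post, 3))
print("posterior std :", np.round(np.sqrt(np.diag(Sigma_post)), 3))
```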
Graphical Model (Theory Latent Models)
June 20, 2013, 11:00 am - 12:30 pm
David Madigan (Columbia University)
Graphical Markov models use graphs with nodes corresponding to random variables, and edges that encode conditional independence relationships between those variables. Directed graphical models (aka Bayesian networks) in particular have received considerable attention. This lecture will review basic concepts in graphical model theory such as Markov properties, equivalence, and connections with causal inference.
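For readers who want a concrete handle on the directed Markov property, here is a tiny sketch (the chain structure and the conditional probability tables are arbitrary assumptions): a joint built from the factorization p(x)p(y|x)p(z|y) automatically satisfies X independent of Z given Y.

```python
import numpy as np

# Three-node chain X -> Y -> Z with binary variables; the joint factorizes as
# p(x, y, z) = p(x) p(y | x) p(z | y).  The CPT values below are arbitrary assumptions.
p_x = np.array([0.6, 0.4])                      # p(x)
p_y_x = np.array([[0.7, 0.3],                   # p(y | x=0)
                  [0.2, 0.8]])                  # p(y | x=1)
p_z_y = np.array([[0.9, 0.1],                   # p(z | y=0)
                  [0.4, 0.6]])                  # p(z | y=1)

joint = np.einsum('x,xy,yz->xyz', p_x, p_y_x, p_z_y)

# Directed Markov property: p(z | x, y) does not depend on x, i.e. X _|_ Z | Y.
p_z_given_xy = joint / joint.sum(axis=2, keepdims=True)
print(np.allclose(p_z_given_xy[0], p_z_given_xy[1]))   # True
```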
Graphical Model Applications: Localization in Wireless Networks and Vaccine Response Modeling
June 21, 2013, 9:00 am - 10:30 am
David Madigan (Columbia University)
This lecture will describe two applications of Bayesian graphical models.

Counter-factual Concepts
June 21, 2013, 11:00 am - 12:30 pm
David Madigan (Columbia University)
The counterfactual approach to causal inference dates back at least to Neyman and has developed considerably in recent decades due to pioneering work by Rubin, Pearl, Robins, and others. This lecture will introduce the counterfactual approach and discuss specific examples.
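As a bare-bones illustration of the potential-outcomes idea (the data-generating numbers below are assumptions, not anything from the lecture), here is a simulation in which randomized treatment assignment lets a simple difference in means recover the average treatment effect:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy potential-outcomes setup: each unit has two potential outcomes Y(0), Y(1),
# but only one of them is ever observed.
n = 100_000
y0 = rng.normal(loc=0.0, scale=1.0, size=n)
y1 = y0 + 2.0                                   # true individual effect = 2 for every unit

# Randomized treatment assignment makes the naive difference in means unbiased.
t = rng.integers(0, 2, size=n)
y_obs = np.where(t == 1, y1, y0)
ate_hat = y_obs[t == 1].mean() - y_obs[t == 0].mean()
print("estimated ATE:", round(ate_hat, 3), "(true ATE = 2.0)")
```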
Julien Mairal - INRIA

Optimization for Sparse Estimation
June 22, 2013, 9:00 am - 10:30 am
Julien Mairal (INRIA)
We will discuss practical optimization algorithms for estimating sparse models, both from a statistics and a signal processing point of view. First, we will cover non-convex optimization techniques such as greedy algorithms and DC-programming. Second, we will focus on convex optimization: first-order methods, iterative reweighted least squares, and the homotopy method for the Lasso. References: [1] Bach, F., Jenatton, R., Mairal, J., & Obozinski, G. (2012). Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1), 1-106. http://lear.inrialpes.fr/people/mairal/resources/pdf/ftml.pdf
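As a concrete companion to the first-order methods mentioned above, here is a minimal proximal-gradient (ISTA) sketch for the Lasso in Python/NumPy; the problem sizes, step-size rule, and regularization value are assumptions for the toy example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - Xw||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
w_true = np.zeros(50); w_true[:5] = 3.0         # sparse ground truth (an assumption)
y = X @ w_true + 0.1 * rng.normal(size=200)
print(np.round(ista_lasso(X, y, lam=5.0)[:8], 2))
```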

More Structured Sparsity
June 22, 2013, 11:00 am - 12:30 pm
Julien Mairal (INRIA)
In this lecture, we will go beyond Friday's course on structured sparsity, and consider more complex models. Recently, a large amount of research in statistics and signal processing has been devoted to developing structured sparse regularization functions. The goal is to encode some a priori knowledge about an estimation problem in the regularization, in order to obtain better prediction or better interpretability. Unfortunately, the literature on the topic is vast, and significantly different approaches are now referred to as "structured sparsity". We will present some aspects of this literature, and focus particularly on convex approaches for structured sparse estimation. References: [1] F. Bach, R. Jenatton, J. Mairal and G. Obozinski. Structured Sparsity through Convex Optimization. Statistical Science, 27(4), 2012. http://projecteuclid.org/euclid.ss/1356098550
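One small piece of the convex structured-sparsity toolbox that is easy to show in code is the proximal operator of the (non-overlapping) group-Lasso penalty, which zeroes out entire groups at once; the groups and numbers below are assumptions for illustration.

```python
import numpy as np

def prox_group_lasso(w, groups, t):
    """Block soft-thresholding: proximal operator of t * sum_g ||w_g||_2,
    the penalty behind the (non-overlapping) group Lasso."""
    out = w.copy()
    for g in groups:
        norm_g = np.linalg.norm(w[g])
        out[g] = 0.0 if norm_g <= t else (1.0 - t / norm_g) * w[g]
    return out

# Toy example with two groups of three coefficients each (an assumption).
w = np.array([0.2, -0.1, 0.15, 3.0, -2.0, 1.0])
groups = [np.arange(0, 3), np.arange(3, 6)]
print(prox_group_lasso(w, groups, t=0.5))       # the first group is zeroed as a block
```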
Alexander Smola - Carnegie Mellon University

Randomized Algorithms for Scalable Data Analysis
June 21, 2013, 2:00 pm - 3:30 pm
Alexander Johannes Smola (Carnegie Mellon University)
In this talk I will give an overview of a number of projection-based randomized algorithms for scalable data analysis. While randomized methods are frequently used, e.g. for Bayesian inference, their potential for designing function classes and for obtaining approximate solutions of expensive estimation problems holds significant promise. Based on a number of vignettes I will discuss basic principles and recent developments:
* The Bloom filter is an efficient structure to estimate set membership that achieves high accuracy guarantees (zero false negatives and only a small number of false positives) at minimal memory overhead. Extending this to floating point numbers yields the Count (Farach et al, 2003) and the CountMin (Cormode and Muthukrishnan, 2005) sketches that can be used for frequency counts.

* The above methods are very useful when it comes to designing memory efficient linear classifiers, leading to the class of Hash kernels for documents, and other sparse feature collections (Weinberger et al., 2009).

* Application of the same techniques to matrices yields both fast matrix multiplication algorithms (Pagh 2012), and collaborative filtering algorithms (Karatzoglou et al., 2010).

* Shingles are an efficient tool for extracting random objects from a set (Broder et al., 1997) in such a way as to provide fingerprints for fast document comparison. This idea can be adapted to linear classification in the form of Conditional Random Sampling (Li et al., 2006).

* Random projections can be used for drastic dimensionality reduction. In the context of vectors this leads to locality sensitive hashing (Indyk and Motwani, 1998) used e.g. for nearest neighbor retrieval. In the context of matrices this leads to fast low-dimensional linear algebra approximation techniques (Halko et al, 2010).

* Binary variants of such projections were studied by Goemans and Williamson, 1998, and Charikar, 2005. They allow for fast reconstruction of angles between vectors. These techniques can be employed in Bayesian inference when computing inner products for exponential families (Ahmed et al., 2012).

* For nonlinear function classes, related techniques were proposed by Neal, 1994 and by Rahimi and Recht, 2008 in the form of Random Kitchen Sinks. Random draws allow one to obtain approximations of many kernel functions quite efficiently. A recent paper by Le et al, 2013 shows how memory footprint and computation can be improved significantly (a minimal random-features sketch follows after this list).
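Here is the promised random-features sketch, a minimal Python/NumPy version of the Rahimi-Recht construction approximating an RBF kernel (the bandwidth, dimensions, and number of features are assumptions; the Fastfood refinement of Le et al. is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, n_features=2000, gamma=1.0):
    """Random Fourier features approximating the RBF kernel exp(-gamma ||x - x'||^2)
    (random kitchen sinks); dimensions and gamma here are assumptions."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.normal(size=(5, 10))
Z = rff_features(X)
K_approx = Z @ Z.T
K_exact = np.exp(-1.0 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())         # error shrinks as n_features grows
```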
Marc Suchard - University of California, Los Angeles

Massive Parallelization to Learn from Massive Data
June 25, 2013, 2:00 pm - 3:30 pm
Marc A. Suchard (University of California, Los Angeles)
and
Hands-on High-performance Statistical Computing Techniques
June 25, 2013, 4:00 pm - 5:30 pm
Marc A. Suchard (University of California, Los Angeles)
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. Likewise, fusion of real-time satellite data with in situ sea surface temperature measurements for ecological modeling remains taxing for probabilistic spatial-temporal models on a global scale. In this talk, I discuss how high-performance statistical computation, including graphics processing units, can enable complex inference methods in these massive datasets. I focus on algorithm restructuring through techniques like block relaxation (Gibbs, cyclic coordinate descent, MM) to exploit increased data/parameter conditional independence within traditional serial structures. I find orders-of-magnitude improvements in overall run-time when fitting models involving tens of millions of observations.


These approaches are ubiquitous in high-dimensional biological problems modeled through stochastic processes. To drive this point home, I conclude with a seemingly unrelated example developing nonparametric models to study the genomic evolution of infectious diseases. These infinite hidden Markov models (HMMs) generalize both Dirichlet process mixtures and the usual finite-state HMM to capture unknown heterogeneity in the evolutionary process. Data squashing strategies, coupled with massive parallelization, yield novel algorithms that bring these flexible models finally within our grasp. [Joint work with Subha Guha, Ricardo Lemos, David Madigan, Bruno Sanso and Steve Scott]
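To make the block-relaxation idea from the first paragraph a bit more tangible, here is a minimal cyclic coordinate descent sketch for an L1-penalized least-squares problem in Python/NumPy; the simulated data are assumptions, and only the serial structure (whose inner reductions are the data-parallel part) is shown, not any GPU code.

```python
import numpy as np

def cd_lasso(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for 0.5 * ||y - Xw||^2 + lam * ||w||_1.
    Each coordinate update reduces to inner products over the data, the kind of
    step that parallelizes well across observations."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    r = y - X @ w                                # running residual
    for _ in range(n_sweeps):
        for j in range(d):
            r += X[:, j] * w[j]                  # remove coordinate j's contribution
            rho = X[:, j] @ r                    # data-parallel reduction
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
w_true = np.zeros(20); w_true[:3] = 2.0          # sparse ground truth (an assumption)
y = X @ w_true + 0.1 * rng.normal(size=500)
print(np.round(cd_lasso(X, y, lam=10.0)[:5], 2))
```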

Data Collection
June 17, 2013, 9:00 am - 10:30 am
Bin Yu (University of California, Berkeley)
In this lecture, we will discuss basic experimental design principles in data collection and issues regarding data quality. Specific data examples, such as the Enron data set, will be used.
Exploratory Data Analysis (EDA)
June 17, 2013, 11:00 am - 12:30 pm
Bin Yu (University of California, Berkeley)
This lecture will cover data summarization and visualization tools such as kernel estimation, loess, scatterplots, and dimension reduction via principal component analysis (PCA). Specific data examples will be used.
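Since PCA is the one item above with a compact linear-algebra core, here is a minimal sketch via the SVD of the centered data matrix (the correlated toy data are an assumption):

```python
import numpy as np

def pca(X, k=2):
    """Principal component analysis via SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                       # projections onto the first k components
    explained = s[:k] ** 2 / (s ** 2).sum()      # fraction of variance per component
    return scores, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ np.array([[3.0, 0.0], [1.0, 0.3]])   # correlated toy data
scores, explained = pca(np.c_[X, rng.normal(size=(300, 3))], k=2)
print("variance explained by the first two PCs:", np.round(explained, 2))
```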
Linear Regression
June 18, 2013, 9:00 am - 10:30 am
Bin Yu (University of California, Berkeley)
This lecture reviews the least squares (LS) method for linear fitting and its statistical properties under various linear regression model assumptions. Methods will be illustrated with real data examples from the instructor's research projects.
Generalized Linear Models
June 18, 2013, 11:00 am - 12:30 pm
Bin Yu (University of California, Berkeley)
This lecture will generalize LS to weighted LS (WLS) and use WLS to connect with generalized linear models, including logistic regression. Remote sensing data for cloud detection will be used.
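The WLS-GLM connection described above is exactly the IRLS algorithm; here is a minimal Python/NumPy sketch for logistic regression (the simulated data and coefficients are assumptions):

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Fit logistic regression by iteratively reweighted least squares (IRLS):
    each Newton step is a weighted least-squares solve with a working response."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        W = p * (1.0 - p)                        # working weights
        z = X @ w + (y - p) / W                  # working response
        w = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return w

rng = np.random.default_rng(0)
X = np.c_[np.ones(1000), rng.normal(size=(1000, 2))]
w_true = np.array([-0.5, 2.0, -1.0])             # true coefficients (an assumption)
y = (rng.uniform(size=1000) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
print(np.round(logistic_irls(X, y), 2))
```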
Regularization: Model Selection and Lasso
June 19, 2013, 9:00 am - 10:30 am
Bin Yu (University of California, Berkeley)
LS and maximum likelihood estimation (MLE) overfit when the dimension of the model is not small relative to the sample size, which is almost always the case in high dimensions. Regularization often works by adding a penalty to the fitting criterion, as in classical model selection methods such as AIC or BIC, and in L1-penalized LS, known as the Lasso. We will also introduce cross-validation (CV) for regularization parameter selection.
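For the classical model-selection side (AIC/BIC), here is a minimal sketch comparing nested linear models under a Gaussian likelihood; the simulated design, the true sparse coefficients, and the candidate model sizes are assumptions for the example.

```python
import numpy as np

def aic_bic(X, y):
    """Gaussian AIC/BIC for a linear model fit by least squares
    (k = number of coefficients + 1 for the noise variance)."""
    n, k = X.shape[0], X.shape[1] + 1
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    ll = -0.5 * n * (np.log(2 * np.pi * resid @ resid / n) + 1)   # profiled log-likelihood
    return 2 * k - 2 * ll, k * np.log(n) - 2 * ll

rng = np.random.default_rng(0)
X_full = rng.normal(size=(200, 10))
y = X_full[:, :3] @ np.array([2.0, -1.0, 1.5]) + rng.normal(size=200)   # only 3 real signals
for d in (3, 6, 10):
    aic, bic = aic_bic(X_full[:, :d], y)
    print(f"first {d:2d} columns: AIC={aic:7.1f}  BIC={bic:7.1f}")
```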
Structured Sparsity
June 19, 2013, 11:00 am - 12:30 pm
Bin Yu (University of California, Berkeley)
This lecture will discuss variants and extensions of Lasso such as Lasso+LS, adaptive Lasso, and group Lasso.
Boosting and Low-rank
June 24, 2013, 9:00 am - 10:30 am
Bin Yu (University of California, Berkeley)
Boosting is, together with the SVM, one of the two most successful machine learning methods. It applies gradient descent to an empirical loss function. When the step sizes are small, it is a computationally efficient way to approximate the Lasso. When a nuclear norm penalty is applied to the L2 loss, we obtain the low-rank regularization arising from the Netflix competition. A subset of the Netflix data will be investigated.
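A minimal sketch of the gradient-descent view of boosting: componentwise L2 boosting with a small step size, whose coefficient path behaves much like the Lasso's (the data, step size, and number of rounds below are assumptions):

```python
import numpy as np

def l2_boost(X, y, step=0.05, n_rounds=1000):
    """Componentwise L2 boosting: each round takes a small step along the single
    predictor that best fits the current residual, i.e. gradient descent on the
    empirical squared-error loss over one-variable fits."""
    w = np.zeros(X.shape[1])
    r = y.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_rounds):
        corr = X.T @ r
        j = np.argmax(corr ** 2 / col_sq)        # predictor reducing RSS the most
        delta = step * corr[j] / col_sq[j]       # shrunken univariate LS coefficient
        w[j] += delta
        r -= delta * X[:, j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
w_true = np.zeros(20); w_true[:4] = [3.0, -2.0, 1.5, 1.0]   # sparse truth (an assumption)
y = X @ w_true + 0.2 * rng.normal(size=300)
print(np.round(l2_boost(X, y)[:6], 2))
```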



Ridge and SVM
June 24, 2013, 11:00 am - 12:30 pm
Bin Yu (University of California, Berkeley)
This lecture will cover two related L2-penalized regularization methods: ridge regression and the SVM, one from the 40's and one from the 90's. The SVM is, together with boosting, one of the two most successful machine learning methods.
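Of the two, ridge regression has a one-line closed form, so here is a minimal sketch (the SVM half is not shown; the d > n toy data are assumptions):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression: the L2-penalized least-squares solution
    w = (X'X + lam I)^(-1) X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 100))                   # more predictors than observations
w_true = rng.normal(size=100)
y = X @ w_true + 0.1 * rng.normal(size=50)
print("||w_ridge||:", round(np.linalg.norm(ridge(X, y, lam=10.0)), 2),
      " vs ||w_true||:", round(np.linalg.norm(w_true), 2))   # shrinkage toward zero
```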
V4 Modeling
June 25, 2013, 9:00 am - 10:30 am
Bin Yu (University of California, Berkeley)
This lecture will illustrate the power of the sparse coding principle and low-rank regularization in modeling neuron responses to natural images in the very challenging visual cortex area V4.
Sparse Modeling Theory
June 26, 2013, 9:00 am - 10:30 am
Bin Yu (University of California, Berkeley)
This lecture will give a heuristic overview of theoretical results on the Lasso that explain when and why the Lasso and its extensions work.
Other videos:
Model Averaging (MCMC), June 26, 2013, Mark H. Hansen (Columbia University)
Lecture, June 26, 2013, Mark H. Hansen (Columbia University)
GGobi Multivariate Data, June 25, 2013, Mark H. Hansen (Columbia University)
Lecture, June 25, 2013, Mark H. Hansen (Columbia University)

Image Credit: NASA/JPL/Space Science Institute
W00084024.jpg was taken on September 09, 2013 and received on Earth September 11, 2013. The camera was pointing toward SATURN at approximately 665,141 miles (1,070,441 kilometers) away, and the image was taken using the MT2 and CL2 filters.



Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
