Nuit Blanche: Slides: Kernel Methods for Big Data

Friday, April 25, 2014

Slides: Kernel Methods for Big Data

I just noticed this workshop on Kernel Methods for Big Data that just took place in Lille. Thanks to the organizers : Alain Celisse , Julien Jacques, Guillemette Marot, here are the slides and attendant abstracts:

Arthur Gretton (University College London)

slides: lecture 1/3, lecture 2/3, lecture 3/3

A short introduction to reproducing kernel Hilbert spaces The first lecture is an introduction to RKHS. The second covers embeddings of probabilities to RKHS, and characteristic RKHS (for which these embeddings are unique to each probability measure). The third lecture covers advanced topics: relation of RKHS embeddings of probabilities and energy distances, optimal kernel choice for two-sample testing, testing three-way interactions, and Bayesian inference without models.

Jean-Philippe Vert (Mines de Paris and Institut Curie)

slides: lecture

Learning with kernels: an introduction
In this short course I will give a general introduction to kernel methods, including support vector machines (SVM), for supervised classification and regression. I will explain how positive definite kernels and reproducing kernel Hilbert spaces (RKHS) allow to extend many simple linear methods to complex, structured and high-dimensional data such as images, texts or graphs, and give some hints on how kernels can be constructed for specific applications. I will also discuss the possibility to integrate heterogeneous data with multiple kernel learning (MKL).

Short talks

Sylvain Arlot (CNRS - Ecole Normale Supérieure)

slides: talk

Kernel change-point detection We tackle the change-point problem with data belonging to a general set. We propose a penalty for choosing the number of change-points in the kernel-based method of Harchaoui and Cappe (2007). This penalty generalizes the one proposed for one dimensional signals by Lebarbier (2005). We prove it satisfies a non-asymptotic oracle inequality by showing a new concentration result in Hilbert spaces. Experiments on synthetic and real data illustrate the accuracy of our method, showing it can detect changes in the whole distribution of data, even when the mean and variance are constant. Our algorithm can also deal with data of complex nature, such as the GIST descriptors which are commonly used for video temporal segmentation.

Collaboration with Alain Celisse and Zaïd Harchaoui.

Preprint: http://arxiv.org/abs/1202.3878

Francis Bach (Inria - Ecole Normale Supérieure)

slides: talk

Sharp analysis of low-rank kernel matrix approximation We consider supervised learning problems within the positive-deﬁnite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to inﬁnite-dimensional feature spaces, a common practical limiting difﬁculty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this talk, I will show that for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have subquadratic running time complexity, but provably exhibit the same predictive performance than existing algorithms.

Charles Bouveyron (Université Paris Descartes)

slides: talk

Kernel discriminant analysis and clustering with parsimonious Gaussian process models We present a family of parsimonious Gaussian process models which allow to build, from a finite sample, a model-based classifier in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. This allows in particular to use non-linear mapping functions which project the observations into infinite dimensional spaces. It is also demonstrated that the building of the classifier can be directly done from the observation space through a kernel function. The proposed classification method is thus able to classify data of various types such as categorical data, functional data or networks. Furthermore, it is possible to classify mixed data by combining different kernels. The methodology is as well extended to the unsupervised classification case and an EM algorithm is derived for the inference. Experimental results on various data sets demonstrate the effectiveness of the proposed method.

This is a joint work with S. Girard (INRIA & LJK) and M. Fauvel (INRA & Univ. Toulouse).

Jérémie Kellner (Univ. Lille 1, Inria)

slides: talk

Normality test in high (or infinite) dimension with kernels Several kernel methods assume data in the kernel space have a Gaussian distribution. We design a goodness-of-fit test for normality in the Reproducing Kernel Hilbert Space (RKHS) to check this crucial assumption. The related statistic relies on Laplace transform, which allows to overcome some drawbacks of previous strategies such as the Maximum Mean Discrepancy (MMD). A theoretical analysis of the new test procedure is provided in terms of Type-I and-II errors. These good performances are also confirmed by simulation experiments carried out on both synthetic and real data. In particular our test strategy improves upon ongoing methods in high-dimensional settings as dimension increases, still exhibiting strong power where other competitors usually fail.

Rémi Lajugie (Ecole Normale Supérieure)

slides: talk

Large margin metric learning for constraint partitioning problems We consider unsupervised partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, such as clustering, image or video segmentation, and other change-point detection problems. We emphasize on cases with specific structure, which include many practical situations ranging from mean-based change-point detection to image segmentation problems. We aim at learning a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection. This is done in a supervised way by assuming the availability of several (partially) labeled datasets that share the same metric. We cast the metric learning problem as a large-margin structured prediction problem, with proper definition of regularizers and losses, leading to a convex optimization problem which can be solved efficiently. Our experiments show how learning the metric can significantly improve performance on bioinformatics, video or image segmentation problems..

Cristian Preda (Université Lille 1, Inria)

slides: talk

RKHS methods for regression with functional data We consider the regression problem with both predictor and response of functional type. The regression function is approximated over the space of functionals defined in some RKHS. Consistency of the estimators and numerical results are presented.

Guillem Rigaill (Univ. Evry Val-d'Essonne)

slides: talk

Kernel-based multiple change-point detection We consider the problem of kernel-based multiple change-point detection. From an algorithmic point of view it is possible to use dynamic programming to recover the best segmentations in 1 to K segments. However a naive implementation of such algorithm is quadratic in time and it is also quadratic in space as one typically needs to store and access the elements of the Gram Matrix. This is a problem to process large profiles. We describe an algorithm which has the same time complexity and yet is linear in space. We also describe a simple heuristic which is linear in time. We illustrate this approach on 2 dimensional DNA copy number profiles.

This is a joint work with Alain Celisse, Guillemette Marot and Morgane Pierre-Jean

Dino Sejdinovic (University College London)

slides: talk

MCMC Kameleon: Kernel Adaptive Metropolis-Hastings A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.

Joint work with Heiko Strathmann, Maria Lomeli Garcia, Christophe Andrieu and Arthur Gretton

preprint: http://arxiv.org/abs/1307.5302

code: https://github.com/karlnapf/kameleon-mcmc

Zoltan Szabo (University College London)

slides: talk

Learning on Distributions

Problems formulated in terms of distributions have recently gained widespread attention. An important task that belongs to this family is distribution regression: regressing to a real-valued response from a probability distribution. One particularly challenging difficulty of the task is its two-stage sampled nature: in practise we only have samples from sampled distributions. In my presentation I am going to talk about two (intimately related) directions to tackle this difficulty. Firstly, I am going to present a recently released information theoretical estimators open source toolkit capable of estimating numerous dependency, similarity measures on distributions in a nonparametric way. Next, I will propose an algorithmically very simple approach to tackle the distribution regression: embed the distributions to a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. I will show that (i) this technique is consistent in the two-stage sampled setting under fairly mild conditions, and (ii) it gives state-of-the-art results on supervised entropy learning and the prediction problem of aerosol optical depth based on satellite images.

preprint: http://arxiv.org/pdf/1402.1754

ITE toolbox: https://bitbucket.org/szzoli/ite/

Join the CompressiveSensing subreddit or the Google+ Community and post there !