Nuit Blanche: Random Features Roundup

Monday, December 07, 2015

Following up on this morning's blog entry, here are some papers/posters featuring the use of Random Features:

Here is a Thesis Proposal and its attendant presentation slides: Scalable, Active and Flexible Learning on Distributions by Dougal J. Sutherland

A wide range of machine learning problems, including astronomical inference about galaxy clusters, natural image scene classification, parametric statistical inference, and predictions of public opinion, can be well-modeled as learning a function on (samples from) distributions. This thesis explores problems in learning such functions via kernel methods. The first challenge is one of computational efficiency when learning from large numbers of distributions: the computation of typical methods scales between quadratically and cubically, and so they are not amenable to large datasets. We investigate the approach of approximate embeddings into Euclidean spaces such that inner products in the embedding space approximate kernel values between the source distributions. We present a new embedding for a class of information-theoretic distribution distances, and evaluate it and existing embeddings on several real-world applications. We also propose the integration of these techniques with deep learning models so as to allow the simultaneous extraction of rich representations for inputs with the use of expressive distributional classifiers. In a related problem setting, common to astrophysical observations, autonomous sensing, and electoral polling, we have the following challenge: when observing samples is expensive, but we can choose where we would like to do so, how do we pick where to observe? We propose the development of a method to do so in the distributional learning setting (which has a natural application to astrophysics), as well as giving a method for a closely related problem where we search for instances of patterns by making point observations. Our final challenge is that the choice of kernel is important for getting good practical performance, but how to choose a good kernel for a given problem is not obvious. 
We propose to adapt recent kernel learning techniques to the distributional setting, allowing the automatic selection of good kernels for the task at hand. Integration with deep networks, as previously mentioned, may also allow for learning the distributional distance itself. Throughout, we combine theoretical results with extensive empirical evaluations to increase our understanding of the methods.
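
To make the "approximate embedding" idea concrete, here is a minimal sketch that maps each sample set to the average of its random Fourier features, so that a plain inner product between two embeddings approximates the mean map kernel between the underlying distributions. This is the standard random Fourier feature construction for a Gaussian kernel, not the new embedding proposed in the thesis; the function names, the bandwidth, the sample sizes and the number of features are all illustrative assumptions.

```python
import math
import random

def sample_rff_params(dim, num_features, bandwidth, rng):
    """Draw frequencies w ~ N(0, I / bandwidth^2), matching a Gaussian kernel."""
    return [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
            for _ in range(num_features)]

def mean_embedding(sample, freqs):
    """Average the features [cos(w.x), sin(w.x)] / sqrt(D) over one sample set."""
    num_features = len(freqs)
    emb = [0.0] * (2 * num_features)
    for x in sample:
        for j, w in enumerate(freqs):
            proj = sum(wi * xi for wi, xi in zip(w, x))
            emb[j] += math.cos(proj)
            emb[num_features + j] += math.sin(proj)
    scale = 1.0 / (len(sample) * math.sqrt(num_features))
    return [v * scale for v in emb]

def exact_mean_map_kernel(sample_p, sample_q, bandwidth):
    """Average Gaussian kernel value over all cross pairs (quadratic cost)."""
    total = 0.0
    for x in sample_p:
        for y in sample_q:
            sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
    return total / (len(sample_p) * len(sample_q))

# Hypothetical toy data: two 2-D Gaussian samples with different means.
rng = random.Random(0)
sample_p = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(40)]
sample_q = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(40)]

freqs = sample_rff_params(dim=2, num_features=1000, bandwidth=1.0, rng=rng)
emb_p = mean_embedding(sample_p, freqs)
emb_q = mean_embedding(sample_q, freqs)

approx = sum(a * b for a, b in zip(emb_p, emb_q))  # linear-time after embedding
exact = exact_mean_map_kernel(sample_p, sample_q, 1.0)
print(approx, exact)
```

Once every distribution has its own fixed-length embedding, any linear method can be trained on the embeddings directly, which is what avoids the quadratic-to-cubic cost mentioned in the abstract.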

Compact Bilinear Pooling
Yang Gao, Oscar Beijbom, Ning Zhang, Trevor Darrell

Bilinear models have been shown to achieve impressive performance on a wide range of visual tasks, such as semantic segmentation, fine-grained recognition and face recognition. However, bilinear features are high dimensional, typically on the order of hundreds of thousands to a few million, which makes them impractical for subsequent analysis. We propose two compact bilinear representations with the same discriminative power as the full bilinear representation but with only a few thousand dimensions. Our compact representations allow back-propagation of classification errors, enabling an end-to-end optimization of the visual recognition system. The compact bilinear representations are derived through a novel kernelized analysis of bilinear pooling, which provides insights into the discriminative power of bilinear pooling and a platform for further research in compact pooling methods. Extensive experimentation illustrates the applicability of the proposed compact representations for image classification and few-shot learning across several visual recognition tasks.
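
The kernelized view can be illustrated with a small sketch: the inner product of two full bilinear descriptors equals a sum of squared dot products between local features, i.e. a second-order polynomial kernel, and that kernel can be approximated with random features. The sketch below uses a Random Maclaurin style projection with random sign vectors; the toy descriptors, the dimensions and the function names are assumptions made for illustration, not the authors' implementation.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bilinear_pool(descriptors):
    """Sum of outer products x x^T over local descriptors, flattened to c*c dims."""
    c = len(descriptors[0])
    pooled = [0.0] * (c * c)
    for x in descriptors:
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += x[i] * x[j]
    return pooled

def random_maclaurin_pool(descriptors, w1, w2):
    """Compact pooling: sum over descriptors of (w1_k . x)(w2_k . x), scaled by 1/sqrt(d)."""
    d = len(w1)
    pooled = [0.0] * d
    for x in descriptors:
        for k in range(d):
            pooled[k] += dot(w1[k], x) * dot(w2[k], x)
    scale = d ** -0.5
    return [v * scale for v in pooled]

# Hypothetical sizes: c-dimensional local features, d-dimensional compact output.
rng = random.Random(0)
c, d = 4, 4000
w1 = [[rng.choice((-1.0, 1.0)) for _ in range(c)] for _ in range(d)]
w2 = [[rng.choice((-1.0, 1.0)) for _ in range(c)] for _ in range(d)]

# Two toy "images", each with two unit-norm local descriptors.
image_x = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_y = [[0.6, 0.8, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

# Exact identity: <B(X), B(Y)> equals the sum of squared pairwise dot products.
full_kernel = dot(bilinear_pool(image_x), bilinear_pool(image_y))
pairwise = sum(dot(x, y) ** 2 for x in image_x for y in image_y)

# Randomized approximation of the same kernel value.
compact_x = random_maclaurin_pool(image_x, w1, w2)
compact_y = random_maclaurin_pool(image_y, w1, w2)
compact_kernel = dot(compact_x, compact_y)
print(full_kernel, pairwise, compact_kernel)
```

In this toy the compact vector is longer than the 16-dimensional full descriptor; the saving only appears when c is in the hundreds, so that c*c far exceeds d.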

Dougal J. Sutherland's page is here.

Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions
Amar Shah, Zoubin Ghahramani

We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well-known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real-world applications, including problems in machine learning, rocket science and robotics.

Probabilistic Integration
François-Xavier Briol, Chris J. Oates, Mark Girolami, Michael A. Osborne, Dino Sejdinovic
(Submitted on 3 Dec 2015)

Probabilistic numerical methods aim to model numerical error as a source of epistemic uncertainty that is subject to probabilistic analysis and reasoning, enabling the principled propagation of numerical uncertainty through a computational pipeline. In this paper we focus on numerical methods for integration. We present probabilistic (Bayesian) versions of both Markov chain and Quasi Monte Carlo methods for integration and provide rigorous theoretical guarantees for convergence rates, in both posterior mean and posterior contraction. The performance of probabilistic integrators is guaranteed to be no worse than non-probabilistic integrators and is, in many cases, asymptotically superior. These probabilistic integrators therefore enjoy the "best of both worlds", leveraging the sampling efficiency of advanced Monte Carlo methods whilst being equipped with valid probabilistic models for uncertainty quantification. Several applications and illustrations are provided, including examples from computer vision and system modelling using non-linear differential equations. A survey of open challenges in probabilistic integration is provided.
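
As a rough illustration of what a probabilistic integrator returns, the sketch below performs plain Bayesian quadrature on [0, 1]: it places a Gaussian process prior on the integrand and reports a posterior mean and variance for the integral. Everything specific here is an assumption chosen so the formulas stay closed-form: the Brownian motion kernel k(s, t) = min(s, t), the uniform measure, and the fixed grid of nodes. In the paper the nodes would instead come from Markov chain or quasi Monte Carlo samplers.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def bayesian_quadrature(f, nodes):
    """Posterior mean and variance of the integral of f over [0, 1]."""
    gram = [[min(s, t) for t in nodes] for s in nodes]      # K[i][j] = k(x_i, x_j)
    kernel_mean = [t - 0.5 * t * t for t in nodes]          # z_i = integral of min(x_i, t) dt
    weights = solve(gram, kernel_mean)                      # K^{-1} z
    values = [f(t) for t in nodes]
    post_mean = sum(w * v for w, v in zip(weights, values))
    # Prior variance of the integral is the double integral of min(s, t), i.e. 1/3.
    post_var = 1.0 / 3.0 - sum(w * z for w, z in zip(weights, kernel_mean))
    return post_mean, post_var

nodes = [i / 10 for i in range(1, 11)]   # hypothetical evaluation points
post_mean, post_var = bayesian_quadrature(lambda x: x, nodes)
print(post_mean, post_var)
```

The variance term is what distinguishes this from an ordinary quadrature rule: it shrinks as nodes are added and can be propagated to whatever computation consumes the integral.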

Structured learning of metric ensembles with application to person re-identification
Sakrapee Paisitkriangkrai, Lin Wu, Chunhua Shen, Anton van den Hengel
(Submitted on 27 Nov 2015)

Matching individuals across non-overlapping camera networks, known as person re-identification, is a fundamentally challenging problem due to the large visual appearance changes caused by variations of viewpoints, lighting, and occlusion. Approaches in the literature can be categorized into two streams: the first stream is to develop reliable features against realistic conditions by combining several visual features in a pre-defined way; the second stream is to learn a metric from training data to ensure strong inter-class differences and intra-class similarities. However, seeking an optimal combination of visual features which is generic yet adaptive to different benchmarks is an unsolved problem, and metric learning models are easily over-fitted due to the scarcity of training data in person re-identification. In this paper, we propose two effective structured learning based approaches which explore the adaptive effects of visual features in recognizing persons in different benchmark data sets. Our framework is built on the basis of multiple low-level visual features with an optimal ensemble of their metrics. We formulate two optimization algorithms, CMCtriplet and CMCstruct, which directly optimize evaluation measures commonly used in person re-identification, also known as the Cumulative Matching Characteristic (CMC) curve.
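
For readers unfamiliar with the evaluation measure, here is a minimal sketch of how a CMC curve can be computed from a probe-to-gallery distance matrix. It assumes the single-shot setting in which every probe identity appears exactly once in the gallery; the function name and the toy distances are made up for illustration and are not the paper's CMCtriplet or CMCstruct algorithms.

```python
def cmc_curve(dist, probe_ids, gallery_ids, max_rank):
    """CMC(k): fraction of probes whose true match is among the k nearest gallery items."""
    hits = [0] * max_rank
    for row, pid in zip(dist, probe_ids):
        # Gallery indices sorted from closest to farthest for this probe.
        order = sorted(range(len(gallery_ids)), key=lambda j: row[j])
        # Zero-based position of the true match (assumes it exists in the gallery).
        rank = next(r for r, j in enumerate(order) if gallery_ids[j] == pid)
        for k in range(rank, max_rank):
            hits[k] += 1
    return [h / len(probe_ids) for h in hits]

# Hypothetical distances: rows are probes, columns are gallery entries.
dist = [
    [0.2, 0.9, 0.5],
    [0.7, 0.6, 0.1],
    [0.8, 0.3, 0.4],
]
probe_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]

curve = cmc_curve(dist, probe_ids, gallery_ids, max_rank=3)
print(curve)
```

Here only probe "a" is matched at rank 1, while "b" and "c" are both matched at rank 2, so the curve rises from one third to one.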

Diffusion Representations
Moshe Salhov, Amit Bermanis, Guy Wolf, Amir Averbuch
(Submitted on 19 Nov 2015)

The Diffusion Maps framework is a kernel-based method for manifold learning and data analysis that defines diffusion similarities by imposing a Markovian process on the given dataset. Analysis by this process uncovers the intrinsic geometric structures in the data. Recently, it was suggested to replace the standard kernel by a measure-based kernel that incorporates information about the density of the data. Thus, the manifold assumption is replaced by a more general measure-based assumption.
The measure-based diffusion kernel incorporates two separate, independent representations. The first determines a measure that correlates with a density that represents normal behaviors and patterns in the data. The second consists of the analyzed multidimensional data points.
In this paper, we present a representation framework for data analysis that is based on a closed-form decomposition of the measure-based kernel. The proposed representation preserves pairwise diffusion distances, does not depend on the data size, and is invariant to scale. For stationary data, no out-of-sample extension is needed for embedding newly arrived data points in the representation space. Several aspects of the presented methodology are demonstrated on analytically generated data.
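
The "Markovian process on the dataset" can be sketched in a few lines. The example below builds the standard Diffusion Maps transition matrix from a Gaussian kernel and evaluates diffusion distances after a few steps directly from the transition probabilities, without any eigendecomposition. It uses the classical kernel rather than the measure-based kernel discussed in the paper, and the points, the kernel width and the number of steps are illustrative assumptions.

```python
import math

def markov_matrix(points, epsilon):
    """Row-normalise a Gaussian kernel into transition probabilities."""
    kernel = [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / epsilon)
               for y in points] for x in points]
    degrees = [sum(row) for row in kernel]
    trans = [[k / d for k in row] for row, d in zip(kernel, degrees)]
    total = sum(degrees)
    stationary = [d / total for d in degrees]   # stationary distribution of the walk
    return trans, stationary

def mat_mul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def diffusion_distance(trans_t, stationary, i, j):
    """Weighted L2 distance between the t-step transition rows of points i and j."""
    return math.sqrt(sum((trans_t[i][z] - trans_t[j][z]) ** 2 / stationary[z]
                         for z in range(len(stationary))))

# Hypothetical 1-D data forming two well-separated clusters.
points = [[0.0], [0.1], [0.2], [2.0], [2.1], [2.2]]
trans, stationary = markov_matrix(points, epsilon=0.5)
trans_3 = mat_mul(mat_mul(trans, trans), trans)   # three diffusion steps

d_within = diffusion_distance(trans_3, stationary, 0, 1)
d_across = diffusion_distance(trans_3, stationary, 0, 3)
print(d_within, d_across)
```

Points in the same cluster end up with nearly identical transition rows and hence a small diffusion distance, while points in different clusters stay far apart.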
