Nuit Blanche: Blue Skies: Foundational principles for large scale inference, Stochastic Simulation and Optimization Methods in Signal Processing, Streaming and Online Data Mining, Kernel Models and more.

Monday, May 18, 2015

The following papers and presentations provide a bird's-eye view of where we stand on specific topics related to some of the issues discussed here on Nuit Blanche. Enjoy!

Foundational principles for large scale inference: Illustrations through correlation mining by Alfred O. Hero, Bala Rajaratnam

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far smaller than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively little attention, especially in the setting where the sample size $n$ is fixed and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where the matrix of pairwise and partial correlations among the variables is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
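
To get a feel for why the fixed-$n$, growing-$p$ regime is hard, here is a toy illustration (not the authors' framework): with only $n$ samples, even mutually independent variables produce large spurious sample correlations once $p$ is big enough, so "discovered" correlations cannot be trusted without a sample-complexity argument. The helper names `sample_data`, `pearson` and `max_abs_correlation`, and the choice $n = 5$, are hypothetical and chosen only for this sketch.

```python
import math
import random


def sample_data(n, p, seed=0):
    """Draw n samples of p mutually independent standard normal variables.

    Returns a list of p columns, each a list of n floats. The true
    correlation between any two distinct columns is exactly zero.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]


def pearson(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_correlation(columns):
    """Largest |sample correlation| over all pairs of distinct columns."""
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            best = max(best, abs(pearson(columns[i], columns[j])))
    return best


if __name__ == "__main__":
    n = 5  # fixed, sample-starved
    data = sample_data(n, 200, seed=1)
    for p in (2, 10, 50, 200):
        # Use the first p columns so each larger p contains the smaller sets.
        print(p, round(max_abs_correlation(data[:p]), 3))
```

Because each larger $p$ reuses the columns of the smaller ones, the printed maximum can only grow with $p$; with $n$ fixed it drifts toward 1 even though every true correlation is zero.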

Tutorial on Stochastic Simulation and Optimization Methods in Signal Processing by Marcelo Pereyra, Philip Schniter, Emilie Chouzenoux, Jean-Christophe Pesquet, Jean-Yves Tourneret, Alfred Hero, Steve McLaughlin

Modern signal processing (SP) methods rely very heavily on probability and statistics to solve challenging SP problems. Expectations and demands are constantly rising, and SP methods are now expected to deal with ever more complex models, requiring ever more sophisticated computational inference techniques. This has driven the development of statistical SP methods based on stochastic simulation and optimization. Stochastic simulation and optimization algorithms are computationally intensive tools for performing statistical inference in models that are analytically intractable and beyond the scope of deterministic inference methods. They have recently been applied successfully to many difficult problems involving complex statistical models and sophisticated (often Bayesian) statistical inference techniques. This paper presents a tutorial on stochastic simulation and optimization methods in signal and image processing and points to some interesting research problems. The paper addresses a variety of high-dimensional Markov chain Monte Carlo (MCMC) methods as well as deterministic surrogate methods, such as variational Bayes, the Bethe approach, belief and expectation propagation, and approximate message passing algorithms. It also discusses a range of optimization methods that have been adopted to solve stochastic problems, as well as stochastic methods for deterministic optimization. Finally, it discusses areas of overlap between simulation and optimization, in particular optimization-within-MCMC and MCMC-driven optimization.
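
The overlap between optimization and simulation can be made concrete with a gradient-driven MCMC sampler. Below is a minimal sketch of the Metropolis-adjusted Langevin algorithm (MALA), one standard method of this kind, on a one-dimensional Gaussian target chosen purely for illustration. It is not taken from the tutorial; the function name `mala`, the target $N(2, 1)$, the step size and the burn-in length are all assumptions. Each proposal is a gradient-ascent step on the log-density plus Gaussian noise, and a Metropolis-Hastings test corrects for the discretization.

```python
import math
import random


def mala(log_density, grad_log_density, x0, step, n_samples, seed=0):
    """Metropolis-adjusted Langevin algorithm (MALA) for a 1-D target.

    log_density and grad_log_density only need to be known up to an
    additive constant; step is the Langevin step size tau.
    Returns the chain and the empirical acceptance rate.
    """
    rng = random.Random(seed)

    def log_q(to, frm):
        # Log of the Gaussian proposal density q(to | frm) = N(frm + tau*grad, 2*tau),
        # dropping the normalizing constant (it cancels in the acceptance ratio).
        mean = frm + step * grad_log_density(frm)
        return -((to - mean) ** 2) / (4.0 * step)

    x = x0
    chain = []
    accepted = 0
    for _ in range(n_samples):
        # Proposal: a gradient-ascent step on log_density plus Gaussian noise.
        y = x + step * grad_log_density(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_density(y) + log_q(x, y)) - (log_density(x) + log_q(y, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_samples


if __name__ == "__main__":
    mu, sigma = 2.0, 1.0  # illustrative Gaussian target N(mu, sigma^2)
    log_p = lambda x: -((x - mu) ** 2) / (2.0 * sigma ** 2)
    grad_log_p = lambda x: -(x - mu) / sigma ** 2
    chain, rate = mala(log_p, grad_log_p, x0=0.0, step=0.5, n_samples=20000, seed=42)
    kept = chain[1000:]  # discard burn-in
    mean = sum(kept) / len(kept)
    var = sum((v - mean) ** 2 for v in kept) / len(kept)
    print(f"acceptance={rate:.2f} mean={mean:.3f} var={var:.3f}")
```

For this target and step size, the unadjusted Langevin recursion alone would settle at variance 2/(2 - 0.5) ≈ 1.33 instead of 1; the accept/reject step is what makes the chain sample the intended distribution.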

Streaming and Online Data Mining by Edo Liberty

The talk provides a quick introduction to streaming and online data mining algorithms. These algorithms are required to summarize, process, or act upon an arbitrary sequence of events (data records). At every point in time, future events/data are unknown and past events are too numerous to store. While this computational model is severely restrictive, it is, de facto, the working model in many large scale data systems. This talk introduces some classic and some new results in the field and shows how they apply to email threading, news story categorization, clustering, regression, and factor or principal component analysis.
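
As a concrete instance of the streaming model described above (one pass, bounded memory, no access to past events), here is a minimal sketch of the classic Misra-Gries frequent-items summary. It is chosen as a representative textbook streaming algorithm, not as a transcription of anything in the talk; the function name `misra_gries`, the counter budget `k` and the example stream are assumptions made for illustration. With at most `k` counters over a stream of length `m`, every count estimate is low by at most `m/(k+1)`.

```python
import random
from collections import Counter


def misra_gries(stream, k):
    """One-pass frequent-items summary using at most k counters.

    For every item, the estimate f_hat = summary.get(item, 0) satisfies
    f - m/(k+1) <= f_hat <= f, where f is the true count and m is the
    stream length.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Table is full: decrement every counter (the new item is
            # implicitly counted and decremented too) and drop zeros.
            # Each such event removes k+1 units of mass, so it can happen
            # at most m/(k+1) times.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative stream: two heavy hitters buried among many rare items.
    stream = ["a"] * 300 + ["b"] * 200 + [f"x{i}" for i in range(500)]
    rng.shuffle(stream)
    summary = misra_gries(stream, k=9)
    true_counts = Counter(stream)  # only possible here because the toy stream fits in memory
    for item in ("a", "b"):
        print(item, "estimate:", summary.get(item, 0), "true:", true_counts[item])
```

The summary never stores more than `k` items, yet any item occurring more than `m/(k+1)` times is guaranteed to survive in it, which is the kind of trade-off between memory and accuracy that defines this computational model.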

Also from Edo Liberty, here is a presentation on Low Rank Approximation of Matrices.

And finally, from Johan Suykens' main page:

"Kernel methods for complex networks and big data": invited lecture at Statlearn 2015, Grenoble 2015: [pdf]

"Fixed-size Kernel Models for Big Data": invited lectures at BigDat 2015, International Winter School on Big Data, Tarragona, Spain 2015:

- Part I: Support vector machines and kernel methods: an introduction [pdf]

- Part II: Fixed-size kernel models for mining big data [pdf] [video]

- Part III: Kernel spectral clustering for community detection in big data networks [pdf]

Dec 11, 2014: "Fixed-size kernel methods for data-driven modelling": plenary talk at ICLA 2014, International Conference on Learning and Approximation, Shanghai, China 2014 [pdf]

"Kernel-based modelling for complex networks": plenary talk at NOLTA 2014, International Symposium on Nonlinear Theory and its Applications, Luzern, Switzerland 2014 [pdf]

"Learning with matrix and tensor based models using low-rank penalties": invited talk at Workshop on Nonsmooth optimization in machine learning, Liège, Belgium 2013 [pdf]

Invited lecture series - Leerstoel VUB 2012 [pdf]

- Advanced data-driven black-box modelling - inaugural lecture [pdf]

- Support vector machines and kernel methods in systems, modelling and control [pdf]

- Data-driven modelling for biomedicine and bioinformatics [pdf]

- Kernel methods for exploratory data analysis and community detection [pdf]

- Complex networks, synchronization and cooperative behaviour [pdf]