The Foundations of Data Science Boot Camp workshop took place Aug. 27–31, 2018 at the Simons Institute for the Theory of Computing. Thanks to the workshop organizers: David Woodruff, Ken Clarkson, Ravi Kannan, Michael Mahoney, Andrea Montanari, Santosh Vempala, and Rachel Ward.
Here are the videos and some slides.
Foundations of Data Science I and Foundations of Data Science II, Ravi Kannan, Microsoft Research India
These two lectures recap some basics of data science. Topics include high-dimensional geometry, concentration inequalities, Gaussian densities and mixtures, the singular value decomposition (SVD) and its applications, Markov chains and rapid mixing, streaming, and randomized algorithms for matrices. For those already familiar with these topics, proofs of basic theorems that differ from the usual ones may be of interest.
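As a toy illustration of one of these topics (not taken from the lectures), here is a minimal NumPy sketch of the best rank-k approximation given by the truncated SVD; the matrix and the target rank are arbitrary choices.

```
# Toy example (not from the lectures): rank-k approximation via the truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))           # arbitrary data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]         # best rank-k approximation (Eckart-Young)

# The spectral-norm error equals the (k+1)-th singular value.
err = np.linalg.norm(A - A_k, 2)
print(f"rank-{k} spectral-norm error: {err:.3f}  (next singular value: {s[k]:.3f})")
```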
Sketching for Linear Algebra: Basics of Dimensionality Reduction and CountSketch I and Sketching for Linear Algebra: Basics of Dimensionality Reduction and CountSketch II, David Woodruff, Carnegie Mellon University
In this tutorial, I'll give an overview of near optimal algorithms for regression, low rank approximation, and a variety of other problems. The results are based on the sketch-and-solve paradigm, a tool for quickly compressing a problem to a smaller version of itself, on which one can then afford to run a slower algorithm. These techniques lead to the fastest known algorithms for fundamental machine learning and numerical linear algebra problems, which run in time proportional to the number of non-zero entries of the input.
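As a hedged toy illustration of sketch-and-solve (not code from the tutorial), the sketch below compresses an overdetermined least-squares problem with a CountSketch matrix and solves the much smaller sketched problem; the dimensions and sketch size are arbitrary assumptions.

```
# Toy sketch-and-solve least squares with CountSketch (illustration only).
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 20000, 50, 5000                    # rows, columns, sketch size (assumed values)
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# CountSketch: each row of A goes to one of m buckets with a random sign,
# so S*A can be formed in time proportional to nnz(A).
bucket = rng.integers(0, m, size=n)
sign = rng.choice([-1.0, 1.0], size=n)
SA = np.zeros((m, d))
Sb = np.zeros(m)
np.add.at(SA, bucket, sign[:, None] * A)
np.add.at(Sb, bucket, sign * b)

x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]    # solve the small problem
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print("relative error of sketched solution:",
      np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```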
Following on earlier lectures, I will discuss two additional ways to effectively sketch matrices for a variety of applications: sampling based on leverage scores, and the Subsampled Randomized Hadamard Transform.
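To make the leverage-score idea concrete, here is a small toy example (not from the lecture) that samples rows of a tall matrix with probabilities proportional to their exact leverage scores and solves the reduced least-squares problem; in practice the scores would themselves be approximated, e.g. with an SRHT-based sketch, and all sizes below are arbitrary assumptions.

```
# Toy leverage-score sampling for least squares (illustration only).
import numpy as np

rng = np.random.default_rng(2)
n, d, s = 20000, 40, 1500                    # assumed problem and sample sizes
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

Q, _ = np.linalg.qr(A)                       # thin QR; leverage scores are squared row norms of Q
lev = np.sum(Q**2, axis=1)
p = lev / lev.sum()                          # sampling probabilities (the scores sum to d)

idx = rng.choice(n, size=s, replace=True, p=p)
w = 1.0 / np.sqrt(s * p[idx])                # rescaling keeps the sketched problem unbiased
x_samp = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
print("relative error:", np.linalg.norm(x_samp - x_exact) / np.linalg.norm(x_exact))
```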
Stochastic Gradient Descent (SGD) is the basic first-order stochastic optimization algorithm behind powerful deep learning architectures that are becoming increasingly omnipresent in society. In this lecture, we motivate the use of stochastic first-order methods and recall some convergence results for SGD. We then discuss the notion of importance sampling for SGD and how it can improve the convergence rate. Finally, we discuss methods for making SGD more "robust" to hyper-parameters of the algorithm, such as the step size, using "on the fly" adaptive step size methods such as AdaGrad, and present some theoretical results.
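As a small, hedged illustration (not the lecturer's code), here is SGD on a toy least-squares problem with AdaGrad-style per-coordinate step sizes; the problem sizes, base step size, and iteration count are arbitrary assumptions.

```
# Toy SGD with AdaGrad step sizes on least squares (illustration only).
import numpy as np

rng = np.random.default_rng(3)
n, d = 5000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

x = np.zeros(d)
G = np.zeros(d)                              # running sum of squared gradients
eta, eps = 0.5, 1e-8                         # base step size and numerical floor (assumed)

for t in range(20000):
    i = rng.integers(n)                                  # sample one data point
    g = (A[i] @ x - b[i]) * A[i]                         # stochastic gradient of 0.5*(a_i'x - b_i)^2
    G += g**2
    x -= eta * g / (np.sqrt(G) + eps)                    # AdaGrad: per-coordinate adaptive steps

print("distance to x_true:", np.linalg.norm(x - x_true))
```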
Sampling for Linear Algebra, Statistics, and Optimization I and Sampling for Linear Algebra, Statistics, and Optimization II, Michael Mahoney, International Computer Science Institute and UC Berkeley
Sampling methods have long been ubiquitous in data science and machine learning. Recently, due to their complementary algorithmic and statistical properties, sampling and related sketching methods have become central to randomized linear algebra and stochastic optimization. We'll provide an overview of structural properties central to key results in randomized linear algebra, highlighting how sampling and sketching methods can lead to improved results. This is typically achieved in quite different ways, depending on whether one is interested in worst-case linear algebra theory bounds, or whether one is using linear algebra for numerical implementations, statistical bounds, machine learning applications, uses in iterative stochastic optimization algorithms, etc. We'll provide an overview of how sampling and sketching methods are used in randomized linear algebra in each of these different cases, highlighting similarities and differences. A toy example of one basic sampling primitive appears after the video links below.
Sampling for Linear Algebra, Statistics, and Optimization I
Sampling for Linear Algebra, Statistics, and Optimization II
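As a toy example of one classical sampling primitive in randomized linear algebra (not necessarily the results emphasized in these tutorials), the sketch below approximates a matrix product A @ B by sampling a few rank-one terms with probabilities proportional to products of column and row norms; all dimensions and the number of samples are arbitrary assumptions.

```
# Toy approximate matrix multiplication by norm-based sampling (illustration only).
import numpy as np

rng = np.random.default_rng(4)
m, n, p, c = 200, 5000, 150, 500             # A is m x n, B is n x p, c sampled terms (assumed)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
probs = norms / norms.sum()

idx = rng.choice(n, size=c, replace=True, p=probs)
scale = 1.0 / (c * probs[idx])
approx = (A[:, idx] * scale) @ B[idx, :]     # unbiased estimator of A @ B

err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print("relative Frobenius error:", err)
```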
Stochastic Second Order Optimization Methods I and Stochastic Second Order Optimization Methods II, Fred Roosta, University of Queensland
In contrast to the scientific computing community, which has wholeheartedly embraced second-order optimization algorithms, the machine learning (ML) community has long nurtured a distaste for such methods, in favour of first-order alternatives. When implemented naively, however, second-order methods are clearly not computationally competitive. This, in turn, has unfortunately led to the conventional wisdom that these methods are not appropriate for large-scale ML applications. In this series of talks, we will provide an overview of various second-order optimization methods and their stochastic variants. We will demonstrate the theoretical properties as well as empirical performance of a variety of efficient Newton-type algorithms for both convex and non-convex problems. In the process, we will highlight the disadvantages of first-order methods and, in their light, showcase the practical advantages offered by appropriate application of second-order information. A toy sub-sampled Newton sketch follows the video links below.
Stochastic Second Order Optimization Methods I
Stochastic Second Order Optimization Methods II
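As flagged above, here is a heavily simplified, hedged sketch of one Newton-type stochastic idea: full gradients combined with a Hessian estimated on a random subsample, applied to L2-regularized logistic regression. It is a toy illustration, not the speaker's algorithms; the problem sizes, regularization, and batch size are arbitrary assumptions, and safeguards such as line search are omitted.

```
# Toy sub-sampled Newton method for regularized logistic regression (illustration only).
import numpy as np

rng = np.random.default_rng(5)
n, d, lam, batch = 10000, 30, 1e-2, 500      # assumed sizes and regularization

X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
for it in range(20):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n + lam * w                   # full gradient
    S = rng.choice(n, size=batch, replace=False)         # Hessian subsample
    pS = sigmoid(X[S] @ w)
    H = (X[S].T * (pS * (1 - pS))) @ X[S] / batch + lam * np.eye(d)
    w -= np.linalg.solve(H, grad)                        # Newton step (no line search here)

loss = np.mean(np.log(1 + np.exp(-(2 * y - 1) * (X @ w)))) + 0.5 * lam * w @ w
print("training loss:", loss)
```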
I will cover estimation, hypothesis testing, and confidence intervals from a frequentist perspective, as well as Bayesian statistical inference. Topics in classical asymptotics include consistency, maximum likelihood estimation, and asymptotic tests and confidence intervals. No statistics background is assumed.
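As a small worked example in the spirit of the frequentist part of this lecture (an assumed Bernoulli model, not taken from the talk), the sketch below computes the maximum likelihood estimate of a Bernoulli parameter and an asymptotic 95% Wald confidence interval.

```
# Toy MLE and Wald confidence interval for a Bernoulli parameter (illustration only).
import numpy as np

rng = np.random.default_rng(6)
theta_true = 0.3
x = (rng.random(2000) < theta_true).astype(float)    # i.i.d. Bernoulli(theta) samples

theta_hat = x.mean()                                 # MLE of the Bernoulli parameter
se = np.sqrt(theta_hat * (1 - theta_hat) / len(x))   # plug-in standard error
z = 1.96                                             # 97.5% standard normal quantile
ci = (theta_hat - z * se, theta_hat + z * se)        # asymptotic 95% Wald interval

print(f"MLE = {theta_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```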
High Dimensional Geometry and Concentration I and High Dimensional Geometry and Concentration II, Santosh Vempala, Georgia Institute of Technology
In this tutorial, we'll discuss some phenomena in high dimension, including the distribution of mass (Brunn-Minkowski, Grunbaum), log-concavity (Prékopa-Leindler), extremal properties (Dvoretzky), concentration (Lévy, CLT) and isoperimetry (Poincaré, KLS).
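A quick numerical illustration (not from the tutorial) of one concentration phenomenon: the Euclidean norm of a standard Gaussian vector in R^d concentrates tightly around sqrt(d), with relative fluctuations shrinking as the dimension grows.

```
# Toy simulation of norm concentration in high dimension (illustration only).
import numpy as np

rng = np.random.default_rng(7)
for d in (10, 100, 10000):
    norms = np.linalg.norm(rng.standard_normal((1000, d)), axis=1)
    print(f"d = {d:6d}: mean ||x||/sqrt(d) = {norms.mean() / np.sqrt(d):.4f}, "
          f"std/sqrt(d) = {norms.std() / np.sqrt(d):.4f}")
```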
Algorithmic High Dimensional Robust Statistics I and Algorithmic High Dimensional Robust Statistics II, Ilias Diakonikolas, University of Southern California
Fitting a model to a collection of observations is one of the quintessential problems in statistics. The typical assumption is that the data was generated by a model of a given type (e.g., a mixture model). This is a simplifying assumption that is only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. Until recently, even for the most basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. A recent line of work in theoretical computer science obtained the first computationally efficient robust estimators for a range of high-dimensional estimation tasks. In this tutorial talk, we will survey the algorithmic techniques underlying these estimators and the connections between them. We will illustrate these techniques for the problems of robust mean and covariance estimation. Finally, we will discuss new directions and opportunities for future work. A heavily simplified filtering-style sketch follows the video links below.
Algorithmic High Dimensional Robust Statistics I
Algorithmic High Dimensional Robust Statistics II
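As flagged above, here is a heavily simplified filtering-style toy sketch: it estimates the mean of identity-covariance Gaussian data with a small fraction of planted outliers by repeatedly removing the most extreme points along the top eigenvector of the empirical covariance. This is an illustration only, not the algorithms analyzed in the talks; the contamination level, threshold, and fraction removed per round are arbitrary assumptions.

```
# Toy filtering-style robust mean estimation (illustration only).
import numpy as np

rng = np.random.default_rng(8)
n, d, eps = 5000, 50, 0.05
X = rng.standard_normal((n, d))                       # inliers ~ N(0, I)
X[: int(eps * n)] += 8.0                              # 5% cluster of planted outliers

def filtered_mean(X, threshold=1.5, frac=0.02, iters=20):
    X = X.copy()
    for _ in range(iters):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        if vals[-1] < threshold:                      # covariance looks consistent with N(mu, I)
            break
        proj = (X - mu) @ vecs[:, -1]                 # scores along the top eigenvector
        keep = np.abs(proj) <= np.quantile(np.abs(proj), 1 - frac)
        X = X[keep]                                   # drop the most extreme points and repeat
    return X.mean(axis=0)

print("error of sample mean:  ", np.linalg.norm(X.mean(axis=0)))
print("error of filtered mean:", np.linalg.norm(filtered_mean(X)))
```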
The nearest neighbor search (NNS) problem is defined as follows: given a set P of n points in some metric space (X, D), build a data structure that, given any point q, returns a point in P that is (approximately) closest to q. In this tutorial, I will survey classic and more recent NNS data structures that are designed for the "high-dimensional" regime. In the first half, I will talk about the current state of affairs for NNS over the l_1 and l_2 distances (in particular, Locality-Sensitive Hashing (LSH) and its data-dependent counterpart), while in the second half, I will focus on NNS for non-Euclidean geometries (including some very recent developments, such as spectral partitioning for metric spaces).
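As a toy illustration (not code from the tutorial), here is a minimal version of the classical p-stable LSH family for l2 distances: each hash is floor((a·x + b)/w) for a random Gaussian vector a and offset b, k hashes are concatenated into a bucket key, and L independent tables are used; all parameters below are arbitrary assumptions.

```
# Toy p-stable LSH for l2 nearest neighbor search (illustration only).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(9)
n, d = 10000, 64
L, k, w = 8, 6, 4.0                            # tables, hashes per table, bucket width (assumed)

P = rng.standard_normal((n, d))                # hypothetical dataset
A = rng.standard_normal((L, k, d))
B = rng.uniform(0, w, size=(L, k))

def keys(x):
    return [tuple(np.floor((A[t] @ x + B[t]) / w).astype(int)) for t in range(L)]

tables = [defaultdict(list) for _ in range(L)]
for i, x in enumerate(P):
    for t, key in enumerate(keys(x)):
        tables[t][key].append(i)

# Query: scan only the colliding buckets instead of all n points.
q = P[123] + 0.05 * rng.standard_normal(d)     # query near a database point
candidates = {i for t, key in enumerate(keys(q)) for i in tables[t][key]}
best = min(candidates, key=lambda i: np.linalg.norm(P[i] - q), default=None)
print(f"scanned {len(candidates)} of {n} points; nearest candidate: {best}")
```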
As the sizes of modern datasets grow, many classical algorithms become prohibitively expensive, as the input is often too large to be stored in the memory of a single compute node. Streaming algorithms are designed to handle massive inputs using limited space: they process a stream of data items 'on the fly' while maintaining a small memory footprint at all times. In this talk I will discuss classical techniques for solving basic statistical estimation problems (e.g., moment estimation, distinct elements) on data streams, as well as recent results on graph analysis in the streaming model.
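As a toy illustration of moment estimation on a stream (not from the talk), the sketch below implements a simplified AMS-style estimator of the second frequency moment F2 using random ±1 signs; for clarity the signs are stored in an explicit table, whereas a real streaming algorithm would generate them from small hash functions.

```
# Toy AMS-style estimator of the second frequency moment F2 (illustration only).
import numpy as np

rng = np.random.default_rng(10)
stream = rng.integers(0, 1000, size=100_000)   # hypothetical stream of item ids in [0, 1000)

k = 400                                        # number of independent counters (assumed)
signs = rng.choice([-1, 1], size=(k, 1000))    # stand-in for hash-generated random signs
counters = np.zeros(k)
for item in stream:                            # one pass; the memory is k counters, not the stream
    counters += signs[:, item]

F2_estimate = np.mean(counters**2)             # each counter^2 is an unbiased estimate of F2

freq = np.bincount(stream, minlength=1000)
print("exact F2:", int(np.sum(freq.astype(np.int64)**2)), " estimate:", int(F2_estimate))
```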