
Thursday, April 25, 2019

Why are Big Data Matrices Approximately Low Rank?

It used to be that most imaging scenes were compressible, and the JPEG standard was a daily reminder of that reality. In the following paper, we get sharper reassurances about the data world around us. I note that the use of tools like the \epsilon-rank is reminiscent of the \epsilon-pseudospectrum in non-normal settings. Taking this argument in reverse: if you can't find a low-rank matrix close to your data matrix, maybe there is something wrong with your data. Low-rankness of the data matrix is a prerequisite for several matrix factorization operations.

Matrices of (approximate) low rank are pervasive in data science, appearing in recommender systems, movie preferences, topic models, medical records, and genomics. While there is a vast literature on how to exploit low rank structure in these datasets, there is less attention on explaining why the low rank structure appears in the first place. Here, we explain the effectiveness of low rank models in data science by considering a simple generative model for these matrices: we suppose that each row or column is associated to a (possibly high dimensional) bounded latent variable, and entries of the matrix are generated by applying a piecewise analytic function to these latent variables. These matrices are in general full rank. However, we show that we can approximate every entry of an m×n matrix drawn from this model to within a fixed absolute error by a low rank matrix whose rank grows as O(log(m+n)). Hence any sufficiently large matrix from such a latent variable model can be approximated, up to a small entrywise error, by a low rank matrix.
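Here is a quick numerical illustration of the claim (my toy construction, not the paper's): entries generated by an analytic function of low-dimensional latent variables admit a low-rank approximation with small entrywise error, even though the matrix is not exactly low rank.

```python
import numpy as np

# Toy check of the claim (my construction, not the paper's): each entry is
# an analytic function of low-dimensional bounded latent variables, here a
# Gaussian kernel of 2-d latents, and a modest-rank SVD truncation already
# matches every entry to within a small absolute error.
rng = np.random.default_rng(0)
m, n, d = 200, 150, 2
U = rng.uniform(-1, 1, size=(m, d))     # latent variable per row
V = rng.uniform(-1, 1, size=(n, d))     # latent variable per column
X = np.exp(-((U[:, None, :] - V[None, :, :]) ** 2).sum(-1))

def svd_epsilon_rank(X, eps):
    """Smallest r such that the rank-r SVD truncation is within eps entrywise."""
    Us, s, Vt = np.linalg.svd(X, full_matrices=False)
    for r in range(1, len(s) + 1):
        approx = (Us[:, :r] * s[:r]) @ Vt[:r]
        if np.abs(X - approx).max() <= eps:
            return r
    return len(s)

print(svd_epsilon_rank(X, eps=0.01))   # small compared to min(m, n) = 150
```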
As George Linderman points out on Twitter, the paper has been published in the SIAM Journal on Mathematics of Data Science.

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

Wednesday, April 24, 2019

Enhanced Expressive Power and Fast Training of Neural Networks by Random Projections

Here is one way random projections can reduce training time in neural networks!

Random projections are able to perform dimension reduction efficiently for datasets with nonlinear low-dimensional structures. One well-known example is that random matrices embed sparse vectors into a low-dimensional subspace nearly isometrically, known as the restricted isometric property in compressed sensing. In this paper, we explore some applications of random projections in deep neural networks. We provide the expressive power of fully connected neural networks when the input data are sparse vectors or form a low-dimensional smooth manifold. We prove that the number of neurons required for approximating a Lipschitz function with a prescribed precision depends on the sparsity or the dimension of the manifold and weakly on the dimension of the input vector. The key in our proof is that random projections embed stably the set of sparse vectors or a low-dimensional smooth manifold into a low-dimensional subspace. Based on this fact, we also propose some new neural network models, where at each layer the input is first projected onto a low-dimensional subspace by a random projection and then the standard linear connection and non-linear activation are applied. In this way, the number of parameters in neural networks is significantly reduced, and therefore the training of neural networks can be accelerated without too much performance loss.
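A minimal sketch of the kind of layer the abstract proposes, under my own naming (the paper's code is not reproduced here): the input first passes through a fixed random projection onto a low-dimensional subspace, then through the usual learned linear map and nonlinearity, which shrinks the parameter count.

```python
import numpy as np

# Minimal sketch (my notation, not the paper's code) of a layer that first
# applies a fixed random projection, then a learned linear map + ReLU.
rng = np.random.default_rng(0)

def make_rp_layer(in_dim, proj_dim, out_dim):
    # Fixed (untrained) Gaussian random projection, scaled so norms of
    # sparse / low-dimensional inputs are roughly preserved.
    P = rng.normal(size=(proj_dim, in_dim)) / np.sqrt(proj_dim)
    W = rng.normal(size=(out_dim, proj_dim)) * 0.1   # learned in practice
    b = np.zeros(out_dim)
    def layer(x):
        return np.maximum(W @ (P @ x) + b, 0.0)      # ReLU activation
    # Trainable parameter count drops from out_dim*in_dim to out_dim*proj_dim.
    return layer, W.size + b.size

layer, n_params = make_rp_layer(in_dim=10_000, proj_dim=64, out_dim=128)
x = np.zeros(10_000)
x[rng.choice(10_000, size=20, replace=False)] = 1.0   # sparse input
print(layer(x).shape)   # (128,)
print(n_params)         # 8320, vs 1,280,128 for a dense 10000 -> 128 layer
```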


Tuesday, April 23, 2019

CSHardware: Development of sparse coding and reconstruction subsystems for astronomical imaging, João Rino-Silvestre

João just let me know the following:
Salut Igor,
I've noticed that you have a blog dedicated to compressed sensing, so I thought that my MSc dissertation (in which I detail the development of subsystems and of a stand-alone instrument prototype) might be of interest to you and your community. I leave you the link below:
http://hdl.handle.net/10451/37722

Best regards,
João Rino-Silvestre


Book: High-Dimensional Probability: An Introduction with Applications in Data Science, by Roman Vershynin

Found in the comment section of Terry's blog: Roman Vershynin has written a book titled High-Dimensional Probability: An Introduction with Applications in Data Science (University of California, Irvine, March 25, 2019).

Here is the table of contents:
Preface vi
Appetizer: using probability to cover a geometric set
1 Preliminaries on random variables 6
1.1 Basic quantities associated with random variables 6
1.2 Some classical inequalities 7
1.3 Limit theorems 9
1.4 Notes 12
2 Concentration of sums of independent random variables 13
2.1 Why concentration inequalities? 13
2.2 Hoeffding’s inequality 16
2.3 Chernoff’s inequality 19
2.4 Application: degrees of random graphs 21
2.5 Sub-gaussian distributions 24
2.6 General Hoeffding’s and Khintchine’s inequalities 29
2.7 Sub-exponential distributions 32
2.8 Bernstein’s inequality 37
2.9 Notes 40
3 Random vectors in high dimensions 42
3.1 Concentration of the norm 43
3.2 Covariance matrices and principal component analysis 45
3.3 Examples of high-dimensional distributions 50
3.4 Sub-gaussian distributions in higher dimensions 56
3.5 Application: Grothendieck’s inequality and semidefinite programming 60
3.6 Application: Maximum cut for graphs 66
3.7 Kernel trick, and tightening of Grothendieck’s inequality 70
3.8 Notes 74
4 Random matrices 76
4.1 Preliminaries on matrices 76
4.2 Nets, covering numbers and packing numbers 81
4.3 Application: error correcting codes 86
4.4 Upper bounds on random sub-gaussian matrices 89
4.5 Application: community detection in networks 93
4.6 Two-sided bounds on sub-gaussian matrices 97
4.7 Application: covariance estimation and clustering 99
4.8 Notes 103
5 Concentration without independence 105
5.1 Concentration of Lipschitz functions on the sphere 105
5.2 Concentration on other metric measure spaces 112
5.3 Application: Johnson-Lindenstrauss Lemma 118
5.4 Matrix Bernstein’s inequality 121
5.5 Application: community detection in sparse networks 129
5.6 Application: covariance estimation for general distributions 129
5.7 Notes 133
6 Quadratic forms, symmetrization and contraction 135
6.1 Decoupling 135
6.2 Hanson-Wright Inequality 139
6.3 Concentration of anisotropic random vectors 142
6.4 Symmetrization 145
6.5 Random matrices with non-i.i.d. entries 147
6.6 Application: matrix completion 148
6.7 Contraction Principle 151
6.8 Notes 154
7 Random processes 156
7.1 Basic concepts and examples 156
7.2 Slepian’s inequality 160
7.3 Sharp bounds on Gaussian matrices 167
7.4 Sudakov’s minoration inequality 170
7.5 Gaussian width 172
7.6 Stable dimension, stable rank, and Gaussian complexity 178
7.7 Random projections of sets 181
7.8 Notes 185
8 Chaining 187
8.1 Dudley’s inequality 187
8.2 Application: empirical processes 195
8.3 VC dimension 200
8.4 Application: statistical learning theory 212
8.5 Generic chaining 219
8.6 Talagrand’s majorizing measure and comparison theorems 223
8.7 Chevet’s inequality 225
8.8 Notes 227
9 Deviations of random matrices and geometric consequences 229
9.1 Matrix deviation inequality 229
9.2 Random matrices, random projections and covariance estimation 235
9.3 Johnson-Lindenstrauss Lemma for infinite sets 238
9.4 Random sections: M∗ bound and Escape Theorem 240
9.5 Notes
10 Sparse Recovery 246
10.1 High-dimensional signal recovery problems 246
10.2 Signal recovery based on M∗ bound 248
10.3 Recovery of sparse signals 250
10.4 Low-rank matrix recovery 254
10.5 Exact recovery and the restricted isometry property 256
10.6 Lasso algorithm for sparse regression 262
10.7 Notes 267
11 Dvoretzky-Milman’s Theorem 269
11.1 Deviations of random matrices with respect to general norms 269
11.2 Johnson-Lindenstrauss embeddings and sharper Chevet inequality 272
11.3 Dvoretzky-Milman’s Theorem 274
11.4 Notes 279
Bibliography 280
Index


Monday, April 22, 2019

Videos: New Deep Learning Techniques, February 5 - 9, 2018, IPAM Workshop

Overview

In recent years, artificial neural networks (a.k.a. deep learning) have significantly improved the fields of computer vision, speech recognition, and natural language processing. The success relies on the availability of large-scale datasets, the development of affordable high computational power, and basic deep learning operations that are sound and fast because they assume that data lie on Euclidean grids. However, not all data live on regular lattices. 3D shapes in computer graphics are Riemannian manifolds. In neuroscience, brain activity (fMRI) is encoded on the structural connectivity network (sMRI). In genomics, the human body's functionality is expressed through DNA, RNA, and proteins that form the gene regulatory network (GRN). In social sciences, people interact through networks. Finally, data in communication networks are structured by graphs like the Internet or road traffic networks.

Deep learning, which was originally developed for computer vision, cannot be directly applied to these highly irregular domains, and new classes of deep learning techniques must be designed. This is highly challenging as most standard data analysis tools cannot be used on heterogeneous data domains. The workshop will bring together experts in mathematics (statistics, harmonic analysis, optimization, graph theory, sparsity, topology), machine learning (deep learning, supervised & unsupervised learning, metric learning) and specific application domains (neuroscience, genetics, social science, computer vision) to establish the current state of these emerging techniques and discuss the next directions.
This workshop will include a poster session; a request for posters will be sent to registered participants in advance of the workshop.
Here are the videos (and some slides):
Samuel Bowman (New York University)
Toward natural language semantics in learned representations

Emily Fox (University of Washington)
Interpretable and Sparse Neural Network Time Series Models for Granger Causality Discovery

Ellie Pavlick (University of Pennsylvania), Should we care about linguistics?

Leonidas Guibas (Stanford University), Knowledge Transport Over Visual Data

Yann LeCun (New York University), Public Lecture: Deep Learning and the Future of Artificial Intelligence

Alán Aspuru-Guzik (Harvard University), Generative models for the inverse design of molecules and materials

Daniel Rueckert (Imperial College), Deep learning in medical imaging: Techniques for image reconstruction, super-resolution and segmentation

Kyle Cranmer (New York University), Deep Learning in the Physical Sciences

Stéphane Mallat (École Normale Supérieure), Deep Generative Networks as Inverse Problems

Michael Elad (Technion - Israel Institute of Technology), Sparse Modeling in Image Processing and Deep Learning

Yann LeCun (New York University), Public Lecture: AI Breakthroughs & Obstacles to Progress, Mathematical and Otherwise

Xavier Bresson (Nanyang Technological University, Singapore), Convolutional Neural Networks on Graphs

Federico Monti (Universita della Svizzera Italiana), Deep Geometric Matrix Completion: a Geometric Deep Learning approach to Recommender Systems

Joan Bruna (New York University), On Computational Hardness with Graph Neural Networks

Jure Leskovec (Stanford University), Large-scale Graph Representation Learning

Arthur Szlam (Facebook), Composable planning with attributes

Yann LeCun (New York University), A Few (More) Approaches to Unsupervised Learning

Sanja Fidler (University of Toronto), Teaching Machines with Humans in the Loop
Raquel Urtasun (University of Toronto), Deep Learning for Self-Driving Cars

Pratik Chaudhari (University of California, Los Angeles (UCLA)), Unraveling the mysteries of stochastic gradient descent on deep networks

Stefano Soatto (University of California, Los Angeles (UCLA)), Emergence Theory of Deep Learning

Tom Goldstein (University of Maryland), What do neural net loss functions look like?

Stanley Osher (University of California, Los Angeles (UCLA)), New Techniques in Optimization and Their Applications to Deep Learning and Related Inverse Problems

Michael Bronstein (USI Lugano, Switzerland), Deep functional maps: intrinsic structured prediction for dense shape correspondence

Sainbayar Sukhbaatar (New York University), Deep Architecture for Sets and Its Application to Multi-agent Communication

Zuowei Shen (National University of Singapore), Deep Learning: Approximation of functions by composition

Wei Zhu (Duke University), LDMnet: low dimensional manifold regularized neural networks


Saturday, April 20, 2019

Videos: Sublinear Algorithms and Nearest-Neighbor Search workshop, Nov. 27 – Nov. 30, 2018 (Simons Institute and Kavli Foundation)

The Sublinear Algorithms and Nearest-Neighbor Search workshop is part of the Program on Foundations of Data Science sponsored by the Simons Institute for the Theory of Computing at Berkeley and the Kavli Foundation. It took place Nov. 27 – Nov. 30, 2018 in Berkeley. Thanks to the organizers, Robert Krauthgamer, Artur Czumaj, Aarti Singh, and Rachel Ward, for making this workshop a reality. The introduction for the workshop goes as follows:

Many applications require extreme computational efficiency, where the usage of resources, like runtime, storage, or data samples, is sublinear in the size of the input, and the workshop will cover different areas where this topic is studied, and explore connections between them. Specifically, this topic received a lot of attention within Theoretical Computer Science, often motivated by advances in various application areas, and the workshop will review recent developments, such as algorithms for data streams, sketching, dimensionality reduction, graph sparsification, and property testing. In addition, the workshop will examine connections with linear and sublinear methods in Machine Learning, Statistics, Signal Processing and related areas, such as the well-known connections with compressed sensing, sparse recovery, and nearest-neighbor search. Some more recent connections are to online bandit problems and to distributed optimization, where constraints on algorithmic resources are just starting to be considered; such problems may be amenable to techniques from data-stream algorithms.

Here are the videos:

Learning-Augmented Sketches for Frequency Estimation, Piotr Indyk (Massachusetts Institute of Technology)
Classical streaming algorithms typically do not leverage data properties or patterns in their input. We propose to augment such algorithms with a learning model that enables them to exploit data properties without being specific to a particular pattern or property. We focus on the problem of estimating the frequency of elements in a data stream, a fundamental problem with applications in network measurements, natural language processing, and security. We propose a new framework for designing frequency estimation streaming algorithms that automatically learn to leverage the properties of the input data. We present a theoretical analysis of the proposed algorithms and prove that, under natural assumptions, they have lower space complexity than prior algorithms. We also evaluate our algorithms on two problems, monitoring Internet traffic and tracking the popularity of search queries, and demonstrate their performance gains. Joint work with Chen-Yu Hsu, Dina Katabi and Ali Vakilian.
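For context, here is a minimal version of the classical, non-learned baseline that such learned sketches augment: a Count-Min sketch for frequency estimation. The learned oracle that routes heavy hitters to separate counters is not reproduced here.

```python
import numpy as np

# Classical Count-Min sketch: estimates element frequencies in small space.
# (The paper's learned heavy-hitter oracle is not reproduced here.)
class CountMinSketch:
    def __init__(self, width=2048, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.table = np.zeros((depth, width), dtype=np.int64)
        # One simple multiply-shift style hash per row.
        self.a = rng.integers(1, 2**31, size=depth)
        self.b = rng.integers(0, 2**31, size=depth)
        self.width = width

    def _cols(self, x):
        return (self.a * hash(x) + self.b) % self.width

    def add(self, x):
        for row, col in enumerate(self._cols(x)):
            self.table[row, col] += 1

    def estimate(self, x):
        # Minimum over rows: never an underestimate of the true count.
        return int(min(self.table[row, col]
                       for row, col in enumerate(self._cols(x))))

cms = CountMinSketch()
for _ in range(1000):
    cms.add("heavy")
cms.add("light")
print(cms.estimate("heavy"))   # at least 1000; exact unless "light" collides
```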

In sparse recovery/compressed sensing, one can estimate a k-sparse vector in n dimensions with only Theta(k log n) nonadaptive linear measurements. With adaptivity -- if each measurement can be based on the previous ones -- this reduces to O(k log log n). But what happens if the measurement matrices can only be chosen in a few rounds, as seen (for example) in constant-pass streaming algorithms? This talk will give upper and lower bounds, showing (up to a log^* k factor) that R rounds of adaptivity require Theta(k log^{1/R} n) measurements.
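As an illustration of the nonadaptive baseline mentioned above (not the talk's adaptive scheme), here is a sketch of recovering a k-sparse vector from a fixed set of Gaussian measurements with orthogonal matching pursuit; the constant in m is a heuristic choice of mine.

```python
import numpy as np

# Nonadaptive sparse recovery sketch: m = O(k log n) fixed Gaussian
# measurements, decoded with orthogonal matching pursuit (OMP).
rng = np.random.default_rng(1)
n, k = 1000, 5
m = 150                                   # a few times k*log(n), heuristic
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k)

A = rng.normal(size=(m, n)) / np.sqrt(m)  # measurements fixed in advance
y = A @ x                                  # nonadaptive linear measurements

def omp(A, y, k):
    residual, idx = y.copy(), []
    for _ in range(k):
        # Pick the column most correlated with the residual, then refit.
        idx.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        residual = y - A[:, idx] @ coef
    xhat = np.zeros(A.shape[1])
    xhat[idx] = coef
    return xhat

xhat = omp(A, y, k)
print(np.allclose(xhat, x, atol=1e-6))
```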

Universal Sketches, Vladimir Braverman (Johns Hopkins University)
Streaming and sketching algorithms have found many applications in computer science and other areas. A typical sketching algorithm approximates one function. Given a class of functions F, it is natural to ask if it is possible to compute a single sketch S that will approximate every function f from F. We call S a “universal sketch” for F. In this talk we will discuss results on universal sketches for several classes of functions. For example, we will describe a sketch that approximates a sub-class of symmetric norms (a norm is symmetric if it is invariant under sign-flips and coordinate-permutations) and outline a connection between universal sketches and concentration of measure and Milman’s theorem. Also, we will describe a recent result for subset (i.e. 0-1 weighted) l0 and l1 norms. For these problems we obtain nearly optimal upper and lower bounds for streaming space complexity. We will discuss the applicability of universal sketches for Software Defined Networks (SDN). For SDN, we will present the UnivMon (short for Universal Monitoring) framework that can simultaneously achieve both generality and high fidelity across a broad spectrum of monitoring tasks.
This talk is based on joint works with Jaroslaw Blasiok, Stephen R. Chestnut, Robert Krauthgamer and Lin F. Yang (STOC 2017), with Robert Krauthgamer and Lin F. Yang (submitted) and with Zaoxing Liu, Antonis Manousis, Gregory Vorsanger, and Vyas Sekar (HotNets 2015, SIGCOMM 2016).

Approximating the Cost of a Metric K-Nearest Neighbor Graph in Sublinear Time, Christian Sohler (Technische Universität Dortmund and Google Switzerland)
Let (X,d) be an n-point metric space. We assume that (X,d) is given in the distance oracle model, i.e., X={1,...,n} and for every pair of points x,y from X we can query their distance d(x,y) in constant time. A k-nearest neighbor (k-NN) graph for (X,d) is a directed graph G=(V,E) that has, for every vertex v, an edge to each of v's k nearest neighbors. We use cost(G) to denote the sum of edge weights of G.
In this paper, we study the problem of approximating cost(G) in sublinear time, when we are given oracle access to the metric space (X,d) that defines G. Our goal is to develop an algorithm that solves this problem faster than the time required to compute G. To this end, we develop an algorithm that in time ~O(min(n k^{3/2} / eps^6, n^2 / (eps^2 k))) computes an estimate K of cost(G) that satisfies with probability at least 2/3
|cost(G) - K| ≤ eps (cost(G) + mst(X))
where mst(X) denotes the cost of the minimum spanning tree of (X,d).
Joint work with Artur Czumaj. Work was done as part of the speaker's affiliation with Google Switzerland.
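A brute-force reference implementation of the quantity being estimated may help fix ideas; the talk's point is precisely that this Theta(n^2)-query computation can be avoided.

```python
import numpy as np

# Brute-force reference: cost(G) of the k-NN graph is the sum over all
# points of the distances to their k nearest neighbors. This uses all
# ~n^2 pairwise distances; the talk approximates it in sublinear time.
def knn_graph_cost(X, k):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbor
    return np.sort(D, axis=1)[:, :k].sum()  # k smallest distances per row

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
print(knn_graph_cost(X, k=5))
```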

Independent samples from an unknown probability distribution p on a domain of size k are distributed across n players, with each player holding one sample. Each player can send a message to a central referee in a simultaneous message passing (SMP) model of communication, whose goal is to solve a pre-specified inference problem. The catch, however, is that each player cannot simply send their own sample to the referee; instead, the message they send must obey some (local) information constraint. For instance, each player may be limited to communicating only L bits, where L << log k; or they may seek to reveal as little information as possible, and preserve local differentially privacy. We propose a general formulation for inference problems in this distributed setting, and instantiate it to two fundamental inference questions, learning and uniformity testing. We study the role of randomness for those questions, and obtain striking separations between public- and private-coin protocols for the latter, while showing the two settings are equally powerful for the former. (Put differently, sharing with your neighbors does help a lot for the test, but not really for the learning.) Based on joint works with Jayadev Acharya (Cornell University), Cody Freitag (Cornell University), and Himanshu Tyagi (IISc Bangalore).

We initiate the study of the role of erasures in local decoding and use our understanding to prove a separation between erasure-resilient and tolerant property testing. Local decoding in the presence of errors has been extensively studied, but has not been considered explicitly in the presence of erasures. Motivated by applications in property testing, we begin our investigation with local {\em list} decoding in the presence of erasures. We prove an analog of a famous result of Goldreich and Levin on local list decodability of the Hadamard code. Specifically, we show that the Hadamard code is locally list decodable in the presence of a constant fraction of erasures, arbitrary close to 1, with list sizes and query complexity better than in the Goldreich-Levin theorem. We use this result to exhibit a property which is testable with a number of queries independent of the length of the input in the presence of erasures, but requires a number of queries that depends on the input length, $n$, for tolerant testing. We further study {\em approximate} locally list decodable codes that work against erasures and use them to strengthen our separation by constructing a property which is testable with a constant number of queries in the presence of erasures, but requires $n^{\Omega(1)}$ queries for tolerant testing. Next, we study the general relationship between local decoding in the presence of errors and in the presence of erasures. We observe that every locally (uniquely or list) decodable code that works in the presence of errors also works in the presence of twice as many erasures (with the same parameters up to constant factors). We show that there is also an implication in the other direction for locally decodable codes (with unique decoding): specifically, that the existence of a locally decodable code that works in the presence of erasures implies the existence of a locally decodable code that works in the presence of errors and has related parameters. 
However, it remains open whether there is an implication in the other direction for locally {\em list} decodable codes. (Our Hadamard result shows that there has to be some difference in parameters for some settings.) We relate this question to other open questions in local decoding. Based on joint work with Noga Ron-Zewi and Nithin Varma.
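A toy illustration of the Hadamard code's local decodability under erasures, which is the starting point of the abstract above (the list-decoding result itself is far beyond this sketch): bit a_i is recovered from two codeword queries, and an erased query pair is simply retried with fresh randomness.

```python
import numpy as np

# Hadamard code: the codeword of a message a in {0,1}^n is the function
# x -> <a, x> mod 2. Bit a_i is decoded with two queries, c(r) XOR c(r+e_i),
# for random r. Under erasures (no errors), any non-erased pair is correct.
rng = np.random.default_rng(0)
n = 8
a = rng.integers(0, 2, size=n)

def codeword(x):                      # oracle access to the codeword of a
    return int(np.dot(a, x) % 2)

def decode_bit(i, erased=lambda x: False, tries=100):
    e_i = np.zeros(n, dtype=int); e_i[i] = 1
    for _ in range(tries):
        r = rng.integers(0, 2, size=n)
        if erased(tuple(r)) or erased(tuple((r + e_i) % 2)):
            continue                  # erasure: resample and retry
        return codeword(r) ^ codeword((r + e_i) % 2)
    raise RuntimeError("too many erasures")

# Erase roughly 30% of codeword positions; local decoding still succeeds.
erased_set = {tuple(rng.integers(0, 2, size=n)) for _ in range(80)}
decoded = np.array([decode_bit(i, erased=lambda x: x in erased_set)
                    for i in range(n)])
print(np.array_equal(decoded, a))   # True
```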

Sublinear Time Local-Access Random Generators, Ronitt Rubinfeld (Massachusetts Institute of Technology)

No abstract available.

Locality sensitive hashing (LSH) is a popular technique for nearest neighbor search in high dimensional data sets. Recently, a new view at LSH as a biased sampling technique has been fruitful for density estimation problems in high dimensions. Given a set of points and a query point, the goal (roughly) is to estimate the density of the data set around the query. One way to formalize this is by kernel density estimation: Given a function that decays with distance and represents the "influence" of a data point at the query, sum up this influence function over the data set. Yet another way to formalize this problem is by counting the number of data points within a certain radius of the query. While these problems can easily be solved by making a linear pass over the data, this can be prohibitive for large data sets and multiple queries. Can we preprocess the data so as to answer queries efficiently? This talk will survey several recent papers that use locality sensitive hashing to design unbiased estimators for such density estimation problems and their extensions. This talk will survey joint works with Arturs Backurs, Piotr Indyk, Vishnu Natchu, Paris Syminelakis and Xian (Carrie) Wu.
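For reference, here is the linear-pass baseline the talk improves on, in the kernel density formalization: sum a distance-decaying influence function over the whole data set. The LSH-based estimators surveyed answer the same query in sublinear time via biased sampling.

```python
import numpy as np

# Linear-pass kernel density estimation: each data point's "influence" at
# the query decays with distance (Gaussian kernel here), and we average it
# over the whole data set -- O(n) work per query, which is what the
# LSH-based estimators avoid.
def kde(data, query, bandwidth=1.0):
    d2 = ((data - query) ** 2).sum(axis=1)           # squared distances
    return np.exp(-d2 / (2 * bandwidth**2)).mean()   # Gaussian kernel

rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 5))
q_near = np.zeros(5)          # in the middle of the data mass
q_far = np.full(5, 10.0)      # far away from every data point
print(kde(data, q_near) > kde(data, q_far))   # True
```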

We consider algorithms that take an unlabeled data set and label it in its entirety, given the ability to interact with a human expert. The goal is to minimize the amount of interaction while producing a labeling that satisfies an (epsilon, delta) guarantee: with probability at least 1-delta over the randomness in the algorithm, at most an epsilon fraction of the labels are incorrect. Scenario 1: The algorithm asks the expert for labels of specific points. This is the standard problem of active learning, except that the final product is a labeled data set rather than a classifier. Scenario 2: The expert also provides "weak rules" or helpful features. We will summarize the state of the art on these problems, in terms of promising algorithms and statistical guarantees, and identify key challenges and open problems.

An Optimal Space Lower Bound for Approximating MAX-CUT, Michael Kapralov (Ecole Polytechnique Federale de Lausanne)
We consider the problem of estimating the value of MAX-CUT in a graph in the streaming model of computation. At one extreme, there is a trivial $2$-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of edges and output half of this value as the estimate for the size of the MAX-CUT. On the other extreme, for any fixed $\eps > 0$, if one allows $\tilde{O}(n)$ space, a $(1+\eps)$-approximate solution to the MAX-CUT value can be obtained by storing an $\tilde{O}(n)$-size sparsifier that essentially preserves MAX-CUT value.
Our main result is that any (randomized) single pass streaming algorithm that breaks the $2$-approximation barrier requires $\Omega(n)$-space, thus resolving the space complexity of any non-trivial approximations of the MAX-CUT value to within polylogarithmic factors in the single pass streaming model. We achieve the result by presenting a tight analysis of the Implicit Hidden Partition Problem introduced by Kapralov et al.[SODA'17] for an arbitrarily large number $k$ of players. In this problem a number of players receive random matchings of $\Omega(n)$ size together with random bits on the edges, and their task is to determine whether the bits correspond to parities of some hidden bipartition, or are just uniformly random.
Unlike all previous Fourier analytic communication lower bounds, our analysis does not directly use bounds on the $\ell_2$ norm of Fourier coefficients of a typical message at any given weight level that follow from hypercontractivity. Instead, we use the fact that graphs received by players are sparse (matchings) to obtain strong upper bounds on the $\ell_1$ norm of the Fourier coefficients of the messages of individual players using their special structure, and then argue, using the convolution theorem, that similar strong bounds on the $\ell_1$ norm are essentially preserved (up to an exponential loss in the number of players) once messages of different players are combined. We feel that our main technique is likely of independent interest.
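The trivial 2-approximation mentioned at the start of the abstract is worth seeing in code, because the result says a single-pass streaming algorithm cannot do meaningfully better in o(n) space.

```python
import random

# Trivial streaming 2-approximation for MAX-CUT in O(log n) space: count
# the edges and output half. A uniformly random bipartition cuts each edge
# with probability 1/2, so MAX-CUT always lies in [m/2, m].
def maxcut_estimate(edge_stream):
    m = sum(1 for _ in edge_stream)   # the only state kept: one counter
    return m / 2

random.seed(0)
edges = [tuple(random.sample(range(100), 2)) for _ in range(500)]
print(maxcut_estimate(iter(edges)))   # 250.0
```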

Any graph with maximum degree Delta admits a proper vertex coloring with Delta+1 colors that can be found via a simple sequential greedy algorithm in linear time and space. But can one find such a coloring via a sublinear algorithm? In this talk, I present new algorithms that answer this question in the affirmative for several canonical classes of sublinear algorithms including graph streaming, sublinear time, and massively parallel computation (MPC) algorithms. At the core of these algorithms is a remarkably simple meta-algorithm for the (Delta+1) coloring problem: Sample O(log n) colors for each vertex uniformly at random from the Delta+1 colors and then find a proper coloring of the graph using the sampled colors; our main structural result states that the sampled set of colors with high probability contains a proper coloring of the input graph.
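A sketch of the meta-algorithm described above, with my own simplifications: each vertex samples a small palette from {0, ..., Delta} and we then try to color the graph using only the sampled palettes. The structural theorem guarantees that a proper coloring within the palettes exists with high probability; the plain greedy search used here for brevity can in principle get stuck where a cleverer search would not.

```python
import numpy as np

# Palette sparsification sketch (simplified): sample O(log n) colors per
# vertex from {0, ..., Delta}, then greedily color using only the samples.
rng = np.random.default_rng(0)

def palette_coloring(adj, n_colors, palette_size):
    n = len(adj)
    palettes = [rng.choice(n_colors, size=palette_size, replace=False)
                for _ in range(n)]
    color = [None] * n
    for v in range(n):
        taken = {color[u] for u in adj[v] if color[u] is not None}
        free = [c for c in palettes[v] if c not in taken]
        if not free:
            return None            # greedy got stuck with these samples
        color[v] = int(free[0])
    return color

# Random graph on 100 vertices with maximum degree capped at Delta = 20.
n, Delta = 100, 20
adj = [set() for _ in range(n)]
for _ in range(300):
    u, v = rng.choice(n, size=2, replace=False)
    if len(adj[u]) < Delta and len(adj[v]) < Delta:
        adj[u].add(int(v)); adj[v].add(int(u))

coloring = palette_coloring(adj, Delta + 1, palette_size=10)
is_proper = coloring is not None and all(
    coloring[u] != coloring[v] for u in range(n) for v in adj[u])
print(is_proper)
```

On sparse random graphs like this one, the greedy pass essentially always succeeds with palettes of size 10.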

Efficient algorithms for k-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect k-means optimality. We discuss an a posteriori certifier of approximate optimality for k-means clustering based on Peng and Wei's semidefinite relaxation of k-means.

Efficient algorithms for k-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect k-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for k-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of k-means. In particular, solving the relaxation for small random samples of the dataset produces a high-confidence lower bound on the k-means objective, and being sub-linear, our algorithm is faster than k-means++ when the number of data points is large. If the data points are drawn independently from any mixture of two Gaussians over R^m with identity covariance, then with probability 1 - O(1/m), our poly(m)-time algorithm produces a 3-approximation certificate with 99% confidence (no separation required). We also introduce a linear-time Monte Carlo algorithm that produces an O(k) additive approximation lower bound.

We provide a novel, and to the best of our knowledge the first, algorithm for high dimensional sparse regression with corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators with sub-linear sample complexity. Our algorithm consists of a novel randomized outlier removal technique for robust sparse mean estimation that may be of interest in its own right: it is orderwise more efficient computationally than existing algorithms, and succeeds with high probability, thus making it suitable for general use in iterative algorithms.
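For readers unfamiliar with the base algorithm, here is plain (non-robust) iterative hard thresholding on an uncorrupted instance; the paper's contribution is the robust variant, which is not reproduced here.

```python
import numpy as np

# Plain iterative hard thresholding (IHT): gradient step on ||y - X w||^2,
# then keep only the k largest-magnitude coordinates. The paper makes this
# robust to corrupted covariates/responses; this sketch is the clean case.
def iht(X, y, k, iters=200):
    n, d = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / largest squared sing. value
    w = np.zeros(d)
    for _ in range(iters):
        w = w + step * X.T @ (y - X @ w)     # gradient step
        keep = np.argsort(np.abs(w))[-k:]    # hard threshold to k coords
        mask = np.zeros(d, dtype=bool); mask[keep] = True
        w[~mask] = 0.0
    return w

rng = np.random.default_rng(0)
n, d, k = 300, 1000, 5
w_true = np.zeros(d)
w_true[rng.choice(d, k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
X = rng.normal(size=(n, d)) / np.sqrt(n)
y = X @ w_true                                # noiseless responses
w_hat = iht(X, y, k)
print(np.linalg.norm(w_hat - w_true))
```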

We study the PCA and column subset selection problems in matrices in an online setting, where the columns arrive one after the other. In the context of column subset selection, the goal is to decide whether to include or discard a column, as it arrives. We design a simple algorithm that includes at most O(k \polylog n) columns overall and achieves a multiplicative (1+\epsilon) error compared to the best rank-k approximation of the full matrix. This result may be viewed as an analog of the classic result of Myerson on online clustering.

The need for fast computation typically requires tradeoffs with statistical accuracy; here we are interested in whether computation can be significantly improved without trading off accuracy.
In particular, for the best possible accuracy in NN prediction, the number of neighbors generally needs to grow as a root of n (the sample size), consequently limiting NN search (whatever the technique) to complexity on the order of a root of n; in other words, expensive prediction seems unavoidable, even with fast search methods, if accuracy is to be optimal. Unfortunately, the usual alternative is to trade off accuracy.
Interestingly, we show that it is possible to maintain accuracy, while reducing computation (at prediction time) to just O(log n), through simple bias and/or variance correction tricks applied after data quantization or subsampling, together with (black box) fast search techniques. Furthermore, our analysis yields clear insights into how much quantization or subsampling is tolerable if optimal accuracy is to be achieved.
Our theoretical insights are validated through extensive experiments with large datasets from various domains.
The talk is based on a series of works with N. Verma, and with L. Xue.
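The quantize-then-correct recipe can be illustrated with a toy regressor: quantize the training set to grid cells, store one representative per cell carrying the average label of its cell (a crude stand-in for the bias/variance corrections discussed in the talk), and answer queries by 1-NN over the representatives only:

```python
import numpy as np

def quantized_knn_regressor(X, y, h):
    """Prediction-time speedup sketch: quantize training points to a grid of
    cell width h, keep one representative per cell with the average label of
    its cell, then predict by 1-NN over the small set of representatives."""
    cells = {}
    for x, t in zip(X, y):
        key = tuple(np.floor(x / h).astype(int))
        cells.setdefault(key, []).append(t)
    reps = np.array([(np.array(k) + 0.5) * h for k in cells])  # cell centers
    vals = np.array([np.mean(v) for v in cells.values()])      # cell means
    def predict(q):
        d = ((reps - q) ** 2).sum(1)
        return vals[d.argmin()]
    return predict, len(reps)

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (5000, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(5000)
predict, m = quantized_knn_regressor(X, y, h=0.1)
print(m)                       # ~100 representatives instead of 5000 points
err = abs(predict(np.array([0.5, 0.5])) - np.sin(1.5))
print(err)
```

Averaging the labels within a cell is what keeps the variance of the prediction under control even though each query now consults a single representative.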

We establish a generic reduction from nonlinear spectral gaps of metric spaces to space partitions, in the form of data-dependent Locality-Sensitive Hashing. This yields a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN). Using this reduction, we obtain a new ANN data structure under an arbitrary d-dimensional norm, where the query algorithm makes only a sublinear number of probes into the data structure. Most importantly, the new data structure achieves an O(log d) approximation for an arbitrary norm. The only other such generic approach, via John's ellipsoid, would only achieve a √d approximation. Joint work with Assaf Naor, Aleksandar Nikolov, Ilya Razenshteyn, and Erik Waingarten.

I will show the first approximate nearest neighbor search data structure for a general d-dimensional normed space with sub-polynomial in d approximation.
The main tool is a finite-dimensional quantitative version of a theorem of Daher, which yields a Hölder homeomorphism between small perturbations of a normed space of interest and a Euclidean space. To make Daher's theorem algorithmic, we employ convex programming to compute the norm of a vector in a space, which is the result of complex interpolation between two given normed spaces.
Based on a joint work (FOCS 2018) with Alex Andoni, Assaf Naor, Sasho Nikolov and Erik Waingarten.

Theoretical work on high-dimensional nearest neighbor search has focused on the setting where a single point is sought within a known search radius, and an acceptable approximation ratio c is given. Locality Sensitive Hashing is a powerful framework for addressing this problem. In practice one usually seeks the (exact) k nearest points, the search radius is unknown, and the parameter c must be chosen in a way that depends on the data distribution. Though reductions of the latter problem to the former exist, they incur polylogarithmic overhead in time and/or space, which in turn make them unattractive in many practical settings. We address this discrepancy between theory and practice by suggesting new, simple, more efficient reductions for solving the k-Nearest Neighbor search problem using Locality Sensitive Hashing. Joint work with Tobias Christiani and Mikkel Thorup.
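For readers who have not met Locality Sensitive Hashing before, here is a bare-bones random-hyperplane LSH index for k-NN under angular distance (a generic textbook construction, not the improved reductions of the talk): each table hashes a point to the sign pattern of a few random projections, and a query only scans the points that collide with it in at least one table.

```python
import numpy as np

def build_lsh(X, n_bits=12, n_tables=8, seed=0):
    """Random-hyperplane LSH index; query(q, k) returns k near neighbors
    found among the points colliding with q in at least one table."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_tables, n_bits, X.shape[1]))
    tables = []
    for P in planes:
        codes = (X @ P.T > 0)                 # sign-pattern signatures
        table = {}
        for i, c in enumerate(codes):
            table.setdefault(c.tobytes(), []).append(i)
        tables.append(table)
    def query(q, k):
        cand = set()
        for P, table in zip(planes, tables):
            cand.update(table.get((P @ q > 0).tobytes(), []))
        cand = np.array(sorted(cand))
        d = ((X[cand] - q) ** 2).sum(1)
        return cand[np.argsort(d)[:k]]        # k nearest among candidates
    return query

rng = np.random.default_rng(3)
X = rng.standard_normal((2000, 32))
query = build_lsh(X)
q = X[7] + 0.01 * rng.standard_normal(32)     # a query very near X[7]
print(query(q, 3))
```

The practical pain points named in the abstract are visible even in this toy: n_bits and n_tables play the role of the approximation parameter c, and tuning them depends on the data distribution.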
The main story of this talk is how theoretical ideas on randomized sampling algorithms became part of production code at Twitter. The specific context is finding pairs of similar items: a classic algorithmic problem that is an integral part of recommendation systems. In most incarnations, it boils down to finding high inner products among a large collection of vectors, or alternately high entries in a matrix product. Despite a rich literature on this topic (and despite Twitter's significant compute resources), none of the existing methods scaled to "industrial sized" inputs, which exceed hundreds of billions of non-zeros. I will talk about a distributed algorithm for this problem, that combines low-dimension projections (hashes) with path-sampling techniques (wedges). There is some cute math behind the algorithm, and we were able to run it in production on Twitter's recommendation system. Joint work with Aneesh Sharma (Twitter) and Ashish Goel (Stanford).
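The hashing half of that recipe is easy to sketch. SimHash compresses each vector to a short sign signature; pairs whose signatures agree on most bits are likely to have high cosine similarity and become candidates for exact verification (the wedge-sampling half, and everything distributed, is omitted here):

```python
import numpy as np

def simhash_candidates(V, n_bits=64, min_agree=0.9, seed=0):
    """SimHash sketch of the 'find high inner products' step: keep pairs
    whose n_bits sign signatures agree on at least a min_agree fraction."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n_bits, V.shape[1]))
    sig = V @ P.T > 0                                   # n x n_bits signatures
    agree = (sig[:, None, :] == sig[None, :, :]).mean(-1)
    i, j = np.triu_indices(len(V), k=1)
    mask = agree[i, j] >= min_agree
    return list(zip(i[mask], j[mask]))

rng = np.random.default_rng(4)
V = rng.standard_normal((50, 30))
V[41] = V[3] + 0.05 * rng.standard_normal(30)           # plant a similar pair
pairs = simhash_candidates(V)
print(pairs)
```

The all-pairs signature comparison above is the part that does not survive "industrial sized" inputs, which is exactly where the sampling ideas of the talk come in.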

In this talk I will discuss how to recover spectral approximations to broad classes of structured matrices using only a polylogarithmic number of adaptive linear measurements to either the matrix or its inverse. Leveraging this result I will discuss how to achieve faster algorithms for solving a variety of linear algebraic problems including solving linear systems in the inverse of symmetric M-matrices (a generalization of Laplacian systems), solving linear systems that are constant spectral approximations of Laplacians (or more generally, SDD matrices), and recovering a spectral sparsifier of a graph using only a polylogarithmic number of matrix vector multiplies. More broadly this talk will show how to leverage a number of recent approaches to spectral sparsification towards expanding the robustness and scope of recent nearly linear time linear system solving research, and providing general matrix recovery machinery that may serve as a stepping stone for faster algorithms. This talk reflects joint work with Arun Jambulapati and Kiran Shiragur.
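As a taste of what "recovering matrix information from a few matrix-vector products" means, here is a classical primitive in the same family (a randomized diagonal estimator in the Bekas, Kokiopoulou, Saad style; emphatically not the talk's algorithm): it touches A only through matvecs with random sign vectors.

```python
import numpy as np

def diag_from_matvecs(matvec, n, probes=2000, seed=0):
    """Estimate diag(A) using only matrix-vector products with random
    sign vectors, since E[z * (A z)] = diag(A) when z has +-1 entries."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(n)
    for _ in range(probes):
        z = rng.choice([-1.0, 1.0], size=n)
        acc += z * matvec(z)
    return acc / probes

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 100))
d = diag_from_matvecs(lambda v: A @ v, 100)
print(np.abs(d - np.diag(A)).max())
```

The talk's point is that with the right measurement scheme, far richer objects than the diagonal (full spectral approximations, sparsifiers) can be recovered from polylogarithmically many such products.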

Spatial Scan Statistics measure and detect anomalous spatial behavior, specifically they identify geometric regions where significantly more of a measured characteristic is found than would be expected from the background distribution. These techniques have been used widely in geographic information science, such as to pinpoint disease outbreaks. However, until recently, available algorithms and software only scaled to at most a thousand or so spatial records. In this work I will describe how using coresets, efficient constructions, and scanning algorithms, we have developed new algorithms and software that easily scales to millions or more data points. Along the way we provide new efficient algorithms and constructions for eps-samples and eps-nets for various geometric range spaces. This is a case where subtle theoretical improvements of old structures from discrete geometry actually result in substantial empirical improvements.
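A naive version of the scan, with plain random subsampling standing in for the eps-sample coresets of the talk, fits in a few lines: score each candidate disk by the discrepancy between the fraction of measured points and the fraction of baseline points it contains, and return the best disk.

```python
import numpy as np

def scan_disks(measured, baseline, radii, sample_size=200, seed=0):
    """Naive spatial scan sketch: subsample disk centers from the measured
    points (a stand-in for an eps-sample), then score each (center, radius)
    disk by a linear discrepancy between measured and baseline fractions."""
    rng = np.random.default_rng(seed)
    centers = measured[rng.choice(len(measured), sample_size, replace=False)]
    best, best_disk = -1.0, None
    for c in centers:
        dm = np.linalg.norm(measured - c, axis=1)
        db = np.linalg.norm(baseline - c, axis=1)
        for r in radii:
            score = (dm < r).mean() - (db < r).mean()
            if score > best:
                best, best_disk = score, (c, r)
    return best, best_disk

rng = np.random.default_rng(5)
baseline = rng.uniform(0, 1, (3000, 2))
measured = np.vstack([rng.uniform(0, 1, (700, 2)),                 # background
                      np.array([0.7, 0.3])
                      + 0.05 * rng.standard_normal((300, 2))])     # anomaly
score, (center, r) = scan_disks(measured, baseline, radii=[0.05, 0.1, 0.2])
print(score, center, r)   # the reported disk sits on the planted anomaly
```

Even this toy is quadratic-ish in the number of points per scan; the coreset constructions in the talk are what make the same statistic computable on millions of records.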

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there!