## Wednesday, October 02, 2013

### Videos: Succinct Data Representations and Applications

On September 16th, the Simons Institute at Berkeley hosted another awesome meeting: Succinct Data Representations and Applications. They put out the videos and some slides, enjoy! If you read the blog often, you will notice several items already mentioned here.

Monday, September 16th, 2013

8:00 am – 8:15 am
Coffee and Check-In

8:15 am – 8:30 am
Opening Remarks

8:30 am – 9:15 am

An algorithm runs in "input-sparsity time" if the running time in the RAM model is bounded above by a constant times the size of the input, for arbitrary or worst-case input. Somewhat remarkably, recent work in Randomized Numerical Linear Algebra has shown that fundamental problems such as least-squares and low-rank approximation can be solved to epsilon-accuracy in input-sparsity time. I will describe several related results in constructing low-distortion embeddings for Lp spaces in input-sparsity time, as well as the application of these results to solving problems such as least-squares, least-absolute deviations, and quantile regression. Of particular importance in making these theoretical ideas practical is understanding the tradeoffs between the running time to construct the embedding and the distortion quality of the embedding.
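To make the sketch-and-solve idea concrete, here is a small numpy illustration of least-squares via a CountSketch embedding: applying the sketch touches each input row exactly once, which is what makes the embedding step run in time proportional to the input size. The instance and the parameter choices (`s = 200` sketch rows for `d = 10` columns) are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def countsketch(M, s):
    """Apply an s x n CountSketch: hash each row of M to one of s
    buckets with a random sign, and add (one pass over the rows)."""
    n = M.shape[0]
    buckets = rng.integers(0, s, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    SM = np.zeros((s, M.shape[1]))
    np.add.at(SM, buckets, signs[:, None] * M)
    return SM

# illustrative overdetermined least-squares instance
n, d = 2000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

# sketch [A | b] in one pass, then solve the much smaller problem
s = 200
SAb = countsketch(np.hstack([A, b[:, None]]), s)
x_sketch, *_ = np.linalg.lstsq(SAb[:, :d], SAb[:, d], rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

# residual of the sketched solution relative to the optimum (close to 1)
ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
print(ratio)
```

The sketched problem has only `s` rows, so the final solve is cheap; the quality/size tradeoff mirrors the embedding-distortion tradeoff mentioned in the abstract.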

Linear-time algorithms have long been considered the gold standard of computational efficiency. Indeed, it is hard to imagine doing better than that, since for a nontrivial problem, any algorithm must consider all of the input in order to make a decision. However, as extremely large data sets are pervasive, it is natural to wonder what one can do in *sub-linear* time. Over the past two decades, several surprising advances have been made on designing such algorithms. We will give a non-exhaustive survey of this emerging area, highlighting recent progress and directions for further research.

The introduction of randomization in the design and analysis of algorithms for matrix computations (such as matrix multiplication, least-squares regression, the Singular Value Decomposition (SVD), etc.) over the last decade provided a new paradigm and a complementary perspective to traditional numerical linear algebra approaches. These novel approaches were motivated by technological developments in many areas of scientific research that permit the automatic generation of large data sets, which are often modeled as matrices.

In this talk we will outline how such approaches can be used to approximate problems ranging from matrix multiplication and the Singular Value Decomposition (SVD) of matrices to approximately solving least-squares problems and systems of linear equations. Application of such methods to data analysis will also be discussed.
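As a small worked example of this sampling paradigm (in the spirit of this line of work, not the exact algorithms of the talk), one can approximate a matrix product by sampling column/row pairs with probabilities proportional to their norm products and rescaling for unbiasedness:

```python
import numpy as np

rng = np.random.default_rng(1)

def approx_matmul(A, B, c):
    """Estimate A @ B from c sampled column/row outer products,
    sampled with probability proportional to ||A[:, i]|| * ||B[i, :]||."""
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=c, p=p)
    scale = 1.0 / (c * p[idx])          # unbiased rescaling
    return (A[:, idx] * scale) @ B[idx, :]

# a matrix with low effective rank, where few samples suffice
U = rng.standard_normal((100, 5))
V = rng.standard_normal((5, 1000))
A = U @ V
B = A.T

exact = A @ B
approx = approx_matmul(A, B, c=400)
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(rel_err)
```

The expected squared error scales as $\|A\|_F^2 \|B\|_F^2 / c$, so matrices with low effective rank are approximated well from relatively few samples.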

A sketch of a matrix $A$ is another matrix $B$ which is significantly smaller than $A$, but still approximates it well.

Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can be processed only once and storage is severely limited.

In this paper, we adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives $n$ rows of a large matrix $A \in \mathbb{R}^{n \times m}$ one after the other, in a streaming fashion. It maintains a sketch $B \in \mathbb{R}^{\ell \times m}$ containing only $\ell \ll n$ rows but still guarantees that $A^T A \approx B^T B$. More accurately,

$\forall x, \|x\| = 1: \quad 0 \le \|Ax\|^2 - \|Bx\|^2 \le 2\|A\|_F^2/\ell$

This algorithm's error decays proportionally to $1/\ell$ using $O(m\ell)$ space. In comparison, random-projection, hashing or sampling based algorithms produce convergence bounds proportional to $1/\sqrt{\ell}$. Sketch updates per row in $A$ require amortized $O(m\ell)$ operations and the algorithm is perfectly parallelizable. Our experiments corroborate the algorithm's scalability and improved convergence rate. The presented algorithm also stands out in that it is deterministic, simple to implement, and elementary to prove.
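A minimal numpy rendition of this "frequent directions" idea may help. This simplified variant shrinks by the smallest squared singular value on every row (rather than buffering rows and shrinking by a larger value to get the amortized update cost), which still satisfies the stated covariance bound:

```python
import numpy as np

def frequent_directions(A, ell):
    """One-pass deterministic sketch: returns B (ell x m) with
    0 <= ||Ax||^2 - ||Bx||^2 <= 2 ||A||_F^2 / ell for every unit x."""
    n, m = A.shape
    B = np.zeros((ell, m))
    for row in A:
        B[-1] = row                           # last row is free (all zeros)
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2                    # smallest squared singular value
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s[:, None] * Vt                   # shrink; last row becomes zero
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 30))
ell = 10
B = frequent_directions(A, ell)

# verify the guarantee in spectral norm
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = 2 * np.linalg.norm(A, 'fro') ** 2 / ell
print(err <= bound)  # True
```

Because every singular value of the intermediate sketch is at least `delta`, each shrink removes exactly $\ell \cdot \delta$ of Frobenius mass, which is what drives the $1/\ell$ error decay.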

12:00 pm – 1:30 pm
Lunch Break

1:30 pm – 2:15 pm
Optimizing with Tensors
Ravi Kannan, Microsoft Research India

Constraint Satisfaction Problems (CSPs) can be formulated as tensor optimization problems, as can the interesting planted clique problem. Much about tensors is known to be hard or impossible, but the focus of this talk will be on the possible. In particular, we show that the spectral norm of any r-dimensional array (for r fixed) can be approximated to additive error $\varepsilon$ times its Frobenius norm, and this is sufficient to solve dense instances of MAX-CSPs as well as a more general class.

Joint work with Frieze, de la Vega, Karpinski, Vempala.

Optimizing with Tensors (slides)

2:30 pm – 4:00 pm
Reception at Calvin Lab

Tuesday, September 17th, 2013

8:30 am – 9:00 am
Coffee and Check-In

9:00 am – 9:45 am
Homomorphic Sketches: Shrinking Big Data without Sacrificing Structure
Andrew McGregor, University of Massachusetts

Fingerprinting is a widely-used technique for efficiently verifying that two large files are identical. More generally, sketching is a form of lossy compression based on random linear projections that, for example, could be used to estimate the Hamming distance between the files. Both techniques are particularly useful for monitoring data streams or minimizing communication in distributed and parallel computing. Many applications have been studied extensively including sparse signal recovery, locality sensitive hashing, and linear regression.
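As a toy instance of such a sketch (illustrative, not from the talk): for 0/1 vectors the Hamming distance equals the squared Euclidean distance, so a shared random linear projection lets two parties estimate it from short sketches alone:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 10000, 400                 # file length, sketch dimension
S = rng.standard_normal((d, n)) / np.sqrt(d)   # shared projection matrix

x = rng.integers(0, 2, size=n).astype(float)
y = x.copy()
flips = rng.choice(n, size=250, replace=False)
y[flips] = 1.0 - y[flips]         # true Hamming distance is 250

# each party keeps only its d-dimensional sketch
sx, sy = S @ x, S @ y
estimate = np.linalg.norm(sx - sy) ** 2        # E[estimate] = ||x - y||^2
print(estimate)
```

Linearity is the key property: the sketch of the difference is the difference of the sketches, so the two parties never need to exchange the original files.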

Our goal is to extend these popular techniques and design random projections that preserve combinatorial and group-theoretic structure in large data sets. In this talk, we present recent results on analyzing spectral graph properties and fingerprinting misaligned text. Both results rely on developing homomorphic sketches that support operations on the compressed data that correspond to operations on the uncompressed data. Applications include the first sublinear space algorithms for processing dynamic graph streams.

Includes joint work with Kook Jin Ahn, Alexandr Andoni, Assaf Goldberger, Sudipto Guha, and Ely Porat.

We build long chains of inference based on the results of data mining. But how do we certify the results of these inferences so that we can say things like "there's a 51% chance you're a US person"? And more importantly, how can we generate compact certificates to give to users to validate labels being assigned to them?

Formally, suppose I'm given a clustering of a set of points into groups. How do I assign a score (or set of scores) to a point that captures the strength of its assignment to a group? In this talk I'll describe a method that draws on Voronoi-based interpolation techniques to define such a score, and uses epsilon-samples and convex body sampling to compute the score efficiently. I'll then illustrate the use of such a score with example applications.

This is joint work with Parasaran Raman.

Linear sketches turn an n-dimensional input into a concise lower-dimensional representation via a linear transformation. In almost any realistic setting a linear sketch faces the possibility that its inputs are correlated with previous evaluations of the sketch. Known techniques no longer guarantee the correctness of the output in the presence of such correlations. We therefore ask: are linear sketches inherently non-robust to adaptively chosen inputs? We give a strong affirmative answer to this question. Specifically, we show that no linear sketch approximates the Euclidean norm of its input to within an arbitrary multiplicative approximation factor on a polynomial number of adaptively chosen inputs. The result remains true even if the dimension of the sketch is d = n - o(n) and the sketch is given unbounded computation time. Our result is based on an algorithm with running time polynomial in d that adaptively finds a distribution over inputs on which the sketch is incorrect with constant probability. Our result implies several corollaries for related problems including lp-norm estimation and l2/l2 compressed sensing in the presence of adaptively chosen inputs.

Joint work with Moritz Hardt.

Cloud computing suggests a model where communication between storage/computation devices is a bottleneck resource. An algorithm is nimble if it uses only sublinear (ideally polylogarithmic) communication and polynomial time and space. We expand on the model proposed by Cormode and Muthukrishnan and develop nimble algorithms for computing frequency moments, counting homomorphisms, low-rank approximation and k-means clustering.

This is joint work with Ravi Kannan and David Woodruff.

Row/column sampling techniques were initially introduced to compute low-rank matrix approximation faster than the computation of SVD. In this talk, I will give a quick overview of squared-length sampling, adaptive sampling, volume sampling, relevance-score sampling etc., survey their connections to other topics such as determinantal point processes, rounding Lasserre SDP solutions, graph sparsification, convex geometry, and discuss interesting open problems in these.

Many important properties of an undirected graph manifest themselves spectrally in the eigenvalues or quadratic forms of matrices related to the graph. For instance, the connectivity structure, electrical properties, and random walk behavior of a graph are determined by its Laplacian matrix. A spectral sparsifier of a graph G is a sparse graph H on the same set of vertices such that the Laplacians of H and G are close, so that H captures the spectral behavior of G while being much cheaper to store and perform computations on. We survey a line of work showing that a nearly linear size spectral sparsifier exists for every graph and can be computed in the one pass streaming model with small update time per edge, and discuss possible improvements which could make this algorithm more useful in practice.
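For intuition, here is a hedged offline sketch of effective-resistance sampling in the spirit of Spielman and Srivastava; the streaming algorithms surveyed in the talk are considerably more involved, and all parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplacian(n, edges, weights):
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

# a dense weighted graph on n vertices
n = 100
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
weights = rng.uniform(0.5, 1.5, size=len(edges))
L = laplacian(n, edges, weights)

# effective resistance of each edge from the Laplacian pseudoinverse
Lp = np.linalg.pinv(L)
reff = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for (u, v) in edges])

# sample q edges with probability ~ w_e * R_eff(e); reweight to stay unbiased
p = weights * reff
p /= p.sum()
q = 6 * int(n * np.log(n))
idx = rng.choice(len(edges), size=q, p=p)
new_w = np.zeros(len(edges))
np.add.at(new_w, idx, weights[idx] / (q * p[idx]))
keep = np.flatnonzero(new_w)
H = laplacian(n, [edges[i] for i in keep], new_w[keep])

# quadratic forms of the sparsifier H track those of L
x = rng.standard_normal(n)
ratio = (x @ H @ x) / (x @ L @ x)
print(len(keep), ratio)
```

The sampled graph keeps roughly $O(n \log n)$ of the original edges while its Laplacian quadratic forms stay close to those of the full graph.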

3:30 pm – 4:00 pm
Break

4:00 pm – 4:45 pm
Fast Testing of Graph Properties
Artur Czumaj, University of Warwick

We review recent results for sublinear-time testing of fundamental graph properties in sparse graphs.

We consider the problem in which an input graph G is represented as an adjacency matrix, and we ask whether G has a given property P or is "far" from any graph satisfying P. The central question in this area is which properties can be tested very quickly, in particular in constant time, independent of the size of the input graph.

We will present recent advances in this area, focusing on planar graphs and graphs with poor expansion properties.

This talk is based on joint papers with Christian Sohler and others.

4:45 pm – 5:30 pm
Inferring Structural Properties: Graphs and Bridges
Anna Gilbert, University of Michigan

I will discuss two recent projects that strive to measure less but infer some or more about buildings/bridges and graphs. In the first project, our goal is to collect data about buildings and bridges for efficient, effective structural health monitoring. In the second project, our goal is to collect boundary measurements on a graph and to infer its "material" properties, using a mathematical generalization of optical tomography.

Wednesday, September 18th, 2013

8:30 am – 9:00 am
Coffee and Check-In

9:00 am – 9:45 am
Convex Relaxations for Permutation Problems
Alexandre d'Aspremont, CNRS - ENS Paris

Seriation seeks to reconstruct a linear order between variables using unsorted similarity information. It has direct applications in archeology and shotgun gene sequencing for example. We prove the equivalence between the seriation and the combinatorial 2-sum problem (a quadratic minimization problem over permutations) over a class of similarity matrices. The seriation problem can be solved exactly by a spectral algorithm in the noiseless case and we produce a convex relaxation for the 2-sum problem to improve the robustness of solutions in a noisy setting. This relaxation also allows us to impose additional structural constraints on the solution, to solve semi-supervised seriation problems. We present numerical experiments on archeological data, Markov chains and gene sequences.
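The noiseless spectral approach can be sketched in a few lines: sort the items by the Fiedler vector (the eigenvector of the second-smallest eigenvalue) of the Laplacian of the similarity matrix. The synthetic similarity below is an illustrative assumption, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# items observed in scrambled order; similarity decays with hidden distance
n = 30
pos = rng.permutation(n)                      # hidden position of each item
S = np.exp(-np.abs(pos[:, None] - pos[None, :]) / 5.0)

# Fiedler vector of the similarity graph's Laplacian
L = np.diag(S.sum(axis=1)) - S
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                       # 2nd-smallest eigenvalue's vector

# sorting by the Fiedler vector recovers the order (up to reversal)
recovered = np.argsort(fiedler)
print(pos[recovered])
```

For similarity matrices that are monotone away from the diagonal under the true order, the Fiedler vector is itself monotone in that order, which is why the sort recovers it up to a global reversal.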

Modern massive datasets create a fundamental problem at the intersection of the computational and statistical sciences: how to provide guarantees on the quality of statistical inference given bounds on computational resources such as time or space. Our approach to this problem is to define a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing. We illustrate this approach in the setting of denoising problems, using convex relaxation as the core inferential tool. Hierarchies of convex relaxations have been widely used in theoretical computer science to yield tractable approximation algorithms to many computationally intractable tasks. In this talk we show how to endow such hierarchies with a statistical characterization and thereby obtain concrete tradeoffs relating algorithmic runtime to amount of data.

Joint work with Michael Jordan.

We consider streaming, one-pass principal component analysis (PCA), in the high-dimensional regime, with limited memory. Here, $p$-dimensional samples are presented sequentially, and the goal is to produce the $k$-dimensional subspace that best approximates these points. Standard algorithms require $O(p^2)$ memory; meanwhile no algorithm can do better than $O(kp)$ memory, since this is what the output itself requires. Memory (or storage) complexity is most meaningful when understood in the context of computational and sample complexity. Sample complexity for high-dimensional PCA is typically studied in the setting of the *spiked covariance model*, where $p$-dimensional points are generated from a population covariance equal to the identity (white noise) plus a low-dimensional perturbation (the spike) which is the signal to be recovered. It is now well-understood that the spike can be recovered when the number of samples, $n$, scales proportionally with the dimension, $p$. Yet, all algorithms that provably achieve this, have memory complexity $O(p^2)$. Meanwhile, algorithms with memory-complexity $O(kp)$ do not have provable bounds on sample complexity comparable to $p$. We present an algorithm that achieves both: it uses $O(kp)$ memory (meaning storage of any kind) and is able to compute the $k$-dimensional spike with $O(p \log p)$ sample-complexity.
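A hedged sketch of one way to meet the $O(kp)$ budget, in the spirit of a block power method on the spiked model (illustrative parameters, and not necessarily the authors' exact algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

p, k = 200, 2
spike, block, n_blocks = 10.0, 500, 20

# spiked covariance: identity plus a rank-k spike along the columns of U
U, _ = np.linalg.qr(rng.standard_normal((p, k)))

Q, _ = np.linalg.qr(rng.standard_normal((p, k)))   # the only persistent state
for _ in range(n_blocks):
    S = np.zeros((p, k))                           # O(kp) accumulator
    for _ in range(block):                         # stream samples one by one
        x = rng.standard_normal(p) + np.sqrt(spike) * (U @ rng.standard_normal(k))
        S += np.outer(x, x @ Q)                    # x x^T Q without forming x x^T
    Q, _ = np.linalg.qr(S / block)                 # orthonormalize once per block

# sine of the largest principal angle between span(Q) and the true spike
err = np.linalg.norm((np.eye(p) - U @ U.T) @ Q, 2)
print(err)
```

Only `Q` and the block accumulator `S` are stored, both $p \times k$, so the memory footprint is $O(kp)$ rather than the $O(p^2)$ required to hold the empirical covariance.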

Models or signals exhibiting low dimensional behavior (e.g., sparse signals) play an important role in signal processing and machine learning. In this talk, we focus on models that have multiple structures simultaneously; e.g., matrices that are both low rank and sparse, arising in phase retrieval, quadratic compressed sensing, and sparse PCA. We consider estimating such models from observations corrupted by additive Gaussian noise.

We provide tight upper and lower bounds on the mean squared error (MSE) of a convex denoising program that uses a combination of regularizers. In the case of low rank and sparse matrices, we quantify the gap between the MSE of the convex program and the best achievable error, and present a simple (nonconvex) thresholding algorithm that outperforms its convex counterpart and achieves almost optimal MSE.