Friday, July 12, 2013

SPARC 2013 List of abstracts and Workshop on Theoretical Aspects of Big Data at CUHK abstracts

SPARC 2013 will take place on August 5-7, 2013 at the University of Michigan, Ann Arbor. The theme this year is:

Efficient and effective transmission, storage, and retrieval of information on a large scale are among the core technical problems of the modern digital revolution. The massive volume of data necessitates a quest for mathematical and algorithmic methods for efficiently describing, summarizing, synthesizing, and, increasingly critically, deciding when and how to discard data before storing or transmitting it. Such methods have been developed in two areas: coding theory, and sparse approximation (SA) with its variants, compressive sensing (CS) and streaming algorithms. Coding theory and computational complexity are both well-established fields that enjoy fruitful interactions with one another. On the other hand, while significant progress on the SA/CS problem has been made, much of that progress is concentrated on the feasibility of the problems, including a number of algorithmic innovations that leverage coding-theory techniques; a systematic computational-complexity treatment of these problems is sorely lacking. The workshop organizers aim to develop a general computational theory of SA and CS (as well as related areas such as group testing) and its relationship to coding theory. This goal can be achieved only by bringing together researchers from a variety of areas. We will have several tutorial lectures directed at graduate students and postdocs.

Talk abstracts

Mary Wootters 
High dimensional probability in theoretical computer science

Abstract Recently, techniques from probability in Banach spaces and geometric functional analysis have found applications in compressed sensing, coding theory, and other areas of theoretical computer science. In this tutorial, I will go over some useful tricks and techniques, with examples.

Swastik Kopparty 
Polynomials and Complexity Theory

Abstract This tutorial will discuss some of the amazing and unexpected applications of polynomials in complexity theory.

Jin-Yi Cai 
Dichotomy for Counting Problems

Abstract Emil Post initiated the study of whether there are problems whose Turing degrees of computability are strictly between Decidable and the Halting Problem. The famous solution came independently from Friedberg and Muchnik, who invented the finite injury priority argument. I would like to revisit Post's Problem in the context of the complexity of counting problems. I will survey a number of recent dichotomy theorems which have the general flavor that, in a broad class of counting problems within #P, there are really only two levels of complexity: one is tractable, the other is complete. This is philosophically an anti-Friedberg-Muchnik theory, and the frameworks considered include (1) Graph Homomorphisms, (2) Counting, and (3) Holant Problems.

Jelani Nelson 
OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

Abstract Let E be a d-dimensional linear subspace of R^n. A subspace embedding for E is a linear map that preserves the l_2 norms of all vectors in E up to 1+eps. We improve a recent result of [Clarkson-Woodruff, STOC'13] and with a much simpler proof: i.e. we show that the Thorup-Zhang sketch (a random sign matrix with one non-zero per column) is a subspace embedding with good probability when the number of rows is O(d^2/eps^2). Once the theorem is formulated the right way, the proof is a simple second moment computation. We then show our main result: that when one has m rows and s non-zeroes per column (the sparse Johnson-Lindenstrauss matrices of [Kane-Nelson, SODA'12]), one can take m = O(d^{1.01}/eps^2), s = O(1/eps). These are the first subspace embeddings to have m = o(d^2) with s = o(d). Our bounds imply improved running times for several numerical linear algebra problems, including approximating leverage scores, least squares regression, l_p regression for p in [1, infty), and low-rank approximation.
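A toy numerical check of the one-nonzero-per-column construction described above (a minimal sketch for intuition, not the paper's analysis; the dimensions n, d, m below are illustrative, and the check is for a single vector rather than the whole subspace simultaneously):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1000, 10, 4000  # ambient dimension, subspace dimension, sketch rows

# Thorup-Zhang-style sketch: one random +/-1 entry per column.
rows = rng.integers(0, m, size=n)
signs = rng.choice([-1.0, 1.0], size=n)
S = np.zeros((m, n))
S[rows, np.arange(n)] = signs

# A random d-dimensional subspace of R^n, and a vector inside it.
U, _ = np.linalg.qr(rng.standard_normal((n, d)))
x = U @ rng.standard_normal(d)

# With m on the order of d^2/eps^2, the l_2 norm is preserved up to 1 +/- eps.
distortion = np.linalg.norm(S @ x) / np.linalg.norm(x)
```

Note that applying S to a vector costs only one arithmetic operation per nonzero coordinate, which is what makes such sparse embeddings fast in practice.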

Andrew McGregor 
Homomorphic Sketches: Shrinking Big Data without Sacrificing Structure

Abstract Fingerprinting is a widely-used technique for efficiently verifying that two large files are identical. More generally, sketching is a form of lossy compression based on random linear projections that, for example, could be used to estimate the Hamming distance between the files. Both techniques are particularly useful for monitoring data streams or minimizing communication in distributed and parallel computing. Many applications have been studied extensively including sparse signal recovery, locality sensitive hashing, and linear regression. Our goal is to extend these popular techniques and design random projections that preserve combinatorial and group-theoretic structure in large data sets. In this talk, we present recent results on analyzing spectral graph properties and fingerprinting misaligned text. Both results rely on developing homomorphic sketches that support operations on the compressed data that correspond to operations on the uncompressed data. Applications include the first sublinear space algorithms for processing dynamic graph streams. Includes joint work with Kook Jin Ahn, Alexandr Andoni, Assaf Goldberger, Sudipto Guha, and Ely Porat.
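The key property behind such sketches, linearity, can be illustrated with a minimal random-projection example (the matrix and stream below are illustrative assumptions, not the constructions from the talk): because the sketch is a linear map, the sketch of a merged stream is the sum of the per-stream sketches, which is what makes dynamic streams with deletions tractable.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 1000, 50
S = rng.choice([-1.0, 1.0], size=(k, n)) / np.sqrt(k)  # random sign projection

def sketch(updates):
    """Build a frequency vector from (index, delta) updates, then project.
    Linearity of S is what makes the sketch homomorphic."""
    x = np.zeros(n)
    for i, delta in updates:
        x[i] += delta
    return S @ x

a = sketch([(3, 1.0), (7, 2.0)])
b = sketch([(7, -2.0), (42, 5.0)])    # deletions are just negative updates
combined = sketch([(3, 1.0), (7, 2.0), (7, -2.0), (42, 5.0)])
# a + b equals combined: sketch(x + y) = sketch(x) + sketch(y)
```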

Poster Abstracts
Zhenming Liu
On Continuous Counting and Learning in a Distributed System

Abstract Consider a distributed system that consists of a coordinator node connected to multiple sites. Items from a data stream arrive at the system one by one and are arbitrarily distributed to different sites. The goal of the system is to continuously track a function of the items received so far within a prescribed relative accuracy and at the lowest possible communication cost. This class of problems is called continual distributed stream monitoring.

In this talk we will focus on two problems from this class. We will first discuss the count tracking problem (counter), an important building block for more complex algorithms. The goal of the counter is to keep track of the sum of all items from the stream at all times. We show that, for a class of input loads, a randomized algorithm tracks the count accurately with high probability, with expected communication cost sublinear in both the data size and the number of sites. We also establish matching lower bounds. We then illustrate how our non-monotonic counter can be applied to more complex problems, such as tracking the second frequency moment and the Bayesian linear regression of the input stream.
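A toy deterministic variant of count tracking illustrates the accuracy/communication trade-off (the algorithm described above is randomized; the threshold rule and parameters here are only an assumption for illustration): each site reports its local count to the coordinator whenever it grows by more than a (1 + eps) factor since the last report.

```python
import random

def track_count(stream, num_sites, eps):
    """Toy deterministic count tracker. Each site reports when its local
    count exceeds (1 + eps) times its last reported value, so the
    coordinator's estimate is always within a (1 + eps) factor of the truth
    while the number of messages grows only logarithmically per site."""
    local = [0] * num_sites
    reported = [0] * num_sites
    messages = 0
    for site in stream:                    # each stream item lands on one site
        local[site] += 1
        if local[site] > (1 + eps) * reported[site]:
            reported[site] = local[site]   # send local count to coordinator
            messages += 1
    return sum(reported), sum(local), messages

random.seed(0)
stream = [random.randrange(10) for _ in range(20000)]
estimate, true_count, messages = track_count(stream, num_sites=10, eps=0.1)
```

The estimate never exceeds the true count, the true count never exceeds (1 + eps) times the estimate, and the message count is far below the stream length.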

We will next discuss the online non-stochastic experts problem in the continual distributed setting. Here, at each time-step, one of the sites has to pick one expert from the set of experts, and then the same site receives information about the payoffs of all experts for that round. The goal of the distributed system is to minimize regret with respect to the optimal choice in hindsight, while simultaneously keeping communication to a minimum. This problem is well understood in the centralized setting, but the communication trade-off in the distributed setting is unknown. The two extreme solutions to this problem are to communicate with everyone after each payoff, and not to communicate at all. We will discuss how to achieve a trade-off between these two approaches. We will present an algorithm that achieves a non-trivial trade-off and show the difficulties of further improving its performance.
Brief bio Z. Liu's research interests are distributed streaming algorithms and stochastic processes. He is a postdoctoral research associate at Princeton University in the Computer Science Department.
Yang Liu
Quantum and randomized communication complexity of XOR functions in the SMP model

Abstract Quantum and randomized communication complexity of XOR functions in the SMP model.
Brief bio Y. Liu's research interests include communication complexity, streaming algorithms, sketching, and coding theory. Yang is a graduate student at the Chinese University of Hong Kong.
Jesse Hartloff
Combining traitor tracing and revocation in a single broadcast encryption scheme

Abstract Combining traitor tracing and revocation in a single broadcast encryption scheme (joint with Jimmy Dobler).
Brief bio J. Hartloff's research interests include coding, algorithms, biometrics, and cryptography. He is a graduate student at the University at Buffalo in the Computer Science Department.
Mert Saglam
On the communication complexity of sparse set disjointness and exists-equal problems

Abstract Improving on the work of Håstad and Wigderson from 1997, we give a new protocol for the two-player communication problem of set disjointness. The protocol determines whether the sets the players receive are disjoint by communicating O(k log^{(r)} k) bits over r rounds. Our main contribution is a matching lower bound: we show this protocol is optimal in a very strong sense. The core of our proof is an analysis of the error-correction properties of an adversarially chosen large subset of [t]^n, the set of n-dimensional vectors over the alphabet [1..t].
Brief bio M. Saglam's research interests include streaming algorithms, small-space computation, and communication complexity. He is a graduate student at the University of Washington in the Computer Science Department.
Keith Levin
Spoken term discovery

Abstract The task of automatically identifying words or phrases in speech data requires that we search a large amount of audio for repeated acoustic patterns. Previous work used randomized hashing and approximate nearest neighbor search to enable fast audio search over hundreds of hours of speech data without the need for statistical acoustic or language models. We explore ways to extend this approach so that our randomized hashing functions capture linguistic information such as phonetic context, in the hope that such information improves both the speed and accuracy of audio search.
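A minimal random-hyperplane LSH sketch illustrates the kind of randomized hashing involved (the feature dimension and hash length below are assumptions for illustration; this is not the system described above): nearby feature vectors, such as two occurrences of the same spoken term, agree on most hash bits, while unrelated vectors agree on about half.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, bits = 39, 32  # 39 mimics an MFCC-style acoustic feature (assumed); 32-bit hash

H = rng.standard_normal((bits, dim))  # one random hyperplane per hash bit

def lsh_hash(v):
    """Random-hyperplane LSH: each bit records which side of a hyperplane
    the vector falls on, so similar vectors collide on most bits."""
    return (H @ v > 0).astype(int)

v = rng.standard_normal(dim)                  # an acoustic feature vector
near = v + 0.001 * rng.standard_normal(dim)   # a slightly perturbed repeat
far = rng.standard_normal(dim)                # an unrelated frame

def agreement(a, b):
    return int(np.sum(lsh_hash(a) == lsh_hash(b)))
```

Bucketing vectors by their hash values then reduces nearest-neighbor search over hours of audio to comparisons within matching buckets.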

The Workshop on Theoretical Aspects of Big Data took place in Hong Kong on July 8-9, 2013. From the page:

This workshop aims to bring together researchers working on the development of theoretical principles related to the study of the “big data” phenomenon. There has recently been an explosion in the data sizes encountered in science, engineering, and technology. Good theoretical principles are needed in order to develop scalable learning and inference algorithms that can handle these massive data streams. It is towards this end that the workshop brings researchers together in a common forum for a free exchange of ideas and the development of future research directions.
