
Friday, August 03, 2012

Coding, Complexity, and Sparsity Workshop Slides

The Coding, Complexity, and Sparsity Workshop at UMich ended yesterday.  Thanks to Anna Gilbert and the other organizers (S. Muthukrishnan, Hung Ngo, Ely Porat, Atri Rudra, Martin Strauss) for putting up some of the slides of the meeting. Here are the abstracts and the slides (when they exist):




This talk centers around some audacious conjectures that attempt to forge firm links between computational complexity classes and the study of Kolmogorov complexity.



More specifically, let R denote the set of Kolmogorov-random strings.  Two familiar complexity classes are:

* BPP: the class of problems that can be solved in polynomial time with negligible error, and

* NEXP: the class of problems solvable in nondeterministic exponential time.



Conjecture 1: NEXP = NP^R.

Conjecture 2: BPP is the class of problems non-adaptively polynomial-time reducible to R.



The first thing to note is that these conjectures are not only audacious; they are obviously false!  R is not a decidable set, and thus it is absurd to suggest that the class of problems reducible to it constitutes a complexity class.



The absurdity fades if, for example, we interpret ``NP^R'' to be ``the class of problems that are NP-Turing reducible to R, no matter which universal machine we use in defining Kolmogorov complexity''.  The lecture will survey the body of work (some of it quite recent) that suggests that, when interpreted properly, the conjectures may actually be true.



Eric Allender received his Ph.D. from the Georgia Institute of Technology in 1985.  He has been at Rutgers University since then, serving as department chair from 2006 to 2009.  He is a Fellow of the ACM, is Editor-in-Chief of ACM Transactions on Computation Theory, and serves on the editorial boards of Computational Complexity, Computability, and The Chicago Journal of Theoretical Computer Science.

Mark Braverman, An introduction to information complexity


The (two party) information complexity of a function F(X,Y) measures the least amount of information that Alice, who holds X, and Bob, who holds Y, need to reveal to each other about their inputs in order to compute F. While this complexity measure is interesting in its own right -- for example in the context of information-theoretic security -- it is also very useful in studying the communication complexity of problems. We will introduce information complexity, and discuss some of its properties and connections to communication complexity. No prior background will be assumed.
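For reference, one standard way this is made formal (my paraphrase, using the internal information cost; the tutorial's exact definitions may differ): for a protocol $\pi$ with transcript $\Pi$ on inputs $(X,Y) \sim \mu$,

$$\mathrm{IC}_\mu(\pi) \;=\; I(\Pi ; X \mid Y) \;+\; I(\Pi ; Y \mid X),$$

and the information complexity of $F$ is the infimum of $\mathrm{IC}_\mu(\pi)$ over all protocols $\pi$ that compute $F$ up to the allowed error.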



Mark Braverman is an assistant professor of computer science at Princeton University. His interests lie in complexity theory, the theory of real computation, machine learning, algorithms, game theory, and applications of computer science in healthcare and medicine.

For more material upon which Mark's tutorial is based, please see these references.



Over the last ten years, plenty of ink has been spilled on a single communication problem called Gap-Hamming-Distance (GHD).  In this problem, Alice and Bob receive n-bit strings and must decide whether they are "close" (Hamming distance <= n/2 - sqrt(n)) or "far" (Hamming distance >= n/2 + sqrt(n)). How many bits must they communicate to solve this problem, with at most a 1% error?

In this talk, we will see how this problem arose and gained importance for its applications to space lower bounds for data stream algorithms. We shall then follow the journey from the early results on GHD, to stronger intermediate results obtained through a deeper understanding of the problem, and finally to three recent proofs of an optimal Omega(n) lower bound. As we proceed along this journey, we shall encounter successively deeper geometric ideas, many being used for the first time
in the context of communication complexity.

We will end with an intellectual puzzle on GHD that remains unsolved.
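For concreteness, here is a tiny Python sketch of the GHD promise itself (the function names are mine, purely for illustration; the communication question is about protocols, not this centralized computation):

def hamming_distance(x, y):
    # Hamming distance between two equal-length bit strings.
    return sum(a != b for a, b in zip(x, y))

def ghd_promise(x, y):
    # Classify a pair under the Gap-Hamming-Distance promise: "close" if the
    # distance is at most n/2 - sqrt(n), "far" if it is at least n/2 + sqrt(n),
    # and None if the pair falls outside the promise.
    n = len(x)
    d = hamming_distance(x, y)
    if d <= n / 2 - n ** 0.5:
        return "close"
    if d >= n / 2 + n ** 0.5:
        return "far"
    return None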

Amit Chakrabarti is an Associate Professor of Computer Science at Dartmouth. He received a Ph.D. in Computer Science from Princeton University in 2002, and a B.Tech. in Computer Science, along with the President of India Gold Medal, from the Indian Institute of Technology, Bombay, in 1997.

Prof. Chakrabarti has broad interests in theoretical computer science, and specializes in communication complexity and its applications, the theory of data stream algorithms, and approximation algorithms for combinatorial optimization. He is the recipient of an NSF CAREER Award and a Karen Wetterhahn Fellowship. For more information, please visit http://www.cs.dartmouth.edu/~ac
Mahdi Cheraghchi, Restricted Isometry of Fourier Matrices and List Decodability of Random Linear Codes

We prove that a random linear code over $\mathbb{F}_q$, with probability arbitrarily close to $1$, is list decodable at radius $1-1/q-\varepsilon$ with list size $L=O(1/\varepsilon^2)$ and rate $R=\Omega_q(\varepsilon^2/(\log^3(1/\varepsilon)))$. Up to the polylogarithmic factor in $1/\varepsilon$ and constant factors depending on $q$, this matches the lower bound $L=\Omega_q(1/\varepsilon^2)$ for the list size and upper bound $R=O_q(\varepsilon^2)$ for the rate. Previously only existence (and not abundance) of such codes was known for the special case $q=2$ (Guruswami, H{\aa}stad, Sudan and Zuckerman, 2002).

In order to obtain our result, we employ a relaxed version of the well known Johnson bound on list decoding that translates the \emph{average} Hamming distance between codewords to list decoding guarantees. We furthermore prove that the desired average-distance guarantees hold for a code provided that a natural complex matrix encoding the codewords satisfies the Restricted Isometry Property with respect to the Euclidean norm (RIP-2). For the case of random binary linear codes, this matrix coincides with a random submatrix of the Hadamard-Walsh transform matrix that is well studied in the compressed sensing literature.

Finally, we improve the analysis of Rudelson and Vershynin (2008) on the number of random frequency samples required for exact reconstruction of $k$-sparse signals of length $N$. Specifically, we improve the number of samples from $O(k \log(N) \log^2(k) (\log k + \log\log N))$ to $O(k \log(N) \log^3(k))$. The proof involves bounding the expected supremum of a related Gaussian process by using an improved analysis of the metric defined by the process. This improvement is crucial for our application in list decoding. 
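For readers coming from the compressed sensing side, the sampling model in the last paragraph can be sketched in a few lines of Python (numpy assumed; parameters are arbitrary, and the actual recovery step -- basis pursuit / l1 minimization -- is omitted):

import numpy as np

# Observe a k-sparse length-N signal through m randomly chosen rows of the
# DFT matrix ("random frequency samples"); exact reconstruction would then be
# done by l1 minimization, which is not implemented here.
rng = np.random.default_rng(0)
N, k, m = 1024, 10, 200

x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k)           # k-sparse signal

rows = rng.choice(N, size=m, replace=False)   # random frequency samples
F = np.exp(-2j * np.pi * np.outer(rows, np.arange(N)) / N) / np.sqrt(N)
y = F @ x                                     # m Fourier measurements of x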
       
Joint work with Venkatesan Guruswami and Ameya Velingker.

Mahdi Cheraghchi is a post-doctoral researcher at the Computer Science Department of Carnegie Mellon University, Pittsburgh, PA, since 2011. Prior to that, he spent a year at the Department of Computer Science, University of Texas at Austin, TX, working with Prof. David Zuckerman. He received M.Sc. and Ph.D. degrees in computer science from École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, in 2005 and 2010, respectively, working under supervision of Prof. Amin Shokrollahi. In 2004, he received his B.Sc. degree in computer engineering from Sharif University of Technology, Tehran, Iran. He has broad interests in theoretical computer science, which in particular include the interconnections between coding theory and theoretical computer science, derandomization theory and explicit constructions, probabilistically checkable proofs and inapproximability.



Anna Gal, Error-correcting codes vs. superconcentrator graphs


We prove tight bounds on the number of wires in constant depth (unbounded fan-in) circuits computing asymptotically good error-correcting codes. We show that a quasilinear number of wires is sufficient for computing good codes already in depth 2 circuits with parity gates. This implies that a (necessarily dense) generator matrix for the code can be written as the product of two sparse matrices. For depth 2, our Omega(n (log n / log log n)^2) lower bound gives the largest known lower bound for computing any linear map. Furthermore, we identify a new class of superconcentrator-like graphs with connectivity properties distinct from previously studied ones. We show that any circuit computing good codes must satisfy such superconcentrator-like properties. Joint work with Kristoffer Hansen, Michal Koucky, Pavel Pudlak and Emanuele Viola.

Anna Gal received her Ph.D. in Computer Science at the University of Chicago in 1995. She was a postdoc at the Institute for Advanced Study in Princeton, and at Princeton University. She has been a faculty member of the Computer Science Department at the University of Texas at Austin since 1997. She is a recipient of an NSF CAREER Award and an Alfred P. Sloan Research Fellowship. She received the Machtey Award for Best Student Paper at FOCS 1991, and the EATCS Best Paper Award at Track A of ICALP 2003. She is an Associate Editor of the ACM Transactions on Computation Theory.

Piotr Indyk, Tutorial on compressive sensing/streaming algorithms

See above.

Piotr Indyk is a Professor of Electrical Engineering and Computer Science at MIT. He joined MIT in 2000, after earning his PhD from Stanford University. Earlier, he received a Magister degree from Uniwersytet Warszawski in 1995.

His research interests include high-dimensional computational geometry,
sketching and streaming algorithms, sparse recovery and compressive sensing. He is a recipient of the NSF CAREER Award, Sloan Fellowship, Packard Fellowship and Technology Review TR10. He is an Associate Editor for the IEEE Transactions on Signal Processing and SIAM Journal on Computing.





Multiplicity Codes are certain natural error-correcting codes based on evaluations of polynomials and their derivatives. These codes have rate approaching 1, yet they allow for sublinear time decoding from errors. Furthermore, they admit some powerful list-decoding algorithms, achieving the so-called list-decoding capacity (mimicking the behavior of the Folded Reed-Solomon Codes of Guruswami and Rudra). I will talk about these codes and their decoding algorithms.



Part of this talk is based on joint work with Shubhangi Saraf and Sergey Yekhanin.
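As a toy illustration of the encoding map (my own simplified Python example, not code from the talk): an order-2 univariate multiplicity code over a small prime field records, at every field element, the value of the message polynomial together with its formal derivative.

# Toy encoding of a univariate multiplicity code of order 2 over F_p:
# the message is the coefficient vector of a polynomial f of degree < d,
# and each codeword coordinate is the pair (f(a), f'(a)) for a in F_p.
p = 101          # a small prime field, chosen arbitrarily for illustration

def poly_eval(coeffs, a):
    # Evaluate f(a) mod p, with coeffs[i] the coefficient of x^i.
    return sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p

def poly_derivative(coeffs):
    # Formal derivative of f, as a coefficient vector.
    return [(i * c) % p for i, c in enumerate(coeffs)][1:]

def encode(coeffs):
    # Multiplicity-2 codeword: (f(a), f'(a)) for every a in F_p.
    deriv = poly_derivative(coeffs)
    return [(poly_eval(coeffs, a), poly_eval(deriv, a)) for a in range(p)]

codeword = encode([3, 1, 4, 1, 5])   # an arbitrary degree-4 message polynomial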




Richard Lipton, The Hartmanis-Stearns Conjecture: Complexity of numbers --- two views




We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs that achieve near-optimal seed-length even in the low error regime. The (pseudo)random restrictions we use are milder than those typically used for proving circuit lower bounds, in that we only set a constant fraction of the bits at a time. While such restrictions do not simplify the functions drastically, we show that they can be derandomized using small-bias spaces.
Based on joint work with Parikshit Gopalan, Omer Reingold, Luca Trevisan and Salil Vadhan.

Raghu Meka is currently a postdoctoral member at the Institute for Advanced Study at Princeton. He received his PhD from the department of computer science at the University of Texas at Austin in 2011. He is a recipient of the Bert Kay best dissertation award at UT Austin, the Dean's Excellence award and an MCD fellowship at UT Austin. Prior to joining UT Austin, he completed his B.Tech in computer science from Indian Institute of Technology, Chennai, India. His main interests are in complexity theory, pseudorandomness, learning theory and data mining.


Andrew McGregor, Graph Sketching



We present a sequence of algorithmic results for analyzing graphs via random linear projections or "sketches". We start with results for evaluating basic connectivity and k-connectivity and then use these primitives to construct combinatorial sparsifiers that allow every cut to be approximated up to a factor of (1+\epsilon). Our results have numerous applications including single-pass, semi-streaming algorithms for constructing sparsifiers in fully-dynamic graph streams where edges can be added and deleted in the underlying graph.



This is joint work with Kook Jin Ahn and Sudipto Guha.
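To make "random linear projections of a graph" concrete, here is a minimal Python/numpy illustration (not the actual AGM connectivity sketch) of how edge insertions and deletions in a dynamic stream become signed updates to a vector, which any linear sketch can absorb:

import numpy as np

# Represent a graph on n vertices by the indicator vector of its edge set and
# maintain a random linear projection of that vector. Because the sketch is
# linear, deleting an edge is just the negation of the earlier insertion.
rng = np.random.default_rng(1)
n = 5
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
index = {e: t for t, e in enumerate(edges)}   # edge -> coordinate

d = 8                                         # sketch dimension (illustrative)
S = rng.choice([-1.0, 1.0], size=(d, len(edges)))
sketch = np.zeros(d)

def update(edge, delta):
    # Apply an edge insertion (delta = +1) or deletion (delta = -1).
    sketch[:] += delta * S[:, index[edge]]

update((0, 1), +1)
update((2, 3), +1)
update((0, 1), -1)    # deleting the edge cancels its earlier insertion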



Andrew McGregor is an Assistant Professor at the University of Massachusetts, Amherst. He received a B.A. degree and the Certificate of Advanced Study in Mathematics from the University of Cambridge and a Ph.D. from the University of Pennsylvania. He did post-doctoral research at UC San Diego and Microsoft Research SVC. He is interested in many areas of theoretical computer science and specializes in data stream algorithms, linear sketching, and communication complexity. He received the NSF CAREER Award in 2010 and a Lilly Teaching Fellowship in 2012.


Eric Price, Improved Concentration Bounds for Count-Sketch
We present a refined analysis of the classic Count-Sketch streaming heavy hitters algorithm [CCF02].  Count-Sketch uses O(k log n) linear measurements of a vector x in R^n to give an estimate x' of x.  The standard analysis shows that this estimate x' satisfies ||x'-x||_infty^2 < ||x_tail||_2^2 / k, where x_tail is the vector containing all but the largest k coordinates of x.  Our main result is that most of the coordinates of x' have substantially less error than this upper bound; namely, for any c < O(log n), we show that each coordinate i satisfies
 (x'_i - x_i)^2 < (c/log n) ||x_tail||_2^2/k
with probability 1-2^{-c}, as long as the hash functions are fully independent.  This subsumes the previous bound.
Our proof shows that any random variable with positive real Fourier transform and finite variance concentrates around 0 at least as well as a Gaussian.  This result, which may be of independent interest, gives good concentration even when the noise does not converge to a Gaussian.
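For readers who have not seen Count-Sketch, a minimal Python implementation of the estimator being analyzed might look as follows (the per-row hash and sign functions are simulated with seeded pseudorandomness rather than the fully independent hash functions assumed in the result):

import random
from statistics import median

class CountSketch:
    # d rows of w signed counters; the estimate of a coordinate is the median
    # over rows of (sign * counter).
    def __init__(self, d, w, seed=0):
        rnd = random.Random(seed)
        self.d, self.w = d, w
        self.tables = [[0.0] * w for _ in range(d)]
        self.seeds = [rnd.getrandbits(64) for _ in range(d)]

    def _bucket(self, row, i):
        # Pseudorandom (bucket, sign) pair for coordinate i in the given row.
        r = random.Random(self.seeds[row] ^ (i * 0x9E3779B97F4A7C15))
        return r.randrange(self.w), r.choice((-1.0, 1.0))

    def update(self, i, delta):
        for row in range(self.d):
            b, s = self._bucket(row, i)
            self.tables[row][b] += s * delta

    def estimate(self, i):
        vals = []
        for row in range(self.d):
            b, s = self._bucket(row, i)
            vals.append(s * self.tables[row][b])
        return median(vals)

cs = CountSketch(d=7, w=20)
cs.update(3, 10.0)     # x_3 += 10
cs.update(8, -2.0)     # x_8 -= 2
approx = cs.estimate(3)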
Eric Price just finished his third year as a Ph.D. student with Piotr Indyk at MIT, working on sparse recovery problems.




Consider a setting in which inputs to and outputs from a computational problem are so large that there is no time to read them in their entirety.  However, if one is only interested in small parts of the output at any given time, is it really necessary to solve the entire computational problem? Is it even necessary to view the whole input? We propose a model of {\em local computation algorithms} which, for a given input, supports queries by a user to values of specified bits of a legal output.  The goal is to design local computation algorithms in such a way that very little of the input needs to be seen in order to determine the value of any single bit of the output.  In particular, the runtime and space requirements of the local computation algorithms should be sub-linear in the size of the input.  Surprisingly, there are nontrivial computational problems where such efficiency is achievable.  Local computation algorithms are intended to distill the common features of several concepts that have previously appeared in various algorithmic subfields, including local distributed computation, local algorithms, locally decodable codes, and local reconstruction.  In this talk, we survey some of the recent problems, including maximal independent set and hypergraph coloring, for which polylogarithmic time and space local computation algorithms have been given.



Joint work with Noga Alon, Gil Tamir, Shai Vardi and Ning Xie. 
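As a small Python illustration of the model (a classical example in this spirit, the random-ordering greedy MIS query of Nguyen and Onak, rather than necessarily one of the algorithms surveyed): to decide whether a single vertex belongs to a maximal independent set, it suffices to explore only vertices reachable along paths of decreasing random rank.

import random

def make_rank(seed):
    # A consistent pseudorandom rank for every vertex (an illustrative choice).
    def rank(v):
        return random.Random(seed * 2654435761 + v).random()
    return rank

def in_mis(v, neighbors, rank):
    # v is in the greedy MIS (w.r.t. the random ranks) iff no neighbor of
    # strictly smaller rank is in it; the recursion only touches vertices
    # reachable through decreasing ranks, so the query is local.
    for u in neighbors(v):
        if rank(u) < rank(v) and in_mis(u, neighbors, rank):
            return False
    return True

# Usage on a tiny cycle graph; `neighbors` is a hypothetical adjacency oracle.
n = 8
def neighbors(v):
    return [(v - 1) % n, (v + 1) % n]

rank = make_rank(seed=42)
answers = [in_mis(v, neighbors, rank) for v in range(n)]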




Ronitt Rubinfeld received her undergraduate degree at the University of Michigan and her PhD at the University of California, Berkeley.  She held postdoctoral positions at DIMACS and the Hebrew University of Jerusalem.  After several years as a faculty member at Cornell University and a senior research scientist at NEC Research Institute, she is currently on the faculties of MIT and Tel Aviv University.  She has been a recipient of the ONR Young Investigator award, NSF CAREER Award, Sloan Fellowship, Cornell Association for Computer Science Undergraduates Faculty of the Year award and a Cornell College of Engineering Teaching award.  She was an invited speaker at the International Congress of Mathematicians in 2006.  Her research focuses on sub-linear time algorithms for big datasets.


Shubhangi Saraf, Error correcting codes and applications to theoretical computer science

In today's world there are huge amounts of data that need to get reliably stored or transmitted. However as hard as we try, some amount of noise or corruption is inevitable. Bits do get flipped and the data gets corrupted. However, we would still like to recover the original uncorrupted data. Is this even possible? This is precisely the task that error correcting codes accomplish. Error correcting codes have also found a host of striking applications in complexity theory and cryptography. In this talk I will discuss several examples of error correcting codes and we will see some of the applications to theoretical computer science. 
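As a toy instance of what such codes look like, here is the classical [7,4] Hamming code in Python, which corrects any single flipped bit (a standard textbook example, not one taken from the talk):

import numpy as np

# The [7,4] Hamming code: 4 message bits are encoded into 7 bits, and any
# single bit flip can be corrected from the syndrome H @ word.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg):
    return (msg @ G) % 2

def correct(word):
    # Flip the single bit (if any) whose column of H matches the syndrome.
    syndrome = (H @ word) % 2
    for i in range(7):
        if np.array_equal(H[:, i], syndrome):
            word = word.copy()
            word[i] ^= 1
            break
    return word

msg = np.array([1, 0, 1, 1])
codeword = encode(msg)
corrupted = codeword.copy()
corrupted[2] ^= 1                  # corrupt one bit
assert np.array_equal(correct(corrupted), codeword)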


Shubhangi is a faculty member in the Mathematics and Computer Science departments at Rutgers University. She is broadly interested in all areas of theoretical computer science and discrete mathematics. Much of her research has focused on complexity theory and property testing. More specifically, she is interested in the complexity of arithmetic computation, the role of randomness in computing, and in sublinear-time algorithms for codes and other combinatorial structures. Prior to joining Rutgers, she completed her PhD in computer science at MIT in 2011 and a postdoc at the Institute for Advanced Study in 2012.





In a variety of settings, it is useful to view randomness as a resource to be conserved, and this perspective leads to the study of pseudorandomness, in which one tries to efficiently construct objects that "appear" random in specific ways, using little or no randomness.  


In this talk, I'll cover a variety of topics in this area, such as epsilon-bias spaces, k-wise independence, expander graphs, randomness extractors and condensers, and pseudorandom generators. I'll describe how these objects are constructed and give examples of how they might be used.
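As one concrete example of the objects mentioned above (my choice of construction, not necessarily the one used in the tutorial): evaluating a uniformly random polynomial of degree less than k over a prime field gives a k-wise independent family of values, using only about k log p random bits instead of p log p.

import random

# k-wise independence from random polynomials over F_p: pick a uniformly
# random polynomial h of degree < k; then h(0), h(1), ..., h(p-1) are each
# uniform on F_p and any k of them are independent.
p, k = 101, 4                                  # illustrative parameters
rng = random.Random(7)
coeffs = [rng.randrange(p) for _ in range(k)]  # random degree-(k-1) polynomial

def h(x):
    # Evaluate the random polynomial at x (mod p) via Horner's rule.
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

values = [h(x) for x in range(p)]              # a k-wise independent sample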



No specialized background will be required.




Chris Umans received his Ph.D. in Computer Science from Berkeley in 2000. After spending two years as a postdoc in the Theory Group at Microsoft Research, he joined the Computer Science faculty at Caltech in 2002. His research interests encompass computational complexity, randomness in computation, and algebraic complexity and algorithms. He is the recipient of an NSF CAREER award, an Alfred P. Sloan Research Fellowship, and several best-paper and teaching awards. He is a managing editor of Theory of Computing.


David Woodruff, Low Rank Approximation and Regression in Input Sparsity Time



We improve the running times of algorithms for least squares regression and low-rank approximation to account for the sparsity of the input matrix. Namely, if nnz(A) denotes the number of non-zero entries of an input matrix A: 

- we show how to solve approximate least squares regression given an n x d matrix A in nnz(A) + poly(d log n) time 

- we show how to find an approximate best rank-k approximation of an n x n matrix in nnz(A) + n*poly(k log n) time 



All approximations are relative error. Previous algorithms based on fast Johnson-Lindenstrauss transforms took at least nd log d or nnz(A)*k time. We have implemented our algorithms, and preliminary results suggest the algorithms are competitive in practice.



Joint work with Ken Clarkson.
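To convey the flavor of input-sparsity-time sketching, here is a generic "sketch-and-solve" least squares routine in Python built from a CountSketch-style sparse embedding (an illustration of the general technique, not the authors' exact algorithm or its analysis):

import numpy as np

# Compress the n rows of [A | b] with a sparse embedding S that has a single
# signed nonzero per input row, then solve the small regression problem.
# Applying S takes time proportional to nnz(A).
rng = np.random.default_rng(0)

def sketch_solve(A, b, m):
    n, d = A.shape
    rows = rng.integers(0, m, size=n)          # hash each input row to a sketch row
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, d))
    Sb = np.zeros(m)
    for i in range(n):                         # one pass over the input
        SA[rows[i]] += signs[i] * A[i]
        Sb[rows[i]] += signs[i] * b[i]
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x

A = rng.standard_normal((5000, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(5000)
x_sketch = sketch_solve(A, b, m=200)           # approximate least squares solution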

David Woodruff joined the theory group at IBM Almaden as a research scientist after completing his Ph.D. in 2007 at MIT in theoretical computer science. 



Liked this entry? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

2 comments:

  1. The " Tutorial on compressive sensing/streaming algorithms" document is missing.
    Thanks for the workshop presentation links.

  2. It's fixed. Thanks for pointing this out.

    Cheers,

    Igor.
