Saturday, May 31, 2014

Nuit Blanche in Review (May 2014)

Since the last Nuit Blanche in Review (April 2014), Nuit Blanche crossed the 1 million visits (and it is about to get close to 3 million page views). Yesterday's entry on the slides of the Sublinear Algorithms 2014 meeting still resonates with me, but this past month, I also created a meetup around note by note cooking ( Connections between Note by Note Cooking and Machine Learning ). The Advanced Matrix Factorization Jungle page added some factorizations found in Graph Matching and Archetypal Analysis. We also wondered how FROG could be addressed with Nonlinear Compressive Sensing, saw the growth of Multiple Regularizers,  but also saw the possibility of one-bit compressive sensing to provide some way for understanding neural networks or how real Neurons could be thought as a Signal Processing Device. Sharp phase transition such as Donoho-Tanner was also seen as a way to probe good from bad neural networks in Compressive Sensing and Short Term Memory / Visual Nonclassical Receptive Field Effects or provide Sharp Performance Bounds for Graph Clustering . We were also made aware of the connection how the theory of convex optimization influences Machine Learning or the geometric perspective of the problem of the estimation in high dimensions, we also had an overview of Sparsity-Aware Learning and Compressed Sensing

We also had quite a few implementations made available.

Sunday Morning Insights:
Experiment /Hardware
Focused Entries:

Meetings / Meetups :


Hamming's time

Saturday Morning Videos

Credit: ESA/NASA, Soho, LASCO 2

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Friday, May 30, 2014

Abstracts and slides: Sublinear Algorithms 2014

The Bertinoro Workshop on Sublinear Algorithms just ended, here are the Abstracts and slides from the program page, I also added some preprints:

David Woodruff, IBM Almaden
Turnstile Streaming Algorithms Might as Well Be Linear Sketches [slides (ppt)] Preprint

In the turnstile model of data streams, an underlying vector x in {-m, -m+1,..., m-1, m} is presented as a long sequence of arbitrary positive and negative integer updates to its coordinates. A randomized algorithm seeks to approximate a function f(x) with constant probability while only making a single pass over this sequence of updates and using a small amount of space. All known algorithms in this model are linear sketches: they sample a matrix A from a distribution on integer matrices in the preprocessing phase, and maintain the linear sketch Ax while processing the stream. At the end of the stream, they output an arbitrary function of Ax. One cannot help but ask: are linear sketches universal?
In this work we answer this question by showing that any 1-pass constant probability streaming algorithm for approximating an arbitrary function f of x in the turnstile model can also be implemented by sampling a matrix A from the uniform distribution on O(n log m) integer matrices, with entries of magnitude poly(n), and maintaining the linear sketch Ax. Furthermore, the logarithm of the number of possible states of Ax, as x ranges over {-m, -m+1, ..., m}^n, plus the amount of randomness needed to store A, is at most a logarithmic factor larger than the space required of the space-optimal algorithm. Our result shows that to prove space lower bounds for 1-pass streaming algorithms, it suffices to prove lower bounds in the simultaneous model of communication complexity, rather than the stronger 1-way model. Moreover, the fact that we can assume we have a linear sketch with polynomially-bounded entries further simplifies existing lower bounds, e.g., for frequency moments we present a simpler proof of the tilde{Omega}(n^{1-2/k}) bit complexity lower bound without using communication complexity.
Joint work with Yi Li and Huy L. Nguyen
Sofya Raskhodnikova, Penn State, Boston University, and Harvard University
L_p-Testing [slides (pdf)]
We initiate a systematic study of sublinear algorithms for approximately testing properties of real-valued data with respect to L_p distances. Such algorithms distinguish datasets which either have (or are close to having) a certain property from datasets which are far from having it with respect to L_p distance. For applications involving noisy real-valued data, using L_p distances allows algorithms to withstand noise of bounded L_p norm. While the classical property testing framework developed with respect to Hamming distance has been studied extensively, testing with respect to L_p distances has received little attention.
We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, and also distance approximation to monotonicity. In particular, for functions over the hypergrid domains [n]^d, the complexity of our algorithms for all these properties does not depend on the linear dimension n. This is impossible in the standard model.
Most of our algorithms require minimal assumptions on the choice of sampled data: either uniform or easily samplable random queries suffice. We also show connections between the L_p-testing model and the standard framework of property testing with respect to Hamming distance. Some of our results improve existing bounds for Hamming distance.
Joint work with Piotr Berman and Grigory Yaroslavtsev.
Robert Krauthgamer, Weizmann Institute of Science
The Sketching Complexity of Graph Cuts [slides (pptx)] Preprint
We study the problem of sketching an input graph, so that, given the sketch, one can estimate the value (capacity) of any cut in the graph up to $1+\epsilon$ approximation. Our results include both upper and lower bound on the sketch size, expressed in terms of the vertex-set size $n$ and the accuracy $\epsilon$.
We design a randomized scheme which, given $\epsilon\in(0,1)$ and an $n$-vertex graph $G=(V,E)$ with edge capacities,produces a sketch of size $\tilde O(n/\epsilon)$ bits,from which the capacity of any cut $(S,V\setminus S)$ can be reported,with high probability, within approximation factor $(1+\epsilon)$. The previous upper bound is $\tilde O(n/\epsilon^2)$ bits,which follows by storing a cut sparsifier graph as constructed by Benczur and Karger (1996) and followup work.
In contrast, we show that if a sketch succeeds in estimating the capacity of all cuts $(S,\bar S)$ in the graph (simultaneously), it must be of size $\Omega(n/\epsilon^2)$ bits.
[Joint work with Alexandr Andoni and David Woodruff.]
Michael Mahoney, UC Berkeley
Implicit regularization in sublinear approximation algorithms [slides (pdf)]
Most work in sublinear algorithms has adopted the following perspective: the data are too large to compute the quantity of interest, and so we develop an algorithm that is faster (in that it touches only a small amount of the data) than an exact computation of the quantity of interest; and then we "settle" for a solution that is only a little worse, for worst-case input, than the solution we would have found, had memory and/or running time not been an issue. Here, we will describe one such algorithm for finding locally-biased clusters in a large graph that goes beyond this traditional worst-case analysis in the sense that, in addition to coming with worst-case approximation guarantees of this form, it also has the interesting property that its output is "better" than the output of the more expensive exact algorithm---better by empirical metrics of interest to downstream users of this algorithm (who don't know or care about the theoretical guarantees). This tradeoff between solution quality (as measured by the objective function of purported interest) and solution niceness (as measured in some other way) is known as regularization; it is ubiquitous in machine learning, statistics, scientific computing, etc.; and it is usually implemented by adding some sort of norm constraint to the original objective. We will describe how approximate computation, in and of itself, can implicitly implement regularization; and we will describe how the sublinear push procedure for approximating a personalized PageRank vector implicitly optimizes an objective with a sparsity-inducing L1 regularization term added.

Ryan O’Donnell, Carnegie Mellon University
Testing Surface Area [slides (pps)] Preprint
Given ε, S, and black-box access to a completely arbitrary set F in R^n, we give an algorithm that tests whether F has surface area at most S or is ε-far from all sets of surface area at most (4/pi) S. Our test uses O(1/ε) queries, independent of the dimension n; previous work only treated the case of n = 1. We also mention a recent improvement due to Neeman which brings the 4/pi down to any number exceeding 1.
Joint work with Pravesh Kothari, Amir Nayyeri, and Chenggang Wu

Krzysztof Onak, IBM Research
A Near-Optimal Algorithm for Testing Isomorphism of Two Unknown Graphs [slides (pdf)]
Fischer and Matsliah (SICOMP 2008) showed that the number of queries necessary to test isomorphism of two unknown graphs in the dense graph model is at least Omega(n) and at most O(n^(5/4)). We essentially close this gap by showing an algorithm that makes n * 2^O(sqrt(log n)) queries.
One of the main tools in the previous work on the subject is the algorithm for testing if two unknown distributions are identical (Batu, Fortnow, Rubinfeld, Smith, White [JACM 2013]). A major obstacle in the quest for an efficient graph isomorphism tester turns out to be the Omega(n^(2/3)) distribution testing lower bound (Valiant [SICOMP 2011]). We bypass this lower bound by designing a more efficient algorithm that for two distributions on near-isomorphic sets of points is required to reject only if the Earth-Mover Distance between them is large.
Joint work with Xiaorui Sun.

Ely Porat, Bar Ilan University
Group Testing, Compressed Sensing and Algorithmic Applications [slides (pptx)]

Group testing is a long studied problem in combinatorics: A small set of r ill people must be identified out of the whole population of n people, by using only queries (tests) of the form “Does the set X contain an ill member?”. I will discuss the current state of the art, and show several surprising applications for group testing techniques (including pattern matching, crypto, and similarity sketching). 

Alexandr Andoni, Microsoft Research
Parallel Algorithms for Geometric Graph Problems ArXiV Preprint

Motivated by modern parallel computing models (think MapReduce), we give a new algorithmic framework for geometric graph problems. Our framework applies to problems such as the Minimum Spanning Tree (MST) problem over a set of points in a low-dimensional space, which is useful for hierarchical agglomerative clustering. Our algorithm computes a $(1+epsilon)$-approximate MST in a *constant* number of rounds of communication, while using total space and communication proportional to the size of the data only.
Our framework proves to have implications beyond the parallel models as well. For example, we consider the Earth-Mover Distance (EMD) problem, for which we obtain a new near-linear time algorithm as well as a first streaming algorithm (assuming we can pre-sort the points). Technically, our framework for EMD shows how to effectively break up a "big Linear Programming problem" into a number of "small Linear Programming problems", which can be later recombined using a dynamic programming approach. Our results also address open questions from [Sarathkumar-Agarwal, STOC'12], as well as a well-known open question on streaming EMD ( ).
Joint work with Aleksandar Nikolov, Krzysztof Onak, Grigory Yaroslavtsev.

Vladimir Braverman
Approximating Large Frequency Moments with $O(n^{1−2/k})$ Bits Preprint

We consider the problem of approximating frequency moments in the streaming model. Given a stream D = {p_1,p_2,...,p_m} of numbers from {1,...,n}, a frequency of i is defined as f_i = |{j: p_j = i}|. The k-th frequency moment of D is defined as F_k = \sum_{i=1}^n f_i^k.
In their celebrated paper, Alon, Matias, and Szegedy (STOC 1996) introduced the problem of computing a (1 +/- epsilon)-approximation of F_k with sublinear memory. We give upper bound of O(n^{1-2/k}) bits that matches, up to a constant factor, the lower bound of Woodruff and Zhang (STOC 12) for constant epsilon and k > 3.
Joint work with Jonathan Katzman, Charles Seidell and Gregory Vorsanger.

Amit Chakrabarti, Dartmouth College
Submodular Maximization in a Data Streaming Setting ArXiV Preprint
We study the problem of finding a maximum matching in a graph given by an input stream listing its edges in some arbitrary order, where the quantity to be maximized is given by a monotone submodular function on subsets of edges. This problem, which we call maximum submodular-function matching (MSM), is a natural generalization of maximum weight matching (MWM), which is in turn a generalization of maximum cardinality matching (MCM). We give two incomparable algorithms for this problem with space usage falling in the semi-streaming range---they store only $O(n)$ edges, using $O(n\log n)$ working memory---that achieve approximation ratios of $7.75$ in a single pass and $(3+\eps)$ in $O(\eps^{-3})$ passes respectively. The operations of these algorithms mimic those of Zelke's and McGregor's respective algorithms for MWM; the novelty lies in the analysis for the MSM setting. In fact we identify a general framework for MWM algorithms that allows this kind of adaptation to the broader setting of MSM. We also give generalizations of these results where the maximization is over ``independent sets'' in a very general sense. This generalization captures hypermatchings in hypergraphs as well as independence in the intersection of multiple matroids. Joint work with Sagar Kale.

Graham Cormode, University of Warwick
Parameterized Streaming Algorithms for Vertex Cover ArXiV preprint

As graphs continue to grow in size, we seek ways to effectively process such data at scale. The model of streaming graph processing, in which a compact summary is maintained as each edge insertion/deletion is observed, is an attractive one. However, few results are known for optimization problems over such dynamic graph streams. In this work, we introduce a new approach to handling graph streams, by instead seeking solutions for the parameterized versions of problems where we are given a parameter k and the objective is to decide whether there is a solution bounded by k. By combining kernelization techniques with randomized sketch structures, we obtain the first streaming algorithms for parameterized versions of Vertex Cover and Maximal Matching.
Joint work with Morteza Monemizadeh, Rajesh Chitnis, MohammadTaghi Hajiaghayi

Deeparnab Chakrabarty, Microsoft Research
Property Testing under Product Distributions ArXiV preprint

The primary problem in property testing is to decide whether a given function satisfies a certain property, or is far from any function satisfying it. This crucially requires a notion of distance between functions. The most prevalent notion is the Hamming distance over the {\em uniform} distribution on the domain. This restriction to uniformity is more a matter of convenience than of necessity, and it is important to investigate distances induced by more general distributions.
In this talk, I'll talk about property testing a rather general class of bounded derivative properties over arbitrary product distributions. Bounded derivative properties include fundamental properties such as monotonicity and Lipschitz continuity. Our results subsume almost all known results (upper and lower bounds) on monotonicity and Lipschitz testing.
We prove an intimate connection between bounded derivative property testing and binary search trees (BSTs). We exhibit a tester whose query complexity is the sum of expected depths of optimal BSTs for each marginal. Furthermore, we show this sum-of-depths is also a lower bound. A fundamental technical contribution of this work is an {\em optimal dimension reduction theorem} for all bounded derivative properties, which relates the distance of a function from the property to the distance of restrictions of the function to random lines. Such a theorem has been elusive even for monotonicity for the past 15 years, and our theorem is an exponential improvement to the previous best known result.
Joint work with Kashyap Dixit, Madhav Jha, and C. Seshadhri 

Venkat Chandrasekaran
Computational and Statistical Tradeoffs via Convex Relaxation ArXiV preprint

Shiri Chechik, Microsoft Research Silicon Valley
Approximate Distance Oracles with Constant Query Time ArXiV preprint

In recent years, the practical need for fast retrieval of shortest path queries has significantly increased, in part due to the developing GPS navigation systems and other route planning softwares.
Classical shortest paths algorithms such as Dijkstra's algorithm return shortest path distance in almost linear time in the number of nodes. However, continent-sized road networks require something much faster. It is possible to achieve sublinear-time queries if preprocessing is allowed.
A distance oracle is a data structure that allows fast retrieval of a distance estimate for any pair of nodes. The focus in designing distance oracles is often on the tradeoff between several parameters: the construction time (the time it takes to construct the distance oracle), the size of the data structure, the query time and the quality of the answer (how close is the estimated distance to the real distance).
I will discuss a recent work on distance oracles with constant query time, which essentially obtains optimal bounds for general graphs and improves over the Thorup-Zwick distance oracle [STOC ’01, J.ACM ‘05].

Artur Czumaj, University of Warwick
Testing Cluster Structure of Graphs

Cluster analysis is a fundamental task in data analysis that aims at partitioning a set of objects into a disjoint collection of objects with similar characteristics. In this talk, we will use the concept of conductance to measure the quality of cluster structure and will focus on a question of approximately recognizing cluster structure of a graph in sublinear time in the framework of property testing in the bounded degree model. We show how to test in O*(√n) time whether a graph with n nodes can be partitioned into no more than k parts (clusters) such that the outer-conductance of each cluster is at most \phi_i and the inner-conductance of the induced subgraph on each cluster is at least \phi_o, for a large spectrum of parameters k, \phi_i, and \phi_o. By the lower bound of \Omega(√n) for testing graph expansion, which corresponds to the case when k=1 in our problem, our algorithm is asymptotically optimal up to polylogarithmic factors.
This is joint work with Pan Peng and Christian Sohler.

Mark Davenport, Georgia Institute of Technology
Adaptive sensing of sparse signals in noise

In this talk I will focus on the problem of recovering a sparse image from a small number of noisy measurements. To begin, I will consider the case where the measurements are acquired in a nonadaptive fashion as in the standard compressive sensing framework. I will describe lower bounds on the minimax mean-squared error of the recovered vector which very nearly matches the performance of L1-minimization techniques, and hence shows that in certain regimes these techniques are essentially optimal. I will then consider the case where the measurements are acquired sequentially in an adaptive manner. I will describe a lower bound that shows that, surprisingly, in certain worst-case regimes, adaptivity does not allow for substantial improvement over standard nonadaptive techniques in terms of the minimax MSE. Nonetheless, I will also show that there are important regimes where the benefits of adaptive sensing are clear and overwhelming, and can be achieved via relatively simple algorithms that also have the benefit of a drastic reduction in the computational complexity of the recovery process.

Dan Feldman, MIT
Sublinear Systems using Core-sets

A core-set for a given problem is a semantic compression of its input, in the sense that a solution for the problem with the (small) coreset as input yields a provable approximate solution to the problem with the original (Big) Data. Core-set can usually be computed via one pass over a streaming input, manageable amount of memory,and in parallel. For real time performance we use Hadoop, Clouds and GPUs.
In this talk I will give a summary of some core-set techniques and their implementations in real-time systems. I will also share some social and technical challenges that I had to deal with in the industry and the robotics lab of MIT while implementing such core-sets. 

Eldar Fischer, Technion - Israel Institute of Technology
Partial testing, universal testing and decomposability Preprint

Some properties do not admit tests with a small number of queries - but can they be partitioned into a small number of properties that do admit such tests? With some properties this is indeed the case. For example, being the concatenation of two palindromes has no test with less than the square root of n queries, but it can be partitioned into n properties with constant query complexity tests.
Here we show a property that cannot be thus partitioned. Even more so, our property does not have any large sub-property that admits a small query complexity test. In fact, we show that all large dual distance codes are such properties. The proof uses methods not used before for defeating the most general tests; the entropy method is used to deal with possibly 2-sided tests, and a concept of a reader (basically a decision tree where we are interested in the "stream of bits read" rather than its final output) is used to analyze and defeat adaptive tests.
We also provide a general result (but with weak parameters) that gives an intuition why the partitionable property above still has a sublinear query complexity test: Any property partitionable into not too many sub-properties admitting proximity oblivious tests would admit a sublinear complexity test. The proof is by showing that, for a cost, a "one size fits all" test can be used for all properties with a proximity oblivious test.
Joint work with Yonatan Goldhirsh and Oded Lachish.

Elena Grigorescu, Purdue University
Tight lower bounds for testing linear isomorphism Preprint

We study lower bounds for testing membership in families of linear/affine-invariant Boolean functions over the hypercube. Motivated by the recent resurgence of attention to the permutation isomorphism problem, we first focus on families that are linearly/affinely isomorphic to some fixed function. Our main result is a tight adaptive, two-sided Ω(n^2) lower bound for testing linear isomorphism to the inner-product function. This is the first lower bound for testing linear isomorphism to a specific function that matches the trivial upper bound. Our proof exploits the elegant connection between testing and communication complexity discovered by Blais et al. (Computational Complexity, 2012.) Our second result shows an Ω(2^n/4) query lower bound for any adaptive, two-sided tester for membership in the Maiorana-McFarland class of bent functions. This class of Boolean functions is also affine-invariant and its rich structure and pseudorandom properties have been well-studied in mathematics, coding theory and cryptography. Joint work with Karl Wimmer and Ning Xie.

Venkatesan Guruswami, Carnegie Mellon University
Reed-Muller testing: implications for small set expansion & hypergraph coloring

The problem of testing proximity to low-degree multivariate polynomials was one of the original problems considered in (and arguably motivated) the property testing model. Recently, an algorithm for testing binary Reed-Muller codes with optimal trade-off between soundness and distance to the closest low-degree multilinear polynomial was given (Bhattacharyya, Kopparty, Schoenebeck, Sudan, Zuckerman, FOCS’10). Via derandomization of the hypercube by restricting to Reed-Muller codewords, this result has found some unexpected applications in approximability. For example, the subgraph of the noisy hypercube induced by Reed-Muller codewords is a small set expander with the currently largest known count of eigenvalues (of the associated random walk matrix) close to 1 (Barak, Gopalan, Hastad, Raghavendra, Meka, Steurer, FOCS’12). Replacing the long code (hypercube) gadget in PCP reductions in the classical ``Label Cover + Fourier Analysis'' framework by the low-degree version, together with strong soundness guarantees on structured tests for Reed-Muller codes has to led to more size-efficient PCPs and improved hardness results for hypergraph coloring (Dinur, G., FOCS’13; G., Harsha, Håstad, Srinivasan, Varma, STOC’14).
In this talk, I’ll attempt to provide a glimpse of the relevant Reed-Muller testing results and some of these connections.

Piotr Indyk, MIT
Approximation-Tolerant Model-Based Compressive Sensing Preprint

Valerie King
Impromptu Updating of a Dynamic Network Potentially relevant slides

We consider the CONGEST model of communication networks, where each node knows the names of its neighbors. We give some new simple techniques to reduce the number of bits communicated between nodes for repairing and constructing a minimum spanning tree or unweighted spanning forest which are subgraphs of the communication network. In particular we show that o(m) bits of communication suffice to construct a minimum spanning tree.

Edo Liberty, Yahoo Labs
Simple and Deterministic Matrix Sketching Preprint


Andrew McGregor, University of Massachusetts, Amherst
Graph Algorithms in the Sliding Window Model Potentially relevant preprint

We present the first algorithms for processing graphs in the sliding-window model. The sliding window model, introduced by Datar et al. (SICOMP 2002), has become a popular model for processing infinite data streams in small space when older data items (i.e., those that predate a sliding window containing the most recent data items) are considered “stale” and should implicitly be ignored. While processing massive graph streams is an active area of research, it was hitherto unknown whether it was possible to analyze graphs in the sliding- window model. In this talk we focus on algorithms for connectivity and finding large matchings.
Joint work with Michael Crouch and Dan Stubbs.

Andrea Montanari
Principal Component Analysis with Structured Factors

Many modern data sets are naturally presented as data matrices. Examples include recommendation systems (with rows corresponding to products, and columns to customers), hyper-spectral imaging (with rows corresponding to pixels, and columns to frequencies), gene expression data (with rows corresponding to patients, and columns to genes). Principal component analysis aims at reducing the dimensionality of such datasets by projecting samples in a few directions of maximum variability.
The principal vectors are often interpretable and convey important information about the data. This in turn imposes constraints on these vectors: for instance, in some cases the principal directions are known to be non-negative or sparse. Substantial work has been devoted to exploiting these constraints to improve statistical accuracy. Despite this, it is still poorly understood whether and when this is a good idea. I will discuss three examples that illustrate the mathematical richness of this problem, and its far reaching practical implications. [Based on joint work with Yash Deshpande and Emile Richard.]

Jelani Nelson, Harvard University
Dimensionality reduction via sparse matrices Potentially relevant slides

Dimensionality reduction techniques are used to obtain algorithmic speedup and storage savings in high-dimensional computational geometry, numerical linear algebra, compressed sensing, manifold learning, and clustering and several other machine learning problems. A common method for doing dimensionality reduction is to apply a random linear map to the input data (so-called "Johnson-Lindenstrauss transforms"). In this talk we discuss ways of designing such linear maps which are extremely sparse but still provide provable guarantees, thus speeding up the time to do the dimensionality reduction.
Based on joint works with Jean Bourgain, Daniel Kane, and Huy Lê Nguyễn.

Eric Price, IBM Almaden
Sharp bounds for learning a mixture of two Gaussians Preprint

We consider the problem of identifying the parameters of an unknown mixture of two arbitrary $d$-dimensional Gaussians from a sequence of random samples. Our main result is a computationally efficient moment-based estimator with an optimal convergence rate thus resolving a problem introduced by Pearson (1894). Denoting by $\sigma^2$ the variance of the unknown mixture, we prove that $\Theta(\sigma^{12})$ samples are necessary and sufficient to estimate each parameter up to constant additive error when $d=1.$ Our upper bound extends to arbitrary dimension $d higher than 1$ up to a (necessary) logarithmic loss in $d$ using a novel---yet simple---dimensionality reduction technique. Strikingly, our estimator turns out to be very similar to the one Pearson proposed in 1894 which reduces the one-dimensional problem to solving and analyzing a tractable system of polynomial equations. Our result greatly improves on the exponent in the sample size of the best previous estimator due to Kalai, Moitra and Valiant (2010).

Ronitt Rubinfeld, MIT and Tel Aviv University
Testing distributions in a kinder, gentler world

Many problems in testing distributions over large domains require a number of samples that is daunting. We survey recent and not-so-recent results on testing distributions given access to more powerful oracle-queries about the distributions in question.
Joint work with Clement Canonne.

Shubhangi Saraf
Breaking the quadratic barrier for 3-LCCs over the Reals Preprint

The classical Sylvester-Gallai Theorem states the following: given a finite set of points in the Euclidean plane, if the line through every pair of points passes through a third point, then all the points must be collinear. At a high level the result shows that many ``local" linear dependencies implies a ``global" bound on the dimension of the entire set of points. In recent years variants of the theorem have also found applications to several structural results in theoretical computer science.
In this talk I will describe some high dimensional and quantitative extensions to the Sylvester-Gallai theorem and show strong connections to the problem of locally correctable codes. In particular I will describe a recent result showing that 3-query linear locally correctable codes over the Reals of dimension d require block length n > d^{2+ε} for some positive ε > 0. This improves the known quadratic lower bounds which hold for any linear code. Our proof uses a powerful result by Barthe from convex geometry to derive a structure theorem on the decoding query triples. This structure supports stronger random restriction arguments, and hence stronger lower bounds than previously available.
Based on joint work with Zeev Dvir and Avi Wigderson

Gregory Valiant, 
Finding correlations in subquadratic time: new thoughts and applications Preprint

Paul Valiant, 
Instance Optimal Identity Testing Potentially relevant preprint
We consider the problem of verifying the identity of a distribution: Given the description of a distribution p over discrete support, how many samples (independent draws) must one obtain from an unknown distribution q, to distinguish, with high probability, the case that p=q from the case that the total variation distance (L1 distance) between p and q is at least epsilon? We resolve this question, up to constant factors, on an instance-by-instance basis: there exist universal constants c, c', and a function f(p,epsilon) on distributions and error parameters, such that our tester distinguishes p=q from |p-q|>epsilon using f(p,epsilon) samples with success probability >2/3, but no tester can distinguish p=q from |p-q|> c*epsilon when given c'*f(p,epsilon) samples. The function f has an unusual form, depending on the 2/3 norm of the distribution p, after the removing the largest single domain element, and the smallest domain elements cumulatively comprising epsilon mass; because this result is tight to constant factors, the unusual nature of f is in some sense a law of nature, and not a side-effect of our analysis. This result significantly generalizes and tightens previous results, and also introduces a new analysis technique that may be of independent interest.
The analysis of our estimator by comparing its variance to the square of its mean produces (as it usually does!) several hairy inequalities. To facilitate this analysis, we give a complete characterization of a general class of inequalities--generalizing Cauchy-Schwarz, Holder's inequality, and the monotonicity of Lp norms. Essentially, any inequality between several sequences of arbitrary positive numbers which is of the form where it "might" be proven as a product of Holder and Lp monotonicity inequalities, is true if and only if it is the product of such inequalities. We further give a polynomial time algorithm for deriving or refuting such inequalities, that in practice may often be carried out by hand, despite the fact that the shortest refutation of such inequalities may be doubly-exponential in length when written out explicitly. We hope that this tool will remove a hurdle from future results. [Joint work with Gregory Valiant]

Suresh Venkatasubramanian, University of Utah
A directed isoperimetric inequality with application to Bregman near neighbor lower bounds Preprint

We present lower bounds for approximate near neighbor search for Bregman divergences. We show that under the cell probe model, any non-adaptive data structure (like locality-sensitive hashing) for c-approximate near-neighbor search that admits r probes must use space Ω(n^{1 + u/cr}) (where u is a structure constant). In contrast, for LSH under ℓ1 the best bound is Ω(n^{1 + 1/r}).
Our new tool is a directed variant of the standard boolean noise operator. We show that a generalization of the Bonami-Beckner hypercontractivity inequality exists ``in expectation'' or upon restriction to certain subsets of the Hamming cube, and that this is sufficient to prove the desired isoperimetric inequality that we use in our data structure lower bound.
We also present a structural result reducing the Hamming cube to a Bregman cube. This structure allows us to obtain lower bounds for problems under Bregman divergences from their ℓ1 analog. In particular, we get a (weaker) lower bound for approximate near neighbor search of the form Ω(n^{1 + 1/cr}) for an r-query non-adaptive data structure, and new cell probe lower bounds for a number of other near neighbor questions in Bregman space.

Rachel Ward 
Linear dimension reduction in the l1 norm: When is it possible? How is it possible? Relevant preprint


Yuichi Yoshida
A Characterization of Locally Testable Affine-Invariant Properties via Decomposition Theorems Preprint

Let P be a property of function F_p^n → {0, 1} for a fixed prime p. An algorithm is called a tester for P if, given a query access to the input function f, with high probability, it accepts when f satisfies P and rejects when f is “far” from satisfying P. In this paper, we give a characterization of affine-invariant properties that are (two-sided error) testable with a constant number of queries. The characterization is stated in terms of decomposition theorems, which roughly claim that any function can be decomposed into a structured part that is a function of a constant number of polynomials, and a pseudo-random part whose Gowers norm is small. We first give an algorithm that tests whether the structured part of the input function has a specific form. Then we show that an affine-invariant property is testable with a constant number of queries if and only if it can be reduced to the problem of testing whether the structured part of the input function is close to one of a constant number of candidates.

Qin Zhang, Indiana University Bloomington
New Directions in Distributed Monitoring

In this talk we will discuss the distributed monitoring model, where we have k sites, each receiving a stream of elements over time. There is a designated coordinator who would like to track, that is, maintain continuously at all times, some function f of all the elements received from the k sites. There is a two-way communication channel between each site and the coordinator. The primary goal is to track f with minimum communication. We also want to minimize the space and processing time per item used at all sites. This model is motivated by applications in network monitoring, content delivery services, sensor networks, distributed databases, etc.
In this talk we will first briefly survey the existing results of this area, and then illustrate a few simple upper and lower bound examples. Next, we discuss several new research directions for distributed monitoring, including the search for generic monitoring algorithms, extending the study to problems of richer structures, and exploring/comparing variants of this model.

N00224162.jpg was taken on May 19, 2014 and received on Earth May 19, 2014. The camera was pointing toward TITAN at approximately 444,108 miles (714,722 kilometers) away, and the image was taken using the CL1 and UV3 filters. 

Image Credit: NASA/JPL/Space Science Institute

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Fast and Robust Archetypal Analysis for Representation Learning - implementation -

Here is the second entry on Archetypal Analysis today with a matlab implementation within SPAMS (soon to be added to the Advanced Matrix Factorization Jungle Page):

From the Introduction section:

Our main objective is to rehabilitate a pioneer unsupervised learning technique called archetypal analysis [5], which is easy to interpret while providing good results in prediction tasks. It was proposed as an alternative to principal component analysis (PCA) for discovering latent factors from high-dimensional data. Unlike principal components, each factor learned by archetypal analysis, called archetype, is forced to be a convex combination of a few data points. Such associations between archetypes and data points are useful for interpretation. For example, clustering techniques provide such associations between data and centroids. It is indeed common in genomics to cluster gene expression data from several individuals, and to interpret each centroid by looking for some common physiological
traits among individuals of the same cluster [7]. Interestingly, archetypal analysis is related to popular approaches such as sparse coding [18] and non-negative matrix factorization (NMF) [19], even though all these formulations were independently invented around the same time. Archetypal analysis indeed produces sparse representations of the data points, by approximating them with convex combinations of archetypes; it also provides a non-negative factorization when the data matrix is non-negative.
A natural question is why archetypal analysis did not gain a lot of success, unlike NMF or sparse coding. We believe that the lack of efficient available software has limited the deployment of archetypal analysis to promising applications; our goal is to address this issue...
The forimulation section has a very nice writeup on how the linear algebra of this matrix factorization differs/parallels that of NMF and Sparse Coding. Here is the paper: Fast and Robust Archetypal Analysis for Representation Learning by Yuansi Chen, Julien Mairal, Zaid Harchaoui
We revisit a pioneer unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis did not gain a lot of popularity even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring back into favour archetypal analysis. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large image collection visualization.

The implementation is part ofthe SPAms software at:

In the Advanced Matrix Factorization Jungle Page, I listed the Archetypal decomposition with the following definition:

  • Archetypal Analysis: A = DX with unknown D and X, solve for D = AB with  D and B positive

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Probabilistic Archetypal Analysis - implementation -

If one recalls the recent theoretical headway in NMF, it relies on the need to have pure components, an assumption called near-separability ( see the recent The Why and How of Nonnegative Matrix Factorization ). More recently, in Random Projections for Non-negative Matrix Factorization, there was also some resorting to some extreme point in point clouds:
However, the geometric interpretation remains valid and the approach gives non-negative factors U and V such that the columns of U are the extreme rays of a polyhedral cone that contains most of the columns of X....

So this is facsinating that there seem to be another line of inquiry, Archetypal Analysis, that started back in 1994 with Cutler and Breiman that does a matrix factorization similar to the NMF but has additional constraints in that 
each individual member of a set of data vectors as a mixture (a constrained linear combination) of the pure types or archetypes of the data set.
Archetypal Analysis has some relation to k-means as well. This advanced matrix decomposition is soon to be added to the Advanced Matrix Factorization Jungle page as this entry and the next feature new implementations of that algorithm. Here is the first: Probabilistic Archetypal Analysis by Sohan Seth, Manuel J. A. Eugster
Archetypal analysis represents a set of observations as convex combinations of pure patterns, or archetypes. The original geometric formulation of finding archetypes by approximating the convex hull of the observations assumes them to be real valued. This, unfortunately, is not compatible with many practical situations. In this paper we revisit archetypal analysis from the basic principles, and propose a probabilistic framework that accommodates other observation types such as integers, binary, and probability vectors. We corroborate the proposed methodology with convincing real-world applications on finding archetypal winter tourists based on binary survey data, archetypal disaster-affected countries based on disaster count data, and document archetypes based on term-frequency data. We also present an appropriate visualization tool to summarize archetypal analysis solution better.
The implementation is at:

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Thursday, May 29, 2014

On the Convergence of Approximate Message Passing with Arbitrary Matrices

This discussion on LinkedIn reminded me how late I am on certain things to feature here on Nuit Blanche. For short, we know that AMP algorithms are pretty good in terms of speed and in terms of pushing the sharp phase transition into new territories.

What we don't really understand is why it works well in some cases and why it absolutely fails in others. For instance, I am aware of one instance of Uncertainty Quantification where it really is bad compared to a "simpler" convex L1 minimization approach.

Anyway, here is a new stab at the problem that might help design the right parameters within AMP to fit to specific measurement matrices.

Approximate message passing (AMP) methods and their variants have attracted considerable recent attention for the problem of estimating a random vector x observed through a linear transform A. In the case of large i.i.d. A, the methods exhibit fast convergence with precise analytic characterizations on the algorithm behavior. However, the convergence of AMP under general transforms is not fully understood. In this paper, we provide sufficient conditions for the convergence of a damped version of the generalized AMP (GAMP) algorithm in the case of Gaussian distributions. It is shown that, with sufficient damping the algorithm can be guaranteed to converge, but the amount of damping grows with peak-to-average ratio of the squared singular values of A. This condition explains the good performance of AMP methods on i.i.d. matrices, but also their difficulties with other classes of transforms. A related sufficient condition is then derived for the local stability of the damped GAMP method under more general (possibly non-Gaussian) distributions, assuming certain strict convexity conditions.
I note from the conclusion:

 The required amount of damping is related to the peak-to-average ratio of the squared singular values of the transform matrix. However, much remains unanswered: Most importantly, we have yet to derive a condition for global convergence even in the case of strictly convex functions. Secondly, our analysis assumes the use of fixed stepsizes. Third, short of computing the peak-to-average singular-value ratio, we proposed no method to compute the damping constants. Hence, an adaptive method may be useful in practice.

Image Credit: NASA/JPL/Space Science Institute, N00224160.jpg was taken on May 22, 2014 and received on Earth May 24, 2014. The camera was pointing toward RHEA at approximately 1,201,511 miles (1,933,644 kilometers) away, and the image was taken using the CL1 and CL2 filters. 

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Wednesday, May 28, 2014

CGIHT, ASD and ScaledASD: Compressed Sensing and Matrix Completion - implementation -

Jared then continues his email with:

"...Associated, Jeff BlanchardKe Wei, and I have released a couple  of papers with new algorithms for compressed sensing and matrix  completion. The first paper is a nonlinear restarted conjugate gradient  approach with hard thresholding, named CGIHT:
The second paper is an algorithm for matrix completion,

The common thread linking these two papers is keeping the per iteration computational complexity very low so as to allow a more efficient search of subspaces, while also implicitly reverting to a higher order method once the correct subspace is identified. We find this approach highly favourable in computational experiments.

It would be greatly appreciated if you could mention these on Nuit Blanche.

All the best,
Jared TannerProfessor of the Mathematics of Information
Mathematics Institute
University of Oxford
Sure Jared !

Here are the two papers; let us note the row sparse matrix case is also called the MMV problem:

We introduce the Conjugate Gradient Iterative Hard Thresholding (CGIHT) family of algorithms for the efficient solution of constrained underdetermined linear systems of equations arising in compressed sensing, row sparse approximation, and matrix completion. CGIHT is designed to balance the low per iteration complexity of simple hard thresholding algorithms with the fast asymptotic convergence rate of employing the conjugate gradient method. We establish provable recovery guarantees and stability to noise for variants of CGIHT with sufficient conditions in terms of the restricted isometry constants of the sensing operators. Extensive empirical performance comparisons establish significant computational advantages for CGIHT both in terms of the size of problems which can be accurately approximated and in terms of overall computation time.

The phase transition shown above is similar to the one found by Jeff earlier ( Sharp Phase Transition for Joint Sparse Recovery (MMV) )

Matrix completion involves recovering a matrix from a subset of its entries by utilizing interdependency between the entries, typically through low rank structure. Despite the combinatorial nature of matrix completion, there are many computationally efficient algorithms which are effective for a broad class of matrices. In this paper, we introduce an alternating steepest descent algorithm (ASD) and a scaled variant, ScaledASD, for the fixed-rank matrix completion problem. Empirical evaluation of ASD and ScaledASD on both image in painting and random problems show they are competitive with other state-of-the-art matrix completion algorithms in terms of recoverable rank and overall computational time. In particular, their low per iteration computational complexity makes ASD and ScaledASD efficient for large size problems, especially when computing the solutions to moderate accuracy. A preliminary convergence analysis is also presented.

Ke Wei then let me know that 

Dear Igor,
Following Jared's email, the matlab codes for the matrix completion algorithms in the two papers
can be downloaded at
Many thanks,

Thanks Ke and Jared

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

GAGA: GPU Accelerate Greedy Algorithms for Compressed Sensing - implementation, new version -

Jared Tanner just sent me the following:

Dear Igor,
....I'm writing to alert you to a couple of new papers and a revised software package. You might recall that about a year ago Jeff Blanchard and I released the code "GAGA: GPU Accelerated Greedy Algorithms for compressed sensing." We have just posted a new release with more algorithms included  and some other improvements such as faster matrix vector product  kernels. The code is available free for non-commercial use at:

Thanks Jared, to be continued in the next entry.

Credit Photo: Dan Piponi

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

PostDoc: Very large data management in Geosciences

Not specifically related to CS but it might (especially if one look at randomized algorithm as a way to preprocess a majority of these datasets), Laurent just let me know about this opportunity: 

Very large data management in Geosciences (gestion des très gros volumes de données en géosciences)
Abstract: The main purpose of the post-doctoral work is to propose new data compression techniques for volumetric meshes able to manage seismic data values attached to geometry elements (nodes or cells) with adaptive decompression for post-processing functionalities (visualization). Compression algorithms adapted to "big data" will enable our current software scalability, for instance, geoscience fluid-flow simulation or transport combustion simulation on very large meshes. Obtained results are intended to contribute to IFPEN scientific lock about very large data management with a target of being able to process billion of cells or data samples. Results will also be used to propose new software solutions for the storage, the transfer and the processing (exploration, visualization) of these large data sets.

Résumé : L'objectif de ce post-doctorat est de proposer de nouvelles méthodes de compression de données et de maillages volumiques capables de gérer des propriétés attachées à la géométrique (connexité de mailles, groupes spatiaux de traces sismiques), éventuellement évolutives, tout en permettant une décompression progressive et adaptée à la visualisation et au traitement. Les algorithmes de compression pour les données volumiques permettraient de les exploiter dans les outils logiciels qui manipulent des ensembles volumineux (simulation d'écoulement poreux en géosciences ou simulation de combustion en transport). Les résultats obtenus auront vocation à contribuer au verrou technologique concernant les très gros volumes de données avec une cible fixée sur le milliard de cellules ou d'échantillons. Les résultats seront notamment exploités pour assurer le stockage, le transfert mais aussi la manipulation (exploration, visualisation) de ces très gros volumes.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.