Tuesday, April 21, 2015

Streaming: Memory Limited Matrix Completion with Noise, Verifiable Stream Computation, Count-Min-Log sketch

Streaming is bound to take a centerstage role if we are talking about Big Data...or Data as we call it in Texas. Today, we have an algorithm for matrix completion, a paper on verifying the properties of certain streams and some approximation of count-min sketch. Without further ado:

Streaming, Memory Limited Matrix Completion with Noise by Se-Young Yun, Marc Lelarge, Alexandre Proutiere

In this paper, we consider the streaming memory-limited matrix completion problem when the observed entries are noisy versions of a small random fraction of the original entries. We are interested in scenarios where the matrix size is very large so the matrix is very hard to store and manipulate. Here, columns of the observed matrix are presented sequentially and the goal is to complete the missing entries after one pass on the data with limited memory space and limited computational complexity. We propose a streaming algorithm which produces an estimate of the original matrix with a vanishing mean square error, uses memory space scaling linearly with the ambient dimension of the matrix, i.e. the memory required to store the output alone, and spends computations as much as the number of non-zero entries of the input matrix.  
Verifiable Stream Computation and Arthur–Merlin Communication  by Amit Chakrabarti, Graham Cormode, Andrew McGregor, Justin Thaler, Suresh Venkatasubramanian
In the setting of streaming interactive proofs (SIPs), a client (verifier) needs to compute a given function on a massive stream of data, arriving online, but is unable to store even a small fraction of the data. It outsources the processing to a third party service (prover), but is unwilling to blindly trust answers returned by this service. Thus, the service cannot simply supply the desired answer; it must convince the verifier of its correctness via a short interaction after the stream has been seen.

In this work we study "barely interactive" SIPs. Specifically, we show that two or three rounds of interaction suffice to solve several query problems --- including Index, Median, Nearest Neighbor Search, Pattern Matching, and Range Counting --- with polylogarithmic space and communication costs. Such efficiency with O(1) rounds of interaction was thought to be impossible based on previous work.

On the other hand, we initiate a formal study of the limitations of constant-round SIPs by introducing a new hierarchy of communication models called Online Interactive Proofs (OIPs). The online nature of these models is analogous to the streaming restriction placed upon the verifier in an SIP. We give upper and lower bounds that (1) characterize, up to quadratic blowups, every finite level of the OIP hierarchy in terms of other well-known communication complexity classes, (2) separate the first four levels of the hierarchy, and (3) reveal that the hierarchy collapses to the fourth level. Our study of OIPs reveals marked contrasts and some parallels with the classic Turing Machine theory of interactive proofs, establishes limits on the power of existing techniques for developing constant-round SIPs, and provides a new characterization of (non-online) Arthur–Merlin communication in terms of an online model.

Comment: Some of the results in this paper appeared in an earlier technical report (http://eccc.hpi-web.de/report/2013/180/). That report has been subsumed by this manuscript and an upcoming manuscript by Thaler titled "Semi-Streaming Algorithms for Annotated Graphs Streams".  

Count-Min-Log sketch: Approximately counting with approximate counters by Guillaume Pitel, Geoffroy Fouquier

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large scale processing. However, the original version of the Count-Min-Sketch (CMS) suffers of some deficiences, especially if one is interested by the low-frequency items, such as in text-mining related tasks. Several variants of CMS have been proposed to compensate for the high relative error for low-frequency events, but the proposed solutions tend to correct the errors instead of preventing them. In this paper, we propose the Count-Min-Log sketch, which uses logarithm-based, approximate counters instead of linear counters to improve the average relative error of CMS at constant memory footprint. 
Image Credit: NASA/JPL-Caltech/Space Science Institute
N00238257.jpg was taken on March 28, 2015 and received on Earth March 31, 2015. The camera was pointing toward IAPETUS, and the image was taken using the BL1 and GRN filters. This image has not been validated or calibrated.

Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

No comments: