Thursday, March 10, 2016

New Algorithms for Heavy Hitters in Data Streams

 


New Algorithms for Heavy Hitters in Data Streams by David P. Woodruff

An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-$k$, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify what it means for an item to be frequent, including what are known as the $\ell_1$-heavy hitters and $\ell_2$-heavy hitters. There are a number of algorithmic solutions for these problems, starting with the work of Misra and Gries, as well as the CountMin and CountSketch data structures, among others.
In this survey paper, accompanying an ICDT invited talk, we cover several recent results developed in this area, which improve upon the classical solutions to these problems. In particular, with coauthors we develop new algorithms for finding $\ell_1$-heavy hitters and $\ell_2$-heavy hitters, with significantly less memory required than what was known, and which are optimal in a number of parameter regimes.
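
For readers new to the problem, one common formalization (an assumption here, since the abstract does not spell it out) calls item $i$ a $\phi$-heavy hitter in the $\ell_p$ sense when its frequency $f_i$ satisfies $f_i \ge \phi \|f\|_p$. The sketch below is a minimal Python rendition of the classical Misra and Gries summary mentioned in the abstract, for the $\ell_1$ case. It is not one of the new algorithms from the survey, and the function name `misra_gries` and the parameter `k` are illustrative choices.

```python
def misra_gries(stream, k):
    """Return a summary holding at most k - 1 candidate heavy hitters.

    Assumption: `stream` is any iterable of hashable items and k >= 2.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: bump its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free slot is available: start tracking the item.
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those at zero.
            # The incoming item itself is not stored.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical toy stream: "a" occurs 6 times out of 12 items.
    stream = list("abacadabaeab")
    print(misra_gries(stream, k=3))
```

On a stream of length $m$, any item occurring more than $m/k$ times is guaranteed to survive in the summary, and each stored count underestimates the true frequency by at most $m/k$. A second pass over the data (or a separate exact count) is needed to separate the true heavy hitters from false positives.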
 
 
Image Credit: NASA/JPL-Caltech/Space Science Institute 
  Full-Res: N00256829.jpg
N00256829.jpg was taken on March 09, 2016 and received on Earth March 09, 2016. The camera was pointing toward SATURN, and the image was taken using the CL1 and CL2 filters. This image has not been validated or calibrated.
 

 