- One “bit” – Frugal-1U
- Finding Quantiles With Two “bits” - Frugal-2U
and he also has a script you can try in the browser. Also check the comments.
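As a quick illustration of how little state these algorithms need, here is a minimal sketch of the Frugal-1U update rule (a single unit of memory nudged toward the stream's median); this is a generic rendering of the idea, not the author's browser script:

```python
import random

def frugal_1u(stream, estimate=0):
    """Frugal-1U: estimate the stream's median using one unit of memory.

    Nudge the estimate up by 1 when a sample exceeds it, down by 1
    when a sample falls below it; it settles near the median.
    """
    for x in stream:
        if x > estimate:
            estimate += 1
        elif x < estimate:
            estimate -= 1
    return estimate

random.seed(0)
stream = [random.gauss(100, 10) for _ in range(100000)]
print(frugal_1u(stream))  # prints a value near the true median of 100
```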
To get a bird's-eye view of some of these techniques, you might want to read this excellent entry on Probabilistic Data Structures for Web Analytics and Data Mining, which covers several aspects of counting:
- Cardinality Estimation: Linear Counting
- Cardinality Estimation: Loglog Counting
- Frequency Estimation: Count-Min Sketch
- Frequency Estimation: Count-Mean-Min Sketch
- Heavy Hitters: Count-Min Sketch
- Heavy Hitters: Stream-Summary
- Range Query: Array of Count-Min Sketches
- Membership Query: Bloom Filter
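Several of the entries above revolve around the Count-Min Sketch. As a rough illustration, here is a minimal, generic construction (the hash scheme and table sizes below are arbitrary choices for the example, not taken from the article):

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: a depth x width table of counters.

    Each item is hashed to one cell per row; queries take the minimum
    over rows, so estimates never undercount the true frequency.
    """
    def __init__(self, width=2000, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _indices(self, item):
        # One hash per row, derived by salting with the row number.
        for row in range(self.depth):
            h = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
            yield row, int(h, 16) % self.width

    def add(self, item, count=1):
        for row, idx in self._indices(item):
            self.table[row][idx] += count

    def query(self, item):
        # Upper bound on the true count (exact unless every row collides).
        return min(self.table[row][idx] for row, idx in self._indices(item))

cms = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    cms.add(word)
print(cms.query("a"))  # at least 3, and at most the stream length of 5
```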
Sparse Recovery with Very Sparse Compressed Counting by Ping Li, Cun-Hui Zhang, Tong Zhang
Compressed sensing (sparse signal recovery) often encounters nonnegative data (e.g., images). Recently we developed the methodology of using (dense) Compressed Counting for recovering nonnegative K-sparse signals. In this paper, we adopt very sparse Compressed Counting for nonnegative signal recovery. Our design matrix is sampled from a maximally-skewed p-stable distribution (0 < p < 1), and we sparsify the design matrix so that on average a (1-g)-fraction of the entries become zero. The idea is related to very sparse stable random projections (Li et al. 2006 and Li 2007), the prior work for estimating summary statistics of the data.

In our theoretical analysis, we show that, when p goes to 0, it suffices to use M = K/(1 - exp(-gK)) log N measurements, so that all coordinates can be recovered in one scan of the coordinates. If g = 1 (i.e., dense design), then M = K log N. If g = 1/K or 2/K (i.e., very sparse design), then M = 1.58 K log N or M = 1.16 K log N. This means the design matrix can indeed be very sparse at only a minor inflation of the sample complexity.

Interestingly, as p goes to 1, the required number of measurements is essentially M = 2.7 K log N, provided g = 1/K. It turns out that this result is a general worst-case bound.
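The constants quoted in the abstract follow directly from the measurement bound M = K/(1 - exp(-gK)) log N in the p → 0 regime. A quick numerical check of that factor (just evaluating the paper's formula, nothing more):

```python
import math

def measurement_factor(g, K):
    """Factor c in M = c * K * log(N), from the paper's p -> 0 bound
    M = K / (1 - exp(-g*K)) * log(N)."""
    return 1.0 / (1.0 - math.exp(-g * K))

K = 100
print(round(measurement_factor(1.0, K), 2))      # dense design, g = 1:   1.0
print(round(measurement_factor(1.0 / K, K), 2))  # very sparse, g = 1/K:  1.58
print(round(measurement_factor(2.0 / K, K), 2))  # very sparse, g = 2/K:  1.16
```

These reproduce the 1.58 K log N and 1.16 K log N figures for the very sparse designs, confirming the "minor inflation of the sample complexity" claim.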
I note this footnote: "2 This report does not include comparisons with the SMP algorithm [1, 5], as we can not run the code from at the moment. We will provide the comparisons after we are able to execute the code. We thank the communications with the author of [1, 5]."
which is, to my knowledge, the first reference using sparse measurement matrices for images (positive unknowns). I sure look forward to seeing how SMP compares to this new approach. The SMP wiki was featured here on Nuit Blanche back in 2008. Sparse measurement matrices are stunning now in 2014; they were already stunning in 2008.
- Compressed Counting Meets Compressed Sensing
- Sketching data structures by Ilya Katsov
- Another use of Count-Min Sketch: Particle Sketches
- Videos: Succinct Data Representations and Applications
- Videos: Big Data Boot Camp Days 2, 3 and 4, Simons Institute, Berkeley
- Videos from the Big Data Boot Camp Day 1 at the Simons Institute at Berkeley
- SAHD: Dynamic L1 Reconstruction - Justin Romberg
- Sparse Recovery of Streaming Signals Using l_1-Homotopy - implementation -
- Streaming Data Mining Tutorial slides (and more)
- Eigenvalues of a matrix in the streaming model
- Piotr Indyk, Tutorial on compressive sensing/streaming algorithms
- A Tutorial on Compressed Sensing (or Compressive Sampling or Linear Sketching) / Internships on the Riviera
- Vanishingly Sparse Matrices and Expander Graphs, with application to compressed sensing
- Big Data Counting: How To Count A Billion Distinct Objects Using Only 1.5KB Of Memory
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.