Tuesday, June 11, 2013

Simple and Deterministic Matrix Sketches and Near-optimal Distributions for Data Matrix Sampling

Let us continue this week with some exciting streaming/sketching algorithms:

Simple and Deterministic Matrix Sketches by Edo Liberty (See experimental results in json format)

A sketch of a matrix A is another matrix B which is significantly smaller than A but still approximates it well. Finding such sketches efficiently is an important building block in modern algorithms for approximating, for example, the PCA of massive matrices. This task is made more challenging in the streaming model, where each row of the input matrix can only be processed once and storage is severely limited. In this paper we adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives n rows of a large matrix A ∈ R^(n×m) one after the other in a streaming fashion. It maintains a sketch B ∈ R^(ℓ×m) containing only ℓ ≪ n rows but still guarantees that AᵀA ≈ BᵀB. More accurately,
∀x, ‖x‖ = 1: 0 ≤ ‖Ax‖² − ‖Bx‖² ≤ 2‖A‖_F² / ℓ
Or
BᵀB ≺ AᵀA and ‖AᵀA − BᵀB‖ ≤ 2‖A‖_F² / ℓ. This gives a streaming algorithm whose error decays proportional to 1/ℓ using O(mℓ) space. For comparison, random-projection, hashing or sampling based algorithms produce convergence bounds proportional to 1/√ℓ. Sketch updates per row in A require amortized O(mℓ) operations and the algorithm is perfectly parallelizable. Our experiments corroborate the algorithm's scalability and improved convergence rate. The presented algorithm also stands out in that it is deterministic, simple to implement and elementary to prove.
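The shrink-and-insert update described in the abstract (adapting a frequent-items style algorithm to rows of a matrix) can be sketched in a few lines of NumPy. This is a minimal illustration under my own variable names, assuming ℓ ≤ m, not the author's reference implementation:

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain an ell x m sketch B of the rows of A so that B^T B
    approximates A^T A (minimal sketch of the idea; assumes ell <= m)."""
    n, m = A.shape
    B = np.zeros((ell, m))
    nxt = 0  # index of the next zero row of B
    for a in A:  # rows arrive one at a time, as in a stream
        if nxt == ell:
            # Sketch is full: SVD it and subtract the smallest squared
            # singular value from all squared singular values.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            shrunk = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
            B = shrunk[:, None] * Vt
            nxt = ell - 1  # the last row of B is now zero
        B[nxt] = a
        nxt += 1
    return B
```

Because the shrink step only ever reduces energy along every direction, BᵀB never overestimates AᵀA, which is what makes the deterministic error bound above possible.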

Near-optimal Distributions for Data Matrix Sampling

We give near-optimal distributions for the sparsification of large m × n matrices, where m ≪ n, for example representing n observations over m attributes. Our algorithms can be applied when the non-zero entries are only available as a stream, i.e., in arbitrary order, and result in matrices which are not only sparse, but whose values are also highly compressible. In particular, algebraic operations with the resulting matrices can be implemented as (ultra-efficient) operations over indices.
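The near-optimal distributions are the paper's contribution and are not reproduced here. As a generic illustration of what entry-wise matrix sparsification looks like, here is a simple unbiased sampler that keeps s entries drawn with probability proportional to |A_ij|; both the distribution and the function name are my own stand-ins:

```python
import numpy as np

def sparsify(A, s, seed=0):
    """Illustrative entry-wise sparsification: draw s entries i.i.d. with
    probability p_ij proportional to |A_ij| (a simple stand-in, NOT the
    paper's near-optimal distribution), reweighting each kept entry by
    1 / (s * p_ij) so that the sparse matrix S satisfies E[S] = A."""
    rng = np.random.default_rng(seed)
    flat = A.ravel().astype(float)
    p = np.abs(flat)
    p /= p.sum()
    idx = rng.choice(flat.size, size=s, replace=True, p=p)
    S = np.zeros(flat.size)
    np.add.at(S, idx, flat[idx] / (s * p[idx]))  # accumulate repeated draws
    return S.reshape(A.shape)
```

The resulting matrix has at most s non-zero entries, so downstream matrix-vector products only touch the sampled indices.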
