Wednesday, August 10, 2011

Counting and Mutiplying with Random Projections

Today the theme is using random projections to perform faster matrix multiplications and counting. Hopefully, we'll some of these algorithms implemented soon.

Motivated by the problems of computing sample covariance matrices, and of transforming a collection of vectors to a basis where they are sparse, we present a simple algorithm that computes an approximation $C$ of the product of two $n$-by-$n$ real matrices $A$ and $B$. Our algorithm works in time $\tildeO(n^2+ n b)$, and output entry $C_{ij}$ is an unbiased estimator of $(AB)_{ij}$ with variance at most $||AB||_F^2 / b$, where $||AB||_F$ denotes the Frobenius norm of $AB$, and $b$ is a parameter. The approach also leads to an algorithm for computing $AB$ exactly, whp., in time $\tilde O(N + nb)$ in the case where $A$ and $B$ have at most $N$ nonzero entries, and $AB$ has at most $b$ nonzero entries. The main technical insight is that polynomial multiplication can be used to efficiently compute (using FFT) a linear sketch of an outer product of two vectors, which enables efficient "compressed sensing" of $AB$. Also, we use error-correcting codes in a novel way to recover significant entries of $AB$ in $o(n^2)$ time.

Efficient estimation of the moments and Shannon entropy of data streams is an important task in modern machine learning and data mining. To estimate the Shannon entropy, it suffices to accurately estimate the ®-th moment with ¢ = j1 ¡ ®j ¼ 0. To guarantee that the error of estimated Shannon entropy is within a º-additive factor, the method of symmetric stable random projections requires O¡1º2¢2¢ samples, which is extremely expensive. The first paper (Li, SODA2009) in Compressed Counting (CC), based on skewed-stable random projections, supplies a substantial improvement by reducing the sample complexity to O¡1º2¢¢, which is still expensive. The followup work (Li, UAI2009) provides a practical algorithm, which is however di±cult to analyze theoretically. In this paper, we propose a new accurate algorithm for Compressed Counting, whose sample complexity is only O¡1º2¢for º-additive Shannon entropy estimation. The constant factor for this bound is merely about 6. In addition, we prove that our algorithm achieves an upper bound of the Fisher information and in fact it is close to 100% statistically optimal. An empirical study is conducted to verify the accuracy of our algorithm.

No comments: