Nuit Blanche: Agnostic Estimation of Mean and Covariance

Thursday, April 28, 2016

Agnostic Estimation of Mean and Covariance

The reason we have a zoo of matrix factorizations stems in part with the need to deal with different adversarial noises. From the paper:

The Achilles heel of algorithms for generative models is the assumption that data is exactly from the model. This is crucial for known guarantees, and relaxations of it are few and specialized, e.g., in ICA, data could by noisy, but the noise itself is assumed to be Gaussian. Assumptions about rank and sparsity are made in a technique that is now called Robust PCA [CSPW11, CLMW11]. There have been attempts [Kwa08, MT+11] at achieving robustness by L1 minimization, but they don’t give any error bounds on the output produced. A natural, important and wide open problem is estimating the parameters of generative models in the presence of arbitrary, i.e., malicious noise, a setting usually referred to as agnostic learning. The simplest version of this problem is to estimate a single Gaussian in the presence of malicious noise. Alternatively, this can be posed as the problem of finding a best-fit Gaussian to data or agnostically learning a single Gaussian.

Agnostic Estimation of Mean and Covariance by Kevin A. Lai, Anup B. Rao, Santosh Vempala

We consider the problem of estimating the mean and covariance of a distribution from iid samples in $\mathbb{R}^n$, in the presence of an $\eta$ fraction of malicious noise; this is in contrast to much recent work where the noise itself is assumed to be from a distribution of known type. The agnostic problem includes many interesting special cases, e.g., learning the parameters of a single Gaussian (or finding the best-fit Gaussian) when $\eta$ fraction of data is adversarially corrupted, agnostically learning a mixture of Gaussians, agnostic ICA, etc. We present polynomial-time algorithms to estimate the mean and covariance with error guarantees in terms of information-theoretic lower bounds. As a corollary, we also obtain an agnostic algorithm for Singular Value Decomposition.

and previously the Recursive Fourier PCA algorithm

Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity by Santosh S. Vempala, Ying Xiao

We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions. We use the technique to give a polynomial-time algorithm for standard ICA with sample complexity nearly linear in the dimension, thereby improving substantially on previous bounds. The analysis is based on properties of random polynomials, namely the spacings of an ensemble of polynomials. Our technique also applies to other applications of tensor decompositions, including spherical Gaussian mixture models.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !