Nuit Blanche: Faster Kernel Ridge Regression Using Sketching and Preconditioning / Sharper Bounds for Regression and Low-Rank Approximation with Regularization

Monday, November 28, 2016

Faster Kernel Ridge Regression Using Sketching and Preconditioning / Sharper Bounds for Regression and Low-Rank Approximation with Regularization

Faster Kernel Ridge Regression Using Sketching and Preconditioning by Haim Avron, Kenneth L. Clarkson, David P. Woodruff

Random feature maps, such as random Fourier features, have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods such as kernel ridge regression. However, random feature maps only provide crude approximations to the kernel function, so delivering state-of-the-art results requires the number of random features to be very large. Nevertheless, in some cases, even when the number of random features is driven to be as large as the training size, full recovery of the performance of the exact kernel method is not attained. In order to address this issue, we propose to use random feature maps to form preconditioners to be used in solving kernel ridge regression to high accuracy. We provide theoretical conditions on when this yields an effective preconditioner, and empirically evaluate our method and show it is highly effective for datasets of up to one million training examples.

Sharper Bounds for Regression and Low-Rank Approximation with Regularization by Haim Avron, Kenneth L. Clarkson, David P. Woodruff

The technique of matrix sketching, such as the use of random projections, has been shown in recent years to be a powerful tool for accelerating many important statistical learning techniques. Research has so far focused largely on using sketching for the "vanilla" un-regularized versions of these techniques.
Here we study sketching methods for regularized variants of linear regression, low rank approximations, and canonical correlation analysis. We study regularization both in a fairly broad setting, and in the specific context of the popular and widely used technique of ridge regularization; for the latter, as applied to each of these problems, we show algorithmic resource bounds in which the {\em statistical dimension} appears in places where in previous bounds the rank would appear. The statistical dimension is always smaller than the rank, and decreases as the amount of regularization increases. In particular, for the ridge low-rank approximation problem minY,X∥YX−A∥2F+λ∥Y∥2F+λ∥X∥2F, where Y∈Rn×k and X∈Rk×d, we give an approximation algorithm needing
$O (n n z (A)) + O ~ ((n + d) ε - 1 k min {k, ε - 1 s d λ (Y *)}) + O ~ (ε - 8 s d λ (Y *) 3)$ time, where sλ(Y∗)≤k is the statistical dimension of Y∗, Y∗ is an optimal Y, ε is an error parameter, and nnz(A) is the number of nonzero entries of A.
We also study regularization in a much more general setting. For example, we obtain sketching-based algorithms for the low-rank approximation problem minX,Y∥YX−A∥2F+f(Y,X) where f(⋅,⋅) is a regularizing function satisfying some very general conditions (chiefly, invariance under orthogonal transformations).

Image Credit: NASA/JPL-Caltech/Space Science Institute
W00102494.jpg was taken on 2016-11-25 17:59 (UTC) and received on Earth 2016-11-26 04:45 (UTC). The camera was pointing toward Saturn, and the image was taken using the MT3 and CL2 filters.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !