From the paper:
Lazaro-Gredilla et al. (2010) suggested an alternative approximation to the GP model. In their paper they decompose the GP’s stationary covariance function into its Fourier series and approximate the infinite series with a finite one. They optimise over the frequencies of the series to minimise some divergence from the full Gaussian process. This approach was named the “sparse spectrum” approximation. It is closely related to the one suggested by Rahimi & Recht (2007) in the randomised methods community (random projections). In Rahimi & Recht’s (2007) approach, the frequencies are randomised (sampled from some distribution rather than optimised) and the Fourier coefficients are computed analytically. Both approaches capture globally complex behaviour, but the direct optimisation of the different quantities often leads to some form of over-fitting (Wilson et al., 2014). Similar over-fitting problems observed with the sparse pseudo-input approximation were addressed with variational inference (Titsias, 2009). There is also a connection to Zoubin's work, which he presented to us at the last Paris Machine Learning meetup:
The distribution over the frequencies is optimised to fit the data well, while the prior regularises the fit and avoids over-fitting to the data. This approximation can be used to learn covariance functions by fitting them to the data. This is similar to the ideas introduced in (Duvenaud et al., 2013), where the structure of a covariance function is sought by searching over possible compositions of base covariance functions, which can give additional insight into the data. In (Duvenaud et al., 2013) the structure of the covariance composition is used to explain the data; in the approximation presented here, the spectrum of the covariance function can be used to explain the data.
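To make the Rahimi & Recht construction concrete, here is a minimal sketch (not the authors' code; the lengthscale, feature count, and function names are illustrative) of random Fourier features for an RBF kernel, with frequencies sampled from the kernel's spectral density rather than optimised:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, n_features, lengthscale=1.0):
    """Random Fourier features for the RBF kernel (Rahimi & Recht, 2007).

    The frequencies W are *sampled* from the kernel's spectral density
    (a Gaussian for the RBF kernel), rather than optimised as in the
    sparse-spectrum approach of Lazaro-Gredilla et al. (2010).
    """
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The feature inner products approximate the exact RBF kernel:
X = rng.normal(size=(5, 2))
Phi = rff_features(X, n_features=20000)
K_approx = Phi @ Phi.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-0.5 * sq_dists)
print(np.max(np.abs(K_approx - K_exact)))  # error shrinks as n_features grows
```

The Monte Carlo error of the kernel approximation decays as one over the square root of the number of features, which is why truncating the Fourier series (or sampling its frequencies) gives a controllable trade-off between accuracy and cost.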
Without further ado, here is the paper: Variational Inference for Sparse Spectrum Approximation in Gaussian Process Regression by Yarin Gal and Richard Turner
Standard sparse pseudo-input approximations to the Gaussian process (GP) cannot handle complex functions well. Sparse spectrum alternatives attempt to answer this but are known to over-fit. We suggest the use of variational inference for the sparse spectrum approximation to avoid both issues. We model the covariance function with a finite Fourier series approximation and treat it as a random variable. The random covariance function has a posterior, on which a variational distribution is placed. The variational distribution transforms the random covariance function to fit the data. We study the properties of our approximate inference, compare it to alternative ones, and extend it to the distributed and stochastic domains. Our approximation captures complex functions better than standard approaches and avoids over-fitting.
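As a rough sketch of why a finite Fourier series makes inference tractable (this is not the authors' implementation: the variational treatment of the frequencies is omitted, the frequencies below are simply sampled, and all names are illustrative): once the covariance function is replaced by a finite feature expansion, GP regression reduces to Bayesian linear regression over those features.

```python
import numpy as np

rng = np.random.default_rng(1)

def rff(X, W, b):
    """Finite Fourier feature map with fixed frequencies W and phases b."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Toy 1-D data: a noisy sine.
X = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

D, noise = 500, 0.1
W = rng.normal(size=(1, D))            # frequencies sampled, not learned
b = rng.uniform(0.0, 2.0 * np.pi, D)
Phi = rff(X, W, b)                     # n x D feature matrix

# Bayesian linear regression over the features with a w ~ N(0, I) prior:
# posterior precision A and mean mu follow from Gaussian conjugacy.
A = Phi.T @ Phi / noise**2 + np.eye(D)
mu = np.linalg.solve(A, Phi.T @ y) / noise**2

X_star = np.array([[0.5]])
pred = rff(X_star, W, b) @ mu          # predictive mean at x* = 0.5
print(pred)                            # should be close to sin(0.5)
```

The cost is cubic in the number of features rather than in the number of data points, which is what makes the stochastic and distributed extensions mentioned in the abstract feasible.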
The implementation is here: https://github.com/yaringal/VSSGP