Nuit Blanche: An ODE for solvers, a Group approach to regularizers, Sketching for Kernels and Linear Algebra, Johnson-Lindenstrauss is optimal, Coresets of Streaming Data

Friday, November 07, 2014

Today we have a mix of interesting papers, some of which will be featured at NIPS 2014.

A differential equation for modeling Nesterov's accelerated gradient method: theory and insights by Weijie Su, Stephen Boyd, Emmanuel J. Candes

We derive a second-order ordinary differential equation (ODE), which is the limit of Nesterov’s accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov’s scheme and thus can serve as a tool for analysis. We show that the continuous-time ODE allows for a better understanding of Nesterov’s scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov’s scheme, leading to an algorithm that can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.
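The limiting ODE in this paper is X'' + (3/t) X' + ∇f(X) = 0 with X(0) = x0 and X'(0) = 0, where Nesterov's iterate x_k with step size s corresponds to X(t) at t ≈ k√s. As a small numerical check of that correspondence, here is a minimal Python sketch, assuming a toy 1-D objective f(x) = x²/2 chosen only for illustration. For this f the ODE has the closed-form solution X(t) = 2 x0 J1(t)/t, with J1 the Bessel function of the first kind, so X(1) ≈ 0.880 when x0 = 1. The function names are hypothetical.

```python
import math


def grad(x):
    # Gradient of the toy objective f(x) = x**2 / 2 (an assumption for illustration).
    return x


def nesterov(x0, s, n_steps):
    """Nesterov's scheme: x_k = y_{k-1} - s * grad(y_{k-1}),
    y_k = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1}). Returns [x_0, ..., x_n]."""
    xs = [x0]
    x_prev, y = x0, x0
    for k in range(1, n_steps + 1):
        x = y - s * grad(y)                          # gradient step from y_{k-1}
        y = x + (k - 1) / (k + 2) * (x - x_prev)     # momentum step
        xs.append(x)
        x_prev = x
    return xs


def ode_solution(x0, t_end, dt):
    """Integrate X'' + (3/t) X' + grad(X) = 0, X(0) = x0, X'(0) = 0,
    with a semi-implicit Euler step (the 3/t damping term is treated implicitly)."""
    x, v, t = x0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        t += dt
        v = (v - dt * grad(x)) / (1.0 + 3.0 * dt / t)
        x += dt * v
    return x


s = 1e-6                   # step size
k = 1000                   # number of Nesterov steps
t = k * math.sqrt(s)       # time identification t = k * sqrt(s), here t = 1
x_nesterov = nesterov(1.0, s, k)[-1]
x_ode = ode_solution(1.0, t, dt=1e-4)
print(x_nesterov, x_ode)
```

With s = 1e-6, the 1000 Nesterov steps land at t = 1 and the two printed values should agree closely; shrinking s tightens the match, consistent with the ODE being the s → 0 limit. Treating the damping implicitly keeps the 3/t coefficient from destabilizing the first integration steps near t = 0.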

Orbit Regularization by Renato Negrinho, Andre Filipe Torres Martins

We propose a general framework for regularization based on group majorization. In this framework, a group is defined to act on the parameter space and an orbit is fixed; to control complexity, the model parameters are confined to lie in the convex hull of this orbit (the orbitope). Common regularizers are recovered as particular cases, and a connection is revealed between the recent sorted ℓ1-norm and the hyperoctahedral group. We derive the properties a group must satisfy to be amenable to optimization with conditional and projected gradient algorithms. Finally, we suggest a continuation strategy for orbit exploration, presenting simulation results for the symmetric and hyperoctahedral groups.
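One way to see a link between the sorted ℓ1-norm and the hyperoctahedral group (our own illustration, not necessarily the derivation in the paper): the hyperoctahedral group acts on vectors by signed permutations, and for a nonnegative weight vector w the rearrangement inequality gives max over signed permutations g of ⟨g·w, x⟩ = Σ_i w_(i)|x|_(i), with both sequences sorted in decreasing order. That maximum is the support function of the orbitope conv{g·w}, so the sorted ℓ1-norm with weights w is exactly this support function. The Python sketch below checks the identity by brute force over all 2ⁿ·n! signed permutations for a small n; the function names are hypothetical.

```python
from itertools import permutations, product


def sorted_l1(w, x):
    """Sorted l1-norm: sum_i w_(i) * |x|_(i), both sorted in decreasing order.
    Assumes w is nonnegative."""
    ws = sorted(w, reverse=True)
    xs = sorted((abs(v) for v in x), reverse=True)
    return sum(a * b for a, b in zip(ws, xs))


def hyperoctahedral_support(w, x):
    """Brute-force support function of the orbitope conv{g.w : g signed permutation}:
    the maximum of <g.w, x> over every signed permutation g."""
    n = len(w)
    best = float("-inf")
    for perm in permutations(range(n)):
        for signs in product((1, -1), repeat=n):
            gw = [signs[i] * w[perm[i]] for i in range(n)]  # action of g on w
            best = max(best, sum(a * b for a, b in zip(gw, x)))
    return best


w = [3.0, 2.0, 0.5, 0.0]
x = [0.4, -1.2, 2.0, -0.1]
print(sorted_l1(w, x), hyperoctahedral_support(w, x))
```

The special cases read naturally: w = (1, ..., 1) gives the ℓ1-norm and w = (1, 0, ..., 0) gives the ℓ∞-norm.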

Subspace Embeddings for the Polynomial Kernel by Haim Avron, Huy L. Nguyen, David P. Woodruff

Sketching is a powerful dimensionality reduction tool for accelerating statistical learning algorithms. However, its applicability has been limited to a certain extent since the crucial ingredient, the so-called oblivious subspace embedding, can only be applied to data spaces with an explicit representation as the column span or row span of a matrix, while in many settings learning is done in a high-dimensional space implicitly defined by the data matrix via a kernel transformation. We propose the first fast oblivious subspace embeddings that are able to embed a space induced by a non-linear kernel without explicitly mapping the data to the high-dimensional space. In particular, we propose an embedding for mappings induced by the polynomial kernel. Using the subspace embeddings, we obtain the fastest known algorithms for computing an implicit low rank approximation of the higher-dimension mapping of the data matrix, and for computing an approximate kernel PCA of the data, as well as doing approximate kernel principal component regression.
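As far as we understand, the polynomial-kernel embedding studied here builds on TensorSketch (Pham and Pagh), which sketches the degree-q feature map x ↦ x^⊗q (the q-fold tensor power of x) without ever forming it: take q independent CountSketches of x into b buckets and circularly convolve them. The result equals a CountSketch of the d^q-dimensional vector x^⊗q with hash (h1(i1) + ... + hq(iq)) mod b and sign s1(i1)·...·sq(iq), so inner products of two such sketches are unbiased estimates of the polynomial kernel (x·y)^q. Below is a minimal Python sketch for q = 2, using a direct O(b²) convolution where practical implementations would use the FFT; all names are hypothetical, and the last lines check the identity against the explicit d²-dimensional vector.

```python
import random


def count_sketch(x, h, s, b):
    """CountSketch of x into b buckets: bucket h[i] accumulates s[i] * x[i]."""
    out = [0.0] * b
    for i, xi in enumerate(x):
        out[h[i]] += s[i] * xi
    return out


def circular_convolve(u, v):
    """Circular convolution of two length-b vectors (O(b^2); an FFT gives O(b log b))."""
    b = len(u)
    out = [0.0] * b
    for a in range(b):
        for c in range(b):
            out[(a + c) % b] += u[a] * v[c]
    return out


def tensor_sketch(x, hashes, signs, b):
    """Sketch of the q-fold tensor power of x (q = len(hashes)) without forming it:
    convolve q independent CountSketches of x."""
    out = count_sketch(x, hashes[0], signs[0], b)
    for h, s in zip(hashes[1:], signs[1:]):
        out = circular_convolve(out, count_sketch(x, h, s, b))
    return out


rng = random.Random(0)
d, b, q = 6, 8, 2
hashes = [[rng.randrange(b) for _ in range(d)] for _ in range(q)]
signs = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(q)]
x = [rng.uniform(-1, 1) for _ in range(d)]
sketch = tensor_sketch(x, hashes, signs, b)

# Explicit check for q = 2: CountSketch of the d*d vector x (x) x, with combined
# hash (h1(i) + h2(j)) mod b and combined sign s1(i) * s2(j).
kron = [x[i] * x[j] for i in range(d) for j in range(d)]
h_comb = [(hashes[0][i] + hashes[1][j]) % b for i in range(d) for j in range(d)]
s_comb = [signs[0][i] * signs[1][j] for i in range(d) for j in range(d)]
explicit = count_sketch(kron, h_comb, s_comb, b)
print(max(abs(a - e) for a, e in zip(sketch, explicit)))
```

The printed discrepancy is only floating-point round-off: the sketch costs O(q(d + b log b)) with the FFT, versus O(d^q) to form x^⊗q explicitly.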

The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction by Kasper Green Larsen, Jelani Nelson
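This entry comes without an abstract. The Johnson-Lindenstrauss lemma says that n points can be mapped linearly into m = O(ε⁻² log n) dimensions while preserving all pairwise squared distances up to a factor 1 ± ε, and the title's claim is that, for linear maps, this target dimension cannot be improved in general. For context, here is a minimal Python sketch of the upper-bound side: a Gaussian random projection whose target dimension comes from one standard explicit form of the bound, m ≥ 4 ln n / (ε²/2 − ε³/3). These constants come from Dasgupta and Gupta's proof; they are an assumption of this sketch, not something taken from the paper. Function names are hypothetical.

```python
import math
import random


def jl_dim(n, eps):
    """Target dimension from one explicit form of the JL bound (Dasgupta-Gupta):
    m >= 4 ln n / (eps^2 / 2 - eps^3 / 3)."""
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))


def gaussian_projection(points, m, rng):
    """Map each d-dimensional point through a random m x d Gaussian matrix scaled by 1/sqrt(m)."""
    d = len(points[0])
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(d)] for _ in range(m)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in A] for p in points]


def max_distortion(orig, proj):
    """Largest relative error in squared pairwise distances after projection."""
    worst = 0.0
    for i in range(len(orig)):
        for k in range(i + 1, len(orig)):
            d0 = sum((a - b) ** 2 for a, b in zip(orig[i], orig[k]))
            d1 = sum((a - b) ** 2 for a, b in zip(proj[i], proj[k]))
            worst = max(worst, abs(d1 / d0 - 1.0))
    return worst


rng = random.Random(1)
n, d, eps = 20, 100, 0.5
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
m = jl_dim(n, eps)                      # 144 for n = 20, eps = 0.5
proj = gaussian_projection(points, m, rng)
print(m, max_distortion(points, proj))
```

The guarantee is probabilistic: for a given random draw, the worst-case distortion falls below ε with high probability rather than with certainty.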

Sketching as a Tool for Numerical Linear Algebra by David P. Woodruff

Sketching as a Tool for Numerical Linear Algebra highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem. It is an ideal primer for researchers and students of theoretical computer science interested in how sketching techniques can be used to speed up numerical linear algebra applications.

If you do not have access to it, here is a slide presentation on the subject.
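The basic sketch-and-solve pattern described above, applied to least-squares regression min_x ‖Ax − b‖, fits in a few lines of Python: compress the n rows of A and b with a CountSketch S into m ≪ n rows, then solve the small problem min_x ‖SAx − Sb‖. This is a minimal illustration under our own assumptions (a noiseless linear model, tiny d, normal equations for the small solve, hypothetical function names), not code from the survey. When b is exactly A·x_true, the sketched problem recovers x_true as long as SA has full column rank. With noise, the sketched solution is only approximately optimal, and guarantees of that kind require the sketch size m to be large enough relative to d.

```python
import random


def count_sketch_rows(A, b, m, rng):
    """Apply an m x n CountSketch S to A and b: row i (and b[i]) is added,
    with a random sign, into one of m random buckets."""
    d = len(A[0])
    SA = [[0.0] * d for _ in range(m)]
    Sb = [0.0] * m
    for row, bi in zip(A, b):
        bucket, sign = rng.randrange(m), rng.choice((-1.0, 1.0))
        for j in range(d):
            SA[bucket][j] += sign * row[j]
        Sb[bucket] += sign * bi
    return SA, Sb


def least_squares(A, b):
    """Solve min ||Ax - b|| via the normal equations (adequate for tiny, well-conditioned d)."""
    d = len(A[0])
    # Augmented d x (d + 1) system [A^T A | A^T b].
    M = [[sum(r[i] * r[j] for r in A) for j in range(d)]
         + [sum(r[i] * bi for r, bi in zip(A, b))] for i in range(d)]
    # Gaussian elimination with partial pivoting.
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    # Back substitution.
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (M[i][d] - sum(M[i][k] * x[k] for k in range(i + 1, d))) / M[i][i]
    return x


rng = random.Random(0)
n, m = 2000, 40
A = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
x_true = [0.5, -2.0, 3.0]
b_clean = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
SA, Sb = count_sketch_rows(A, b_clean, m, rng)
x_sketch = least_squares(SA, Sb)   # solves a 40 x 3 problem instead of 2000 x 3
print(x_sketch)
```

CountSketch is used here because applying it touches each row of A once, so the compression step costs time proportional to the number of nonzeros of A.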

and finally:

Coresets for k-Segmentation of Streaming Data by Dan Feldman, Guy Rossman, Mikhail Volkov

Life-logging video streams, financial time series, and Twitter tweets are a few examples of high-dimensional signals over practically unbounded time. We consider the problem of computing optimal segmentation of such signals by a k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal. The coreset enables fast further analysis such as automatic summarization and analysis of such signals. A coreset (core-set) is a compact representation of the data seen so far, which approximates the data well for a specific task – in our case, segmentation of the stream. We show that, perhaps surprisingly, the segmentation problem admits coresets of cardinality only linear in the number of segments k and independent of both the dimension d of the signal, and its number n of points. More precisely, we construct a representation of size O(k/ε²) that provides a (1 + ε)-approximation for the sum of squared distances to any given k-piecewise linear function. Moreover, such coresets can be constructed in a parallel streaming approach. Our results rely on a novel reduction of statistical estimations to problems in computational geometry. We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in Amazon cloud.
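For reference, the objective the coreset approximates can be computed exactly offline by dynamic programming. The Python sketch below is that baseline, not the paper's coreset construction. It assumes a 1-D signal sampled at times t and lets each of the k segments have its own least-squares line (no continuity constraint, an assumption). It also keeps all n points in memory and re-fits every candidate segment from scratch, which is the kind of cost a small coreset is meant to avoid. Function names are hypothetical.

```python
def segment_sse(t, y, i, j):
    """Squared error of the least-squares line fitted to points i..j-1."""
    n = j - i
    if n <= 2:
        return 0.0  # a line fits one or two points exactly
    ts, ys = t[i:j], y[i:j]
    tm, ym = sum(ts) / n, sum(ys) / n
    sxx = sum((a - tm) ** 2 for a in ts)
    sxy = sum((a - tm) * (b - ym) for a, b in zip(ts, ys))
    syy = sum((b - ym) ** 2 for b in ys)
    return max(0.0, syy - sxy * sxy / sxx)  # clamp tiny negative round-off


def k_segmentation(t, y, k):
    """Exact DP: best[c][j] = min cost of covering points 0..j-1 with c nonempty segments.
    Returns (cost, start indices of segments 2..k)."""
    n = len(t)
    INF = float("inf")
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    arg = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = best[c - 1][i] + segment_sse(t, y, i, j)
                if cost < best[c][j]:
                    best[c][j], arg[c][j] = cost, i
    # Walk back through arg to recover where each segment starts.
    breaks, j = [], n
    for c in range(k, 0, -1):
        j = arg[c][j]
        breaks.append(j)
    return best[k][n], sorted(breaks)[1:]  # drop the first segment's start (index 0)


t = list(range(20))
y = [float(v) if v < 10 else 50.0 - 2.0 * v for v in t]  # two different lines
cost, breaks = k_segmentation(t, y, 2)
print(cost, breaks)
```

On this signal, built from two different lines that switch at index 10, k = 2 should give (near-)zero cost with the single breakpoint at 10.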

Join the CompressiveSensing subreddit or the Google+ Community and post there!