Ever wondered what Big Data LASSO performance you can get on EC2 with an ADMM-style algorithm? Zhimin Peng, Ming Yan and Wotao Yin have an answer that falls squarely in the "faster than a blink of an eye" category.
Abstract—This paper proposes parallel and distributed algorithms for solving very large-scale sparse optimization problems on computer clusters and clouds. Modern datasets usually have a large number of features or training samples, and they are usually stored in a distributed manner. Motivated by the need of solving sparse optimization problems with large datasets, we propose two approaches including (i) distributed implementations of prox-linear algorithms and (ii) GRock, a parallel greedy coordinate-block descent method. Different separability properties of the objective terms in the problem enable different data-distributed schemes along with their corresponding algorithm implementations. We also establish the convergence of GRock and explain why it often performs exceptionally well for sparse optimization. Numerical results on a computer cluster and Amazon EC2 demonstrate the efficiency and elasticity of our algorithms.
Parallel and Distributed Sparse Optimization
Background
Modern datasets usually have a large number of features or training samples, and they are usually stored in a distributed manner. Motivated by the need to solve sparse optimization problems with large datasets, the authors propose two approaches: (i) distributed implementations of prox-linear algorithms and (ii) GRock, a parallel greedy coordinate descent method.
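To get a feel for the greedy coordinate selection idea behind GRock, here is a minimal single-machine sketch in NumPy for the LASSO problem min_x 0.5||Ax - b||^2 + lam||x||_1. This is my own illustration, not the authors' distributed code: the names greedy_cd_lasso and soft_threshold, the parameter P (number of coordinates updated per round), and the serial loop standing in for the cross-node parallel selection are all assumptions made for the example. The soft-thresholding step is also the basic ingredient of the prox-linear approach mentioned above, since it is the proximal operator of the l1 norm.

```python
import numpy as np

def soft_threshold(v, t):
    """Soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def greedy_cd_lasso(A, b, lam, n_iters=200, P=4):
    """Toy greedy coordinate descent for min_x 0.5*||Ax-b||^2 + lam*||x||_1.

    Each round, every coordinate computes its 1-D optimal step from the
    current residual, and only the P coordinates with the largest proposed
    change are actually updated (a serial stand-in for the parallel,
    per-node selection a GRock-style method performs).
    Assumes the columns of A are nonzero.
    """
    m, n = A.shape
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)      # ||A_j||^2 for each column
    r = b - A @ x                      # residual, kept consistent with x
    for _ in range(n_iters):
        grad = A.T @ r                 # A^T (b - A x)
        # 1-D exact minimizer for each coordinate, holding the others fixed
        x_new = soft_threshold(col_sq * x + grad, lam) / col_sq
        d = x_new - x
        # greedy selection: update only the P largest proposed steps
        idx = np.argsort(-np.abs(d))[:P]
        for j in idx:
            r -= A[:, j] * d[j]
            x[j] = x_new[j]
    return x
```

The greedy rule spends each round's work on the coordinates that matter most, which is why this style of method tends to do well when the solution is sparse; the paper's analysis addresses how many coordinates can safely be updated in parallel, since making P too large relative to the correlation structure of A can break convergence.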
Links