The introduction to the following paper is a wonderful backgrounder:
Machine Learning has gained great empirical success from the massive data sets collected from various domains. Among them a major challenge is to utilize existing computational resources to build predictive and inferential models from such huge data sets, while maintaining the statistical power of big data. One remedy for the big data challenge is to build distributed computer systems and design distributed learning algorithms to make big data learning possible, however, distributed systems may not always available, and the cost of running distributed system can be much higher than one can afford, which makes distributed learning not suitable for all scenarios. An alternative remedy is to use the state-of-the-art randomized optimization algorithms to accelerate the training process, for example, researchers have proposed optimization algorithms for regularized empirical risk minimization problem, with provable fast convergence and low computational cost per iteration (see (Johnson and Zhang, 2013; Shalev-Shwartz and Zhang, 2013; Defazio et al., 2014) for examples), however, the speed of these optimization methods still heavily depends on the condition number of problem at hand, which can be undesirable for many real world problems. Sketching (Woodruff, 2014), which approximates the solution via constructing some sketched, usually of smaller scale problem from the original data, has become an emerging technique for big data analytics. With the sketching technique, we can find solutions which approximately solve various forms of original large-scale problem, such as least square regression, robust regression, low-rank approximation, singular value decomposition, just to name a few. For survey and recent advances about sketching, we refer the readers to (Halko et al., 2011; Mahoney, 2011; Lu et al., 2013; Alaoui and Mahoney, 2014; Woodruff, 2014; Raskutti and Mahoney, 2015; Yang et al., 2015a; Oymak et al., 2015; Oymak and Tropp, 2015; Drineas and Mahoney, 2016) and references therein.
However, one major drawback of sketching is that typically it’s not suitable for the case if we want high accurate solution: to obtain a solution with exponentially smaller approximation error, we often need to increase the sketching dimension also exponentially.
The situation has become better with recent work on “iterative sketch”, e.g. iterative Hessian sketch (IHS) (Pilanci and Wainwright, 2016) and iterative dual random projection (IDRP) (Zhang et al., 2014). These methods are able to refine their approximate solution by iteratively solving some small scale sketched problem. Among these innovations, Hessian sketch (Pilanci and Wainwright, 2016) is designed by reducing the sample size of the original problem, while dual random projection (Zhang et al., 2014) is proposed by reducing the dimension. As a consequence, when the sample size and feature dimension are both large, IHS and IDRP still need to solve relatively large-scale subproblems as they can only sketch the problem from one perspective. In this paper, we make the following improvement upon previous work: we first propose an accelerated version of IHS which requires the same computational cost to solve the IHS subproblem at each sketching iteration, while with provably fewer number of sketching iterations to reach certain accuracy; we then reveal the primal-dual connections between IHS (Pilanci and Wainwright, 2016) and IDRP (Zhang et al., 2014), which are independently proposed by two different groups of researchers. In particular, we show that these two methods are equivalent in the sense that dual random projection is performing Hessian sketch in the dual space. Finally, to alleviate the computational issues raised by big and high-dimensional learning problems, we propose a primal-dual sketching method that can simultaneously reduce the sample size and dimension of the sketched sub-problem, with provable convergence guarantees.
Here is the paper: Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data by Jialei Wang, Jason D. Lee, Mehrdad Mahdavi, Mladen Kolar, Nathan Srebro
Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data. In this paper, we study sketching from an optimization point of view: we first show that the iterative Hessian sketch is an optimization process with preconditioning, and develop accelerated iterative Hessian sketch via the searching the conjugate direction; we then establish primal-dual connections between the Hessian sketch and dual random projection, and apply the preconditioned conjugate gradient approach on the dual problem, which leads to the accelerated iterative dual random projection methods. Finally to tackle the challenges from both large sample size and high-dimensionality, we propose the primal-dual sketch, which iteratively sketches the primal and dual formulations. We show that using a logarithmic number of calls to solvers of small scale problem, primal-dual sketch is able to recover the optimum of the original problem up to arbitrary precision. The proposed algorithms are validated via extensive experiments on synthetic and real data sets which complements our theoretical results.
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.