Wednesday, December 10, 2014

L_1 regularization in Machine Learning: Memory Bounded Deep Convolutional Networks / Sparse Random Features Algorithm as Coordinate Descent in Hilbert Space

What happens when you use ℓ1 regularization on Deep Convolutional Networks or Large Scale Kernel Machines? That is the question explored by the following two papers, one of which will be presented as a poster tonight in Montreal at NIPS 2014:


Sparse Random Features Algorithm as Coordinate Descent in Hilbert Space by Ian E.H. Yen, Ting-Wei Lin, Shou-De Lin, Pradeep Ravikumar, Inderjit S. Dhillon

In this paper, we propose a Sparse Random Features algorithm, which learns a sparse non-linear predictor by minimizing an ℓ1-regularized objective function over the Hilbert Space induced from a kernel function. By interpreting the algorithm as Randomized Coordinate Descent in an infinite-dimensional space, we show the proposed approach converges to a solution within ε-precision of that using an exact kernel method, by drawing O(1/ε) random features, in contrast to the O(1/ε²) convergence achieved by current Monte-Carlo analyses of Random Features. In our experiments, the Sparse Random Feature algorithm obtains a sparse solution that requires less memory and prediction time, while maintaining comparable performance on regression and classification tasks. Moreover, as an approximate solver for the infinite-dimensional ℓ1-regularized problem, the randomized approach also enjoys better convergence guarantees than a Boosting approach in the setting where the greedy Boosting step cannot be performed exactly.
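To make the idea concrete, here is a minimal sketch (in Python with NumPy and scikit-learn, not the authors' code) of fitting an ℓ1-regularized linear model on random Fourier features so that only a subset of the drawn features keeps a non-zero weight. The feature count D, bandwidth gamma, and penalty alpha are arbitrary illustrative choices; the paper itself draws and weights features via randomized coordinate descent rather than by solving a single Lasso problem.

```python
# Sketch: l1-regularized regression over random Fourier features (illustrative only).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy regression data: y = sin(3x) + noise
X = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(500)

# Random Fourier features approximating a Gaussian kernel with bandwidth gamma
D, gamma = 400, 2.0
W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
b = rng.uniform(0, 2 * np.pi, size=D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)

# l1-regularized least squares over the random features
model = Lasso(alpha=1e-3).fit(Z, y)
kept = np.flatnonzero(model.coef_)
print(f"non-zero features: {kept.size} / {D}")  # a sparse predictor needs less memory and prediction time
```

The point of the sparsity is exactly the one made in the abstract: at prediction time you only store and evaluate the features with non-zero weight.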

Memory Bounded Deep Convolutional Networks by Maxwell D. Collins, Pushmeet Kohli

In this work, we investigate the use of sparsity-inducing regularizers during training of Convolutional Neural Networks (CNNs). These regularizers encourage that fewer connections in the convolution and fully connected layers take non-zero values, and in effect result in sparse connectivity between hidden units in the deep network. This in turn reduces the memory and runtime cost involved in deploying the learned CNNs. We show that training with such regularization can still be performed using stochastic gradient descent, implying that it can be used easily in existing codebases. Experimental evaluation of our approach on MNIST, CIFAR, and ImageNet datasets shows that our regularizers can result in dramatic reductions in memory requirements. For instance, when applied on AlexNet, our method can reduce the memory consumption by a factor of four with minimal loss in accuracy.
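For readers who want to see where such a regularizer plugs in, here is a minimal PyTorch sketch (the paper predates PyTorch and uses its own training code, so this is only an illustration of the general recipe): an ℓ1 penalty on the convolutional and fully connected weights is added to the loss before the SGD step. Note that a plain subgradient step rarely zeroes weights exactly; the paper relies on proximal-style updates, so this only shows where the penalty enters the objective. The architecture, the penalty weight l1_lambda, and the random batch are placeholders.

```python
# Sketch: l1 (sparsity-inducing) regularization during SGD training of a small CNN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8 * 14 * 14, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)   # 28x28 -> 14x14
        return self.fc(x.flatten(1))

model = SmallCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
l1_lambda = 1e-4  # placeholder penalty weight

# One SGD step on a random batch (stand-in for MNIST-sized inputs)
x = torch.randn(32, 1, 28, 28)
target = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = F.cross_entropy(model(x), target)
# Add the l1 penalty on conv and fully connected weights to the loss;
# over training this pushes many connections toward zero.
l1_penalty = sum(p.abs().sum() for p in model.parameters())
(loss + l1_lambda * l1_penalty).backward()
optimizer.step()

zero_weights = sum((p == 0).sum().item() for p in model.parameters())
print("zero weights so far:", zero_weights)
```

Once most connections are exactly zero, the network can be stored and deployed in sparse form, which is where the memory savings quoted in the abstract come from.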
 
 
Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.
