Nuit Blanche: Hardware Based Stochastic Gradient Descent

Tuesday, September 16, 2014

Hardware Based Stochastic Gradient Descent

Starting at 52 minutes and 10 seconds of this video, Ali Rahimi explains how one can use a noisy evaluation of the gradient in a gradient descent algorithm in order to minimize a convex function (most relaxations featured in current Advanced Matrix Factorization techniques and some instances of Compressive Sensing fall in that category). The interesting part of this is that the noise comes from the errors made by the, now stochastic, Floating Point Unit. From the paper below:

Unlike the traditional setting for stochastic gradient descent, where stochasticity arises because the gradient direction is computed from a random subset of a dataset, here the processor itself is the source of stochasticity. We call our approach application robustification.

The attendant paper mentioned in the video for the Intel chip is here: A numerical optimization-based methodology for application robustification: Transforming applications for error tolerance by Joseph Sloan, David Kesler, Rakesh Kumar, Ali Rahimi

There have been several attempts at correcting process variation induced errors by identifying and masking these errors at the circuit and architecture level. These approaches take up valuable die area and power on the chip. As an alternative, we explore the feasibility of an approach that allows these errors to occur freely, and handle them in software, at the algorithmic level. In this paper, we present a general approach to converting applications into an error tolerant form by recasting these applications as numerical optimization problems, which can then be solved reliably via stochastic optimization. We evaluate the potential robustness and energy benefits of the proposed approach using an FPGA-based framework that emulates timing errors in the floating point unit (FPU) of a Leon3 processor. We show that stochastic versions of applications have the potential to produce good quality outputs in the face of timing errors under certain assumptions. We also show that good quality results are possible for both intrinsically robust algorithms as well as fragile applications under these assumptions.

Of note, these two theses: