Nuit Blanche: Training Very Deep Networks

Wednesday, July 29, 2015

Training Very Deep Networks - implementation -

Since each layer of a deep neural network can be viewed as one iteration of a reconstruction solver, one wonders whether and how layers could be designed so that they better match the generic behavior of traditional solvers (which use many iterations). In turn, this may provide some insight into how to design reconstruction solvers (i.e. allow some information from far-away iterations to come back into the loop). Here is the beginning of an answer in the following preprint:

Training Very Deep Networks by Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber

Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
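To make the gating idea concrete, here is a minimal sketch of a single highway layer in plain Python. It follows the layer form given in the preprint, y = H(x)·T(x) + x·(1 − T(x)), where T is a sigmoid "transform gate" and 1 − T acts as the carry gate. Everything else is an assumption made for illustration: the function names (`highway_layer`, `init_layer`, `forward`), the choice of tanh for H, the small Gaussian weight initialization, and the gate bias of -2.0 (a negative bias makes the layer start out mostly carrying its input through). This is not the authors' code.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, x):
    # Computes W x + b for a list-of-rows matrix W and vectors b, x.
    return [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    # h: candidate transformation H(x); tanh is an illustrative choice.
    h = [math.tanh(a) for a in affine(W_h, b_h, x)]
    # t: transform gate T(x) in (0, 1); the carry gate is 1 - t.
    t = [sigmoid(a) for a in affine(W_t, b_t, x)]
    # Gate-weighted mix of transformed and untouched input, per unit.
    return [h_i * t_i + x_i * (1.0 - t_i) for h_i, t_i, x_i in zip(h, t, x)]


def init_layer(dim, gate_bias=-2.0, seed=0):
    # Hypothetical initializer: small random weights, negative gate bias.
    rng = random.Random(seed)

    def mat():
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    return mat(), [0.0] * dim, mat(), [gate_bias] * dim


def forward(x, layers):
    # Applies a stack of highway layers; input and output widths must match.
    for params in layers:
        x = highway_layer(x, *params)
    return x


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 0.0]
    deep_stack = [init_layer(len(x), seed=s) for s in range(50)]
    print(forward(x, deep_stack))
```

When the gate saturates near 0 the layer simply copies its input, and when it saturates near 1 it behaves like an ordinary tanh layer. That copy path is what lets information (and gradients) travel across many layers. Note that the formula requires x, H(x) and T(x) to have the same dimension.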

An implementation of this algorithm can be found at: http://people.idsia.ch/~rupesh/very_deep_learning/
