Since each layer of a deep neural network can be viewed as one iteration of a reconstruction solver, one wonders whether and how to design these networks so that they better match the behavior of traditional solvers, which typically run for many iterations. In turn, this may provide some insight into how to design reconstruction solvers (i.e. allow some information from far-away iterations to come back into the loop). Here is the beginning of an answer, in the following preprint:
Training Very Deep Networks by Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber
Theoretical and empirical evidence indicates that the depth of neural networks is crucial for their success. However, training becomes more difficult as depth increases, and training of very deep networks remains an open problem. Here we introduce a new architecture designed to overcome this. Our so-called highway networks allow unimpeded information flow across many layers on information highways. They are inspired by Long Short-Term Memory recurrent networks and use adaptive gating units to regulate the information flow. Even with hundreds of layers, highway networks can be trained directly through simple gradient descent. This enables the study of extremely deep and efficient architectures.
An implementation of this algorithm can be found at: http://people.idsia.ch/~rupesh/very_deep_learning/
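To make the gating idea in the abstract concrete, here is a minimal sketch of a single highway layer in plain Python. It is not the authors' code. It follows the layer definition given in the preprint, y = H(x)·T(x) + x·(1 − T(x)), where T is a sigmoid "transform gate". The function names (`highway_layer`, `init_layer`, `highway_stack`), the choice of tanh for H, the weight scale, and the default gate bias of -2.0 are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used for the transform gate T."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, x):
    """Compute W x + b for a square weight matrix stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    H is the usual nonlinear transform (tanh here, an assumption).
    T is the transform gate; 1 - T plays the role of the carry gate.
    """
    h = [math.tanh(z) for z in affine(W_h, b_h, x)]
    t = [sigmoid(z) for z in affine(W_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for hi, ti, xi in zip(h, t, x)]


def init_layer(dim, gate_bias=-2.0, rng=None):
    """Hypothetical initializer: small random weights, negative gate bias.

    A negative gate bias makes each layer start close to the identity,
    so the input is mostly carried through unchanged.
    """
    rng = rng or random.Random(0)

    def mat():
        return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    return mat(), [0.0] * dim, mat(), [gate_bias] * dim


def highway_stack(x, layers):
    """Apply a list of highway layers in sequence."""
    for params in layers:
        x = highway_layer(x, *params)
    return x
```

In this sketch, when the gate is nearly closed (T close to 0) a layer behaves almost like the identity, which is one way to read the "information highways" of the abstract: the signal, and its gradient, can pass through many layers largely unchanged. When the gate is open (T close to 1) the layer reduces to an ordinary nonlinear layer.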