How could I have missed this one from the papers currently in review for ICLR 2015? In fact, I did not miss it: I read it and then... other things took over. So without further ado, here is the starting point of the study:
...Consider, however, the results shown in Figure 1, where we trained networks of increasing size on the MNIST and CIFAR-10 datasets. Training was done using stochastic gradient descent with momentum and diminishing step sizes, on the training error and without any explicit regularization. As expected, both training and test error initially decrease. More surprising is that if we increase the size of the network past the size required to achieve zero training error, the test error continues decreasing! This behavior is not at all predicted by, and even contrary to, viewing learning as fitting a hypothesis class controlled by network size...
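The observation that adding parameters need not add capacity can be reproduced in a much simpler toy setting. This is my own illustration, not one of the paper's experiments: linear least squares with more weights than training points. Infinitely many weight vectors fit the data exactly, yet gradient descent started at zero (here with heavy-ball momentum and a diminishing step size, loosely echoing the training recipe quoted above) lands on one particular solution, the minimum-Euclidean-norm interpolant, because every gradient lies in the row space of X. The optimizer, not the parameter count, picks the solution. The problem sizes, the momentum value 0.9, the schedule lr0 / (1 + t/500) and the helper name gd_momentum_least_squares are arbitrary choices for this sketch.

```python
import numpy as np

def gd_momentum_least_squares(X, y, steps=2000, lr0=None, beta=0.9):
    """Heavy-ball gradient descent on 0.5 * ||X w - y||^2 from w = 0,
    with a diminishing step size lr_t = lr0 / (1 + t / 500).
    Illustrative helper, not code from the paper."""
    n, d = X.shape
    if lr0 is None:
        # Scale the first step by the largest eigenvalue of X^T X
        # (the squared spectral norm of X) to keep the iteration stable.
        lr0 = 0.5 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)  # zero init, so w stays in the row space of X
    v = np.zeros(d)  # momentum buffer
    for t in range(steps):
        grad = X.T @ (X @ w - y)  # always a combination of the rows of X
        lr = lr0 / (1.0 + t / 500.0)
        v = beta * v - lr * grad
        w = w + v
    return w

rng = np.random.default_rng(0)
n, d = 20, 100  # far more parameters than data points
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = rng.standard_normal(n)

w_gd = gd_momentum_least_squares(X, y)
w_min_norm = np.linalg.pinv(X) @ y  # minimum-Euclidean-norm interpolant

print("training residual:", np.linalg.norm(X @ w_gd - y))
print("distance to min-norm solution:", np.linalg.norm(w_gd - w_min_norm))

# Any other exact fit w_min_norm + z, with X z = 0, has a strictly larger norm.
z = rng.standard_normal(d)
z -= np.linalg.pinv(X) @ (X @ z)  # remove the row-space part, keep the null-space part
w_other = w_min_norm + z
print("norms:", np.linalg.norm(w_gd), "<", np.linalg.norm(w_other))
```

In this linear toy, increasing d only adds more exact fits, and gradient descent from zero still returns the minimum-norm one. The paper's suggestion is that an implicit bias of this general kind, rather than network size, is what controls capacity in multilayer networks.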
And then they use advanced matrix factorization to understand the issue better. What's not to like?
In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning by Behnam Neyshabur, Ryota Tomioka, Nathan Srebro.
We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.
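The matrix factorization analogy can be made concrete with a classic fact from that literature, which I am presenting as background rather than as a transcription of the paper's argument. Write X = U V^T with inner dimension k, i.e. a two-layer linear network with k hidden units. Once k is at least rank(X), increasing k adds parameters but no new matrices. If capacity is instead measured by the size of the factors, the smallest achievable (||U||_F^2 + ||V||_F^2) / 2 over all exact factorizations equals the trace (nuclear) norm of X, whatever k is. The sketch checks this numerically for a few values of k; the matrix sizes, the random matrices and the helper names balanced_factors and factor_cost are my own choices.

```python
import numpy as np

def balanced_factors(X, k):
    """Return U (m x k), V (n x k) with U @ V.T == X and
    (||U||_F^2 + ||V||_F^2) / 2 equal to the trace norm of X.
    Requires k >= rank(X); the extra columns are zero. Illustrative helper."""
    A, s, Bt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))  # numerical rank
    if k < r:
        raise ValueError("k must be at least rank(X)")
    m, n = X.shape
    U = np.zeros((m, k))
    V = np.zeros((n, k))
    # Split each singular value evenly between the two factors.
    U[:, :r] = A[:, :r] * np.sqrt(s[:r])
    V[:, :r] = Bt[:r].T * np.sqrt(s[:r])
    return U, V

def factor_cost(U, V):
    """Norm-based 'capacity' of a factorization: (||U||_F^2 + ||V||_F^2) / 2."""
    return 0.5 * (np.sum(U ** 2) + np.sum(V ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))  # an 8 x 6 matrix of rank 3
trace_norm = np.linalg.norm(X, "nuc")

for k in (3, 6, 10):
    U, V = balanced_factors(X, k)
    R = rng.standard_normal((k, k))         # a random (almost surely invertible) k x k matrix
    U2, V2 = U @ R, V @ np.linalg.inv(R).T  # another exact factorization: U2 @ V2.T == X
    print(k, factor_cost(U, V), factor_cost(U2, V2), trace_norm)
```

Each printed row should show the balanced cost matching the trace norm and the randomly mixed factorization costing at least as much. The inner dimension k changes the parameter count but not the minimal norm, which is why, in this analogy, the norm of the factors rather than k plays the role of capacity.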