Nuit Blanche: Understanding deep learning requires rethinking generalization

Thursday, January 19, 2017

Understanding deep learning requires rethinking generalization

Here is an interesting paper that pinpoints the influence of regularization on learning with Neural networks. From the paper:

Our central finding can be summarized as:
Deep neural networks easily fit random labels.

and later:

While simple to state, this observation has profound implications from a statistical learning perspective:

1. The effective capacity of neural networks is large enough for a brute-force memorization of the entire data set.

2. Even optimization on random labels remains easy. In fact, training time increases only by a small constant factor compared with training on the true labels.

3. Randomizing labels is solely a data transformation, leaving all other properties of the learning problem unchanged.

One can also read the interesting comments on OpenReview and on Reddit.

Understanding deep learning requires rethinking generalization by Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training.
Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice.
We interpret our experimental findings by comparison with traditional models.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !