Nuit Blanche: Beyond Overfitting and Beyond Silicon: The double descent curve

Wednesday, January 15, 2020

Beyond Overfitting and Beyond Silicon: The double descent curve

We recently ran a small experiment on generalization using LightOn's Optical Processing Unit. Our engineer, Alessandro Cappelli, carried out the experiment and wrote it up in a blog post: Beyond Overfitting and Beyond Silicon: The double descent curve

Two days ago, Becca Willett spoke on the same subject at the Turing Institute in London.

A function space view of overparameterized neural networks, by Rebecca Willett.

The accompanying preprint is here:

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case by Greg Ongie, Rebecca Willett, Daniel Soudry, Nathan Srebro

A key element of understanding the efficacy of overparameterized neural networks is characterizing how they represent functions as the number of weights in the network approaches infinity. In this paper, we characterize the norm required to realize a function f: R^d → R as a single hidden-layer ReLU network with an unbounded number of units (infinite width), but where the Euclidean norm of the weights is bounded, including precisely characterizing which functions can be realized with finite norm. This was settled for univariate functions in Savarese et al. (2019), where it was shown that the required norm is determined by the L1-norm of the second derivative of the function. We extend the characterization to multivariate functions (i.e., networks with d input units), relating the required norm to the L1-norm of the Radon transform of a (d+1)/2-power Laplacian of the function. This characterization allows us to show that all functions in Sobolev spaces W^{s,1}(R), s ≥ d+1, can be represented with bounded norm, to calculate the required norm for several specific functions, and to obtain a depth separation result. These results have important implications for understanding generalization performance and the distinction between neural networks and more traditional kernel learning.
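To make the univariate statement in the abstract a little more concrete, here is a minimal sketch (not taken from the paper or the talk) for a one-dimensional, single hidden-layer ReLU network f(x) = sum_i a[i] * relu(w[i] * x + b[i]). Such a function is piecewise linear, so the L1-norm of its second derivative is just the sum of the absolute slope changes at its kinks. The function names, the choice of cost 0.5 * (sum w^2 + sum a^2) with biases left out, and the example networks are all assumptions made for illustration. The sketch only shows that this weight cost is never smaller than the L1-norm of f'', and that a balanced network for |x| attains it.

```python
import numpy as np

def relu_net(x, w, b, a):
    """Evaluate f(x) = sum_i a[i] * relu(w[i] * x + b[i]) on a 1-D array x."""
    pre = np.outer(x, w) + b              # shape (len(x), n_units)
    return np.maximum(pre, 0.0) @ a

def second_derivative_l1(w, b, a):
    """L1-norm (total mass) of f'' for the network above.

    Unit i changes the slope by a[i] * |w[i]| at x = -b[i] / w[i].
    Slope changes at the same location are summed before taking
    absolute values, since they may cancel.
    """
    active = w != 0                       # units with w == 0 are constant in x
    knots = np.round(-b[active] / w[active], 12)
    jumps = a[active] * np.abs(w[active])
    return sum(abs(jumps[knots == k].sum()) for k in np.unique(knots))

def weight_cost(w, a):
    """Illustrative cost: half the squared Euclidean norm of the weights (biases excluded)."""
    return 0.5 * (np.sum(w ** 2) + np.sum(a ** 2))

# |x| = relu(x) + relu(-x): a balanced two-unit network.
w_bal, b_bal, a_bal = np.array([1.0, -1.0]), np.zeros(2), np.array([1.0, 1.0])

# The same function with rescaled (unbalanced) weights.
w_unb, b_unb, a_unb = np.array([2.0, -2.0]), np.zeros(2), np.array([0.5, 0.5])

for name, (w, b, a) in {"balanced": (w_bal, b_bal, a_bal),
                        "unbalanced": (w_unb, b_unb, a_unb)}.items():
    print(name, second_derivative_l1(w, b, a), weight_cost(w, a))
```

Both networks compute the same function, so the L1-norm of f'' is identical, while the weight cost depends on how the scale is split between the two layers. Only the balanced choice matches the L1-norm here.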
