Nuit Blanche: Sunday Morning Insight: The Regularization Architecture

Saturday, December 20, 2014


Or Not?

Today's insights go in different directions. The first one came in large part from a remark by John Platt on TechNet's Machine Learning blog. He puts it into words much better than I ever could:

...Given the successes of deep learning, researchers are trying to understand how they work. Ba and Caruana had a NIPS paper which showed that, once a deep network is trained, a shallow network can learn the same function from the outputs of the deep network. The shallow network can’t learn the same function directly from the data. This indicates that deep learning could be an optimization/learning trick...

My emphasis on the last sentence. So the model reduction of Jimmy Ba and Rich Caruana that was featured a year ago (Do Deep Nets Really Need to be Deep?, NIPS version Do Deep Nets Really Need to be Deep?) and the more recently featured shallow model (Kernel Methods Match Deep Neural Networks on TIMIT) do in fact point to a potentially better regularization scheme, one that we have not yet found.
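To make the Ba and Caruana result more concrete, here is a minimal NumPy sketch of "mimic" training under stated assumptions. A shallow student with a single hidden layer is fit, with a squared-error loss, to the logits of a fixed deeper teacher instead of to the original labels. The teacher is a hypothetical, randomly initialized 4-layer ReLU network standing in for a trained deep net. The helper names (make_teacher, teacher_logits, train_student), the layer sizes, the learning rate and the step count are all illustrative choices. This is not the authors' code, and it leaves out the refinements used in their paper.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def make_teacher(dims, rng):
    # Hypothetical "deep teacher": fixed random ReLU layers standing in for a trained deep net.
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])]


def teacher_logits(layers, x):
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b
        if i < len(layers) - 1:  # no nonlinearity on the output logits
            h = relu(h)
    return h


def train_student(x, z, hidden=128, lr=0.02, steps=2000, seed=1):
    """Fit a one-hidden-layer student to teacher logits z with an L2 (mimic) loss."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    k = z.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, k))
    b2 = np.zeros(k)
    losses = []
    for _ in range(steps):
        a = x @ W1 + b1
        h = relu(a)
        err = h @ W2 + b2 - z
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Full-batch gradient descent on the mean squared logit error.
        g = err / n
        gW2, gb2 = h.T @ g, g.sum(axis=0)
        ga = (g @ W2.T) * (a > 0)
        gW1, gb1 = x.T @ ga, ga.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


rng = np.random.default_rng(0)
x = rng.normal(size=(512, 20))
teacher = make_teacher([20, 64, 64, 64, 5], rng)
z = teacher_logits(teacher, x)  # the student only ever sees these outputs
student, losses = train_student(x, z)
print(f"mimic loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The sketch is only about the training signal. The student regresses the teacher's real-valued logits rather than hard labels, which is the sense in which the shallow model learns "from the outputs of the deep network."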

If one nevertheless wants to keep thinking in terms of several layers, then one can view these stacked networks as iterations of a reconstruction solver, as Christian Schuler, Michael Hirsch, Stefan Harmeling, and Bernhard Schölkopf do in Learning to Deblur [2].
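One generic way to read "stacked networks as iterations of a reconstruction solver" is unrolling: each layer performs one step of an iterative algorithm, so depth equals the number of iterations. The NumPy sketch below unrolls ISTA (iterative shrinkage-thresholding) for a sparse recovery problem y = A x + noise with an l1 penalty. Each layer is a linear map followed by a soft-threshold nonlinearity. In a learned variant (in the spirit of LISTA by Gregor and LeCun), the per-layer matrices and thresholds would be trained instead of fixed. The problem sizes, the penalty lam and the helper names are assumptions for illustration. This is not the deblurring architecture of Schuler et al.

```python
import numpy as np


def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1; plays the role of each layer's nonlinearity.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def unrolled_ista(A, y, n_layers, lam):
    """Run n_layers ISTA iterations, one iteration per 'layer'.

    Layer k computes x_{k+1} = soft(W x_k + B y, theta) with
    W = I - A^T A / L, B = A^T / L, theta = lam / L, and L = ||A||_2^2.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the data-term gradient
    W = np.eye(A.shape[1]) - A.T @ A / L   # shared "weight matrix" of every layer
    B = A.T / L
    theta = lam / L
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(W @ x + B @ y, theta)
    return x


def objective(A, y, x, lam):
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))


rng = np.random.default_rng(0)
m, n, k = 60, 120, 8
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
y = A @ x_true + 0.01 * rng.normal(size=m)

lam = 0.01
depths = (1, 10, 100)
objs = [objective(A, y, unrolled_ista(A, y, d, lam), lam) for d in depths]
for d, f in zip(depths, objs):
    print(f"depth {d:3d}: objective {f:.5f}")
```

With step size 1/L, ISTA never increases the objective, so a "deeper" unrolled network here simply means a more converged solver.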

Finally, in yesterday's videos (Saturday Morning Videos: Semidefinite Optimization, Approximation and Applications (Simons Institute @ Berkeley)), one could watch Sanjeev Arora [3] talk about Adventures in Linear Algebra++ and Unsupervised Learning (slides). It so happens that what he describes as Linear Algebra++ is none other than the Advanced Matrix Factorization Jungle. More importantly, he mentions that the work on randomly wired deep nets (Provable Bounds for Learning Some Deep Representations) was acknowledged as a source of inspiration in building a recent, deeper network [1]. From [1]:

In general, one can view the Inception model as a logical culmination of [12] while taking inspiration and guidance from the theoretical work by Arora et al [2]. The benefits of the architecture are experimentally verified on the ILSVRC 2014 classification and detection challenges, on which it significantly outperforms the current state of the art.

and in the conclusion of that same paper:

Although it is expected that similar quality of result can be achieved by much more expensive networks of similar depth and width, our approach yields solid evidence that moving to sparser architectures is feasible and useful idea in general. This suggest promising future work towards creating sparser and more refined structures in automated ways on the basis of [2].

Deeper constructs, yes, but sparser ones.
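The "deeper but sparser" idea can be illustrated with a small generative sketch loosely inspired by the randomly wired nets of Arora et al. Every layer is a random sparse bipartite graph with ±1 edge weights, units are 0/1 thresholds, and a sparse code at the top is propagated down to an observed layer. The layer sizes, the fixed in-degree per unit, the helper names (random_sparse_layer, generate) and the exact thresholding rule are assumptions for illustration. This is not a faithful reproduction of their model or of their learning algorithm.

```python
import numpy as np


def random_sparse_layer(n_out, n_in, degree, rng):
    """Hypothetical helper: each output unit connects to `degree` random inputs with +/-1 weights."""
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        cols = rng.choice(n_in, size=degree, replace=False)
        W[i, cols] = rng.choice([-1.0, 1.0], size=degree)
    return W


def generate(layers, h_top):
    """Propagate a sparse 0/1 top-layer code down through thresholded sparse layers."""
    h = h_top
    for W in layers:
        h = (W @ h > 0).astype(float)  # 0/1 threshold unit (assumed rule)
    return h


rng = np.random.default_rng(0)
sizes = [50, 200, 400, 800]  # top (hidden) layer first, observed layer last
degree = 10
layers = [random_sparse_layer(n_out, n_in, degree, rng)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

h_top = np.zeros(sizes[0])
h_top[rng.choice(sizes[0], size=5, replace=False)] = 1.0  # sparse top-level code
x = generate(layers, h_top)

dense_params = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
sparse_params = sum(int(np.count_nonzero(W)) for W in layers)
print(x.shape, sparse_params, dense_params)  # prints: (800,) 14000 410000
```

Each layer stores degree × n_out nonzero weights instead of n_in × n_out. In this toy setting, adding a layer therefore costs far less than it would in a fully connected net.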

References:

[1] Going Deeper with Convolutions by Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich

We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

[2] Learning to Deblur by Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard Schölkopf

We show that a neural network can be trained to blindly deblur images. To accomplish that, we apply a deep layered architecture, parts of which are borrowed from recent work on neural network learning, and parts of which incorporate computations that are specific to image deconvolution. The system is trained end-to-end on a set of artificially generated training examples, enabling competitive performance in blind deconvolution, both with respect to quality and runtime.

[3] Adventures in Linear Algebra++ and Unsupervised Learning by Sanjeev Arora