I wanted to include more ICLR 2017 submissions alongside the following paper on how we should size neural networks, but as the deadline nears, the website is not responsive. I'll get back to some of these submissions later. In the meantime, here is this very interesting paper. The second section is a great summary of where we stand in terms of architectures. Enjoy! (And good luck to those of you frantically submitting to ICLR.)
Lets keep it simple: using simple architectures to outperform deeper architectures by Seyyed Hossein HasanPour, Mohammad Rouhani, Javad Vahidi
In recent years, all winning architectures that achieved state of the art results have been very deep and had parameters ranging from tens to hundreds of millions. While an optimal depth has yet to be discovered, it is known that these deep architectures are far from being optimal. Very deep and heavy architecture such as VGGNet, GoogleNet, ResNet and the likes, are very demanding in terms of hardware requirements, and so their practical use has become very limited and costly. The computational and memory overhead caused by such architectures has a negative effect on the expansion of methods and applications utilizing deep architectures. In order to overcome this issues, some thinned architectures such as Squeezenet are proposed that are computationally light-weight and useful for embedded systems. However their usage is also hindered by the low accuracy they provide. While deep architectures do provide good accuracy, we empirically show that a well-crafted yet simple and reasonably deep architecture can equally perform. This allows for more practical uses, especially in embed systems, or systems with computational and memory limitations. In this work, we present a very simple fully convolutional network with 13 layers that outperforms almost all deeper architectures to date such as ResNet, GoogleNet,WRN, etc with 2 to 25 times fewer number of parameters, and rarely when it does not supersede an architecture, it performs on par. We achieved state of the art results and very close to it on datasets such as CIFAR10/100, MNIST and SVHN with simple or no data-augmentation.
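To get a feel for the "fewer parameters" argument, here is a minimal sketch of how the parameter count of a plain stack of convolutional layers adds up. The formula for a single convolution (kernel height × kernel width × input channels × output channels, plus one bias per output channel) is standard, but the 13 layer widths below are made up for illustration and are not the architecture from the paper; the function names are mine as well.

```python
def conv_params(c_in, c_out, kernel=3, bias=True):
    """Number of learnable parameters in one 2D convolution layer."""
    weights = kernel * kernel * c_in * c_out
    return weights + (c_out if bias else 0)


def stack_params(widths, c_in=3, kernel=3):
    """Total parameters of a plain stack of conv layers.

    `widths` lists the output channels of each layer in order;
    `c_in` is the number of input image channels (3 for RGB).
    """
    total = 0
    for c_out in widths:
        total += conv_params(c_in, c_out, kernel)
        c_in = c_out  # the next layer consumes this layer's output
    return total


# ASSUMPTION: hypothetical widths for a 13-layer network,
# chosen only for illustration (not taken from the paper).
HYPOTHETICAL_WIDTHS = [64] + [128] * 3 + [256] * 5 + [512] * 4

if __name__ == "__main__":
    total = stack_params(HYPOTHETICAL_WIDTHS)
    print(f"{len(HYPOTHETICAL_WIDTHS)} layers, {total:,} parameters")
```

Since the weight count of each layer scales with the product of its input and output channels, halving every width cuts the total by roughly a factor of four, which is why width choices matter at least as much as depth when comparing model sizes.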

 
2 comments:
3 typos in the abstract and a Microsoft Word-formatted document... This doesn't inspire much confidence!
It seems that was an early, incomplete draft, which was then updated a few days later.
It is now much more thorough than it was initially, suggesting either that the author didn't know how arXiv works in the first place, or that it somehow got published before he (or whoever was responsible for publishing it) realized it!
Anyway, the latest version looks promising IMHO though.