Nuit Blanche: PhD Thesis: Efficient Methods and Hardware for Deep Learning by Song Han

Thursday, October 12, 2017


Congratulations, Dr. Han!

The thesis abstract is below, and many of its themes are covered in Lecture 15 of Stanford's CS 231n, Convolutional Neural Networks for Visual Recognition (Spring 2017). Enjoy!

Efficient Methods and Hardware for Deep Learning by Song Han

The future will be populated with intelligent devices that require inexpensive, low-power hardware platforms. Deep neural networks have evolved to be the state-of-the-art technique for machine learning tasks. However, these algorithms are computationally intensive, which makes them difficult to deploy on embedded devices with limited hardware resources and a tight power budget. Since Moore's law and technology scaling are slowing down, technology alone will not address this issue. To solve this problem, we focus on efficient algorithms and domain-specific architectures specially designed for the algorithm. By performing optimizations across the full stack from application through hardware, we improved the efficiency of deep learning through smaller model size, higher prediction accuracy, faster prediction speed, and lower power consumption. Our approach starts by changing the algorithm, using "Deep Compression", which significantly reduces the number of parameters and the computation requirements of deep learning models through pruning, trained quantization, and variable-length coding. "Deep Compression" can reduce the model size by 18x to 49x without hurting the prediction accuracy. We also discovered that pruning and the sparsity constraint apply not only to model compression but also to regularization, and we proposed dense-sparse-dense training (DSD), which can improve the prediction accuracy for a wide range of deep learning models. To efficiently implement "Deep Compression" in hardware, we developed EIE, the "Efficient Inference Engine", a domain-specific hardware accelerator that performs inference directly on the compressed model, which significantly saves memory bandwidth. Taking advantage of the compressed model, and being able to deal with the irregular computation pattern efficiently, EIE improves the speed by 13x and the energy efficiency by 3,400x over a GPU.
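
For readers who want a feel for the three stages the abstract names (pruning, quantization, variable-length coding), here is a toy sketch in plain Python. It is not the thesis code. The function names, the 10% keep fraction, the 16-entry codebook, the linearly spaced centroid initialization, and the random Gaussian "weights" are all assumptions made for illustration. The "trained" part of trained quantization (fine-tuning the shared centroids by retraining the network) is left out entirely, and the bit counts ignore the cost of storing the positions of the surviving weights and the codebook itself.

```python
import heapq
import random
from collections import Counter


def prune(weights, keep_fraction):
    """Magnitude pruning: zero out all but the largest-magnitude weights."""
    k = max(1, int(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, n_clusters, n_iters=20):
    """Weight sharing via 1-D k-means over the surviving (nonzero) weights.

    Returns (codebook, indices); index -1 marks a pruned weight.
    Requires n_clusters >= 2. No retraining of centroids is done here.
    """
    nonzero = [w for w in weights if w != 0.0]
    lo, hi = min(nonzero), max(nonzero)
    # Assumption: centroids start evenly spaced across the weight range.
    codebook = [lo + (hi - lo) * i / (n_clusters - 1) for i in range(n_clusters)]

    def nearest(w):
        return min(range(n_clusters), key=lambda c: abs(w - codebook[c]))

    for _ in range(n_iters):
        groups = [[] for _ in range(n_clusters)]
        for w in nonzero:
            groups[nearest(w)].append(w)
        # Empty clusters keep their previous centroid.
        codebook = [sum(g) / len(g) if g else codebook[c]
                    for c, g in enumerate(groups)]
    indices = [nearest(w) if w != 0.0 else -1 for w in weights]
    return codebook, indices


def huffman_bits(symbols):
    """Total number of bits a Huffman code needs for this symbol sequence."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return len(symbols)  # degenerate alphabet: one bit per symbol
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b  # each merge adds one bit to every symbol beneath it
        heapq.heappush(heap, a + b)
    return total


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for a layer

pruned = prune(weights, keep_fraction=0.1)
codebook, indices = quantize(pruned, n_clusters=16)
kept = [i for i in indices if i >= 0]

fixed_bits = 4 * len(kept)       # 16 clusters -> 4-bit index per kept weight
coded_bits = huffman_bits(kept)  # variable-length coding of the same indices
print("kept weights:", len(kept))
print("fixed-length index bits:", fixed_bits)
print("Huffman-coded index bits:", coded_bits)
```

Each stage feeds the next: pruning leaves a sparse set of weights, weight sharing replaces each survivor with a small index into a codebook, and Huffman coding spends fewer bits on the indices that occur most often. The numbers this toy prints say nothing about real networks, where the savings depend on the layer and on retraining after each step.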
