Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks by Philipp Gysel
Convolutional neural networks (CNNs) have achieved major breakthroughs in recent years. Their performance in computer vision has matched and in some areas even surpassed human capabilities. Deep neural networks can capture complex non-linear features; however, this ability comes at the cost of high computational and memory requirements. State-of-the-art networks require billions of arithmetic operations and millions of parameters. To bring the power of deep learning to embedded devices such as smartphones, Google glasses and monitoring cameras, dedicated hardware accelerators can be used to decrease both execution time and power consumption. In applications where a fast connection to the cloud is not guaranteed or where privacy is important, computation needs to be done locally. Many hardware accelerators for deep neural networks have been proposed recently. An important first step of accelerator design is hardware-oriented approximation of deep networks, which enables energy-efficient inference. We present Ristretto, a fast and automated framework for CNN approximation. Ristretto simulates the hardware arithmetic of a custom hardware accelerator. The framework reduces the bit-width of network parameters and of the outputs of resource-intensive layers, which significantly reduces the chip area for multiplication units. Alternatively, Ristretto can remove the need for multipliers altogether, resulting in adder-only arithmetic. The tool fine-tunes trimmed networks to achieve high classification accuracy. Since training of deep neural networks can be time-consuming, Ristretto uses highly optimized routines which run on the GPU. This enables fast compression of any given network. Given a maximum error tolerance of 1%, Ristretto can successfully condense CaffeNet and SqueezeNet to 8-bit. The code for Ristretto is available.
The page for the Ristretto project is at: http://lepsucd.com/?page_id=621
From the page:
Ristretto is an automated CNN-approximation tool which condenses 32-bit floating point networks. Ristretto is an extension of Caffe and allows users to test, train and fine-tune networks with limited numerical precision.
Ristretto In a Minute
- Ristretto Tool: The Ristretto tool performs automatic network quantization and scoring, using different bit-widths for number representation, to find a good balance between compression rate and network accuracy.
- Ristretto Layers: Ristretto reimplements Caffe-layers with quantized numbers.
- Testing and Training: Thanks to Ristretto’s smooth integration into Caffe, network description files can be manually changed to quantize different layers. The bit-width used for different layers, as well as other parameters, can be set in the network’s prototxt file. This makes it possible to test and train condensed networks directly, without any need for recompilation.
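The idea of scoring a network at several bit-widths to balance compression against accuracy can be sketched as follows. This is a minimal illustration and not Ristretto's actual search procedure: the function `smallest_bit_width`, its parameters, the candidate widths and the toy accuracy table are all assumptions made up for this example, with the 1% tolerance taken from the abstract.

```python
from typing import Callable, Sequence


def smallest_bit_width(
    score: Callable[[int], float],
    baseline_accuracy: float,
    candidates: Sequence[int] = (32, 16, 8, 4, 2),
    max_drop: float = 0.01,
) -> int:
    """Return the smallest candidate width whose accuracy stays within
    max_drop of the baseline (hypothetical helper, not a Ristretto API)."""
    best = max(candidates)  # fall back to the widest format if nothing else passes
    for bits in sorted(candidates, reverse=True):
        if baseline_accuracy - score(bits) > max_drop:
            break  # accuracy dropped too far; keep the previous width
        best = bits
    return best


def toy_score(bits: int) -> float:
    """Stand-in for a real evaluation run; the numbers are invented."""
    table = {32: 0.570, 16: 0.569, 8: 0.563, 4: 0.410, 2: 0.050}
    return table[bits]


chosen = smallest_bit_width(toy_score, baseline_accuracy=0.570)
print(chosen)
```

In a real setting `score` would quantize the network at the given width and run it on a validation set, which is the expensive step that motivates running it on the GPU.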
Approximation Schemes
Ristretto allows for three different quantization strategies to approximate Convolutional Neural Networks:
- Dynamic Fixed Point: A modified fixed-point format with more flexibility.
- Mini Floating Point: Bit-width reduced floating point numbers.
- Power-of-two parameters: Layers with power-of-two parameters don’t need any multipliers when implemented in hardware, because multiplying by a power of two reduces to a bit shift.
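Two of the schemes above can be illustrated with a few lines of plain Python. This is a rough sketch under stated assumptions, not Ristretto's implementation: the function names, the rule for choosing the fractional length from the largest magnitude in a group, and the exponent range for power-of-two values are all chosen here for illustration. Mini floating point is omitted to keep the example short.

```python
import math
from typing import List


def choose_frac_bits(values: List[float], bit_width: int) -> int:
    """Pick a fractional length so the largest magnitude in the group fits.
    One of the integer bits is used for the sign (assumed convention)."""
    max_abs = max(abs(v) for v in values)
    int_bits = math.ceil(math.log2(max_abs)) + 1 if max_abs > 0 else 1
    return bit_width - int_bits


def to_dynamic_fixed_point(x: float, bit_width: int, frac_bits: int) -> float:
    """Round x to a multiple of 2**-frac_bits and saturate to the signed range."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


def to_power_of_two(x: float, min_exp: int = -8, max_exp: int = -1) -> float:
    """Replace x by sign(x) * 2**e, with e the nearest integer exponent
    clamped to [min_exp, max_exp] (the range is an assumption)."""
    if x == 0.0:
        return 0.0
    exp = max(min_exp, min(max_exp, round(math.log2(abs(x)))))
    return math.copysign(2.0 ** exp, x)


weights = [0.75, -0.3, 0.02, 1.9]
fl = choose_frac_bits(weights, bit_width=8)
print(fl, [to_dynamic_fixed_point(w, 8, fl) for w in weights])
print([to_power_of_two(w) for w in weights])
```

The "dynamic" part is that each group of numbers (for example the weights of one layer) gets its own fractional length, so layers with small values keep more fractional precision than a single global fixed-point format would allow.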
Documentation
- SqueezeNet Example: Quantization, fine-tuning and benchmarking of SqueezeNet.
- Ristretto Layers, Benchmarking and Finetuning: Implementation details of Ristretto.
- Approximation Schemes
- Ristretto Layer Catalogue: List of layers that can be approximated by Ristretto.
- Ristretto Tool: The command line tool and its parameters.
- ristretto-users group: Join our Google group to ask questions about Ristretto’s features or report issues. Please try this forum first before sending us an email.
- Tips and Tricks
Cite us
Our approximation framework was presented in an extended abstract at ICLR’16. Check out our poster. All results can be reproduced with our code on GitHub. If Ristretto helps your research project, please cite us:
@article{gysel2016hardware,
  title={Hardware-oriented Approximation of Convolutional Neural Networks},
  author={Gysel, Philipp and Motamedi, Mohammad and Ghiasi, Soheil},
  journal={arXiv preprint arXiv:1604.03168},
  year={2016}
}

 