Optimizing the Bit Allocation for Compression of Weights and Activations of Deep Neural Networks

Title:
Optimizing the Bit Allocation for Compression of Weights and Activations of Deep Neural Networks
Other Titles:
2019 IEEE International Conference on Image Processing (ICIP)
Publication Date:
22 September 2019
Citation:
W. Zhe, J. Lin, V. Chandrasekhar and B. Girod, "Optimizing the Bit Allocation for Compression of Weights and Activations of Deep Neural Networks," 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019, pp. 3826-3830. doi: 10.1109/ICIP.2019.8803498
Abstract:
For most real-time implementations of deep artificial neural networks, both weights and intermediate layer activations have to be stored and loaded for processing. Compression of both is often advisable to mitigate the memory bottleneck. In this paper, we propose a bit allocation framework for compressing the weights and activations of deep neural networks. The differentiability of input-output relationships for all network layers allows us to relate the neural network output accuracy to the bit-rate of quantized weights and layer activations. We formulate a Lagrangian optimization framework that finds the optimum joint bit allocation among all intermediate activation layers and weights. Our method obtains excellent results on two deep neural networks, VGG-16 and ResNet-50. Without requiring re-training, it outperforms other state-of-the-art neural network compression methods.
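Illustrative note (not part of the published abstract): the kind of Lagrangian bit allocation the abstract refers to can be sketched generically as follows. This is a minimal sketch of the textbook rate-distortion technique, not the authors' method. The distortion proxy D_i(b) = sens[i] * 2**(-2*b), the names `sens`, `n`, `allocate`, `total_bits`, and `allocate_under_budget`, and the 1 to 16 bit search range are all assumptions made for illustration. The paper itself relates output accuracy to bit-rate through the differentiability of the network layers, which this sketch does not model.

```python
# Generic Lagrangian bit allocation sketch (NOT the paper's exact method).
# Assumption: layer i holds n[i] values, and quantizing it to b bits costs
# a distortion proxy D_i(b) = sens[i] * 2**(-2*b). All names are hypothetical.

def allocate(sens, n, lam, choices=range(1, 17)):
    """For a fixed multiplier lam, pick each layer's bit-width independently."""
    bits = []
    for s, size in zip(sens, n):
        # Minimise distortion + lam * rate for this layer alone.
        cost = lambda b: s * 2.0 ** (-2 * b) + lam * size * b
        bits.append(min(choices, key=cost))
    return bits


def total_bits(bits, n):
    """Total storage in bits for a given per-layer allocation."""
    return sum(b * size for b, size in zip(bits, n))


def allocate_under_budget(sens, n, budget, iters=60):
    """Bisect on lam until the induced allocation fits the bit budget."""
    if budget < sum(n):
        raise ValueError("budget is below 1 bit per value; no allocation fits")
    lo, hi = 0.0, 1.0
    # Grow hi until the allocation it induces fits the budget.
    while total_bits(allocate(sens, n, hi), n) > budget:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_bits(allocate(sens, n, mid), n) > budget:
            lo = mid  # too many bits: penalise rate more strongly
        else:
            hi = mid  # fits: try a smaller penalty
    return allocate(sens, n, hi)
```

For a fixed multiplier the layers decouple, so each bit-width is chosen independently, and the outer bisection only tunes the single multiplier until the total rate meets the budget. Under the assumed proxy, a layer with a larger `sens` value receives at least as many bits as a less sensitive layer of the same size.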
License type:
PublisherCopyrights
Funding Info:
Description:
(c) 2019 IEEE.
ISSN:
2381-8549
1522-4880