W. Zhe, J. Lin, V. Chandrasekhar and B. Girod, "Optimizing the Bit Allocation for Compression of Weights and Activations of Deep Neural Networks," 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 2019, pp. 3826-3830. doi: 10.1109/ICIP.2019.8803498
Abstract:
For most real-time implementations of deep artificial neural networks, both weights and intermediate layer activations have to be stored and loaded for processing. Compression of both is often advisable to mitigate the memory bottleneck. In this paper, we propose a bit allocation framework for compressing the weights and activations of deep neural networks. The differentiability of the input-output relationships of all network layers allows us to relate the neural network output accuracy to the bit-rates of the quantized weights and layer activations. We formulate a Lagrangian optimization framework that finds the optimum joint bit allocation among all intermediate activation layers and weights. Our method obtains strong results on two deep neural networks, VGG-16 and ResNet-50, and, without requiring re-training, outperforms other state-of-the-art neural network compression methods.
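The Lagrangian bit-allocation idea in the abstract can be sketched as follows. This is a hypothetical toy illustration, not the paper's method: it assumes a standard high-rate uniform-quantizer distortion model (variance × 2^(−2b)) in place of the accuracy-vs-rate relationships the authors derive via differentiability, and the layer statistics are made up. For a fixed multiplier λ, each layer independently picks the bit-width minimizing D + λ·R; sweeping λ traces out the rate-distortion trade-off.

```python
# Toy sketch of Lagrangian bit allocation across layers (illustrative only;
# the distortion model and layer statistics below are assumptions, not the
# paper's measured accuracy-vs-rate curves).

# Hypothetical per-layer statistics: (number of values, variance of tensor).
LAYERS = [(1000, 4.0), (5000, 1.0), (2000, 9.0)]

def distortion(var, bits):
    """High-rate approximation of uniform-quantizer MSE per value."""
    return var * 2.0 ** (-2 * bits)

def allocate(lmbda, bit_choices=range(2, 17)):
    """Per-layer bit-widths minimizing D + lambda * R at multiplier lmbda."""
    alloc = []
    for n, var in LAYERS:
        best = min(bit_choices,
                   key=lambda b: n * distortion(var, b) + lmbda * n * b)
        alloc.append(best)
    return alloc

def total_rate(alloc):
    """Total bits spent under a given allocation."""
    return sum(n * b for (n, _), b in zip(LAYERS, alloc))

if __name__ == "__main__":
    # Larger lambda penalizes rate more heavily, so bit-widths shrink;
    # sweeping lambda until the rate budget is met yields the operating point.
    for lmbda in (1e-6, 1e-3, 1e-1):
        alloc = allocate(lmbda)
        print(f"lambda={lmbda:g}  bits={alloc}  total_rate={total_rate(alloc)}")
```

In practice one searches over λ (e.g. by bisection) until the total rate meets the memory budget; the key property exploited is that, at a common λ, the per-layer minimizations decouple.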