Chong, Y. S., Harish, R., Panicker, R. C., Nambiar, V. P., & Do, A. T. (2024). A 420 GOPS/W CGRA with a Configurable MAC and Dynamic Truncation. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1–5. https://doi.org/10.1109/iscas58744.2024.10558192
Abstract:
Edge devices demand highly efficient yet flexible processing capability to handle dynamic real-time workloads. Coarse-grained reconfigurable architectures (CGRAs) are emerging as suitable accelerator candidates for edge devices because they are as flexible as general-purpose processors and offer efficiency close to that of domain-specific accelerators. However, a typical CGRA requires two cycles for a multiply-and-accumulate (MAC) operation, and workloads such as neural network inference and signal processing involve many MAC operations, resulting in long CGRA processing times. This work proposes a CGRA with configurable MAC units in its processing elements (PEs). Each unit can perform an addition (ADD), a multiplication (MUL), or a MAC in a single cycle, using the same multiplier and adder. The readout precision of the MAC result can be adjusted by a truncation block. The proposed CGRA is implemented in 40 nm CMOS technology. It attains an energy efficiency of 420.6 GOPS/W at a supply voltage of 0.6 V and a frequency of 21 MHz, which is 1.4 times higher than the state of the art.
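The configurable MAC and truncation idea in the abstract can be illustrated with a small behavioural sketch. This is not the authors' design: the names `Mode`, `configurable_mac`, `truncate`, and `cycles` are hypothetical, and the operand routing, the shift-based truncation, and the 16-bit output width are assumptions chosen only to make the idea concrete.

```python
from enum import Enum


class Mode(Enum):
    ADD = "add"
    MUL = "mul"
    MAC = "mac"


def configurable_mac(mode, a, b, acc=0):
    """One cycle of a hypothetical PE sharing one multiplier and one adder."""
    if mode is Mode.ADD:
        # Assumption: the multiplier is bypassed and the adder sees a and b.
        return a + b
    product = a * b  # shared multiplier
    if mode is Mode.MUL:
        # Assumption: the adder's second input is tied to zero.
        return product + 0
    # MAC: the product is added to the running accumulator.
    return product + acc


def truncate(value, shift, out_bits=16):
    """Assumed truncation: drop `shift` LSBs, keep `out_bits` signed bits."""
    shifted = value >> shift  # arithmetic shift on Python ints
    wrapped = shifted & ((1 << out_bits) - 1)
    if wrapped >= 1 << (out_bits - 1):
        wrapped -= 1 << out_bits  # reinterpret as two's complement
    return wrapped


def dot_product(xs, ws):
    """Accumulate a dot product with one MAC call per element pair."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = configurable_mac(Mode.MAC, x, w, acc)
    return acc


def cycles(n_macs, cycles_per_mac):
    """Cycle count for n_macs operations at a given per-MAC latency."""
    return n_macs * cycles_per_mac
```

Under these assumptions, a workload of 1000 MACs takes `cycles(1000, 2)` cycles when each MAC needs two cycles and `cycles(1000, 1)` cycles with a single-cycle MAC, which is the halving of MAC latency the abstract describes.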
License type:
Publisher Copyright
Funding Info:
This research is supported by the National Research Foundation (NRF) under the Competitive Research Programme.
Grant Reference No.: CRP23-2019-0003