Department of Electrical Engineering
National Tsing Hua University, Taiwan, R.O.C.
Abstract
In the era of artificial
intelligence, convolutional neural networks (CNNs) are emerging as a
powerful technique for computational imaging. They have shown superior
quality for reconstructing fine textures from badly distorted images and
have the potential to bring next-generation cameras and displays to our
daily lives. However, CNNs demand intensive computing power for
generating high-resolution videos and defy conventional sparsity
techniques when rendering dense details. Therefore, finding new
possibilities in regular sparsity is crucial to enable large-scale
deployment of CNN-based computational imaging.
In this paper, we
consider a fundamental but not yet well-explored approach, algebraic
sparsity, for energy-efficient CNN acceleration. We propose
to build CNN models based on ring algebra that defines
multiplication, addition, and non-linearity for n-tuples
properly. The essential sparsity then follows immediately,
e.g., an n-times reduction in the number of real-valued weights. We
define and unify several variants of ring algebras into a
modeling framework, RingCNN, and make comparisons in
terms of image quality and hardware complexity. On top of that,
we further devise a novel ring algebra which minimizes
complexity with component-wise product and achieves the best
quality using directional ReLU. Finally, we design an
accelerator, eRingCNN, to accommodate the proposed
ring algebra, in particular with regular ring-convolution arrays
for efficient inference and on-the-fly directional ReLU blocks
for fixed-point computation. We implement two configurations,
n=2 and 4 (50% and 75% sparsity), with 40 nm technology to
support advanced denoising and super-resolution at up to 4K UHD
30 fps. Layout results show that they can deliver an equivalent
41 TOPS using 3.76 W and 2.22 W, respectively. Compared to the
real-valued counterpart, our ring convolution engines for n=2
achieve 2.00x energy efficiency and 2.08x area efficiency with
similar or even better image quality. With n=4, the energy and area
efficiency gains further increase to 3.84x and
3.77x with only a 0.11 dB drop in peak signal-to-noise ratio
(PSNR). The results show that RingCNN exhibits great
architectural advantages for providing near-maximum hardware
efficiencies and graceful quality degradation simultaneously.
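The n-times weight reduction mentioned in the abstract can be illustrated with a small sketch. It is not the ring algebra proposed in the paper: the complex numbers are used here only as a familiar stand-in ring with n=2, and the function names, the 64-channel layer, and the 3x3 kernel are hypothetical values chosen for illustration. The idea shown is that a weight acting on an n-tuple is a single ring element (n real numbers) rather than a free n-by-n real matrix (n*n real numbers).

```python
# Illustrative sketch only. The complex numbers serve as a stand-in ring (n = 2);
# this is NOT the ring algebra proposed in the paper. All names are hypothetical.

def real_weight_count(c_in, c_out, k, n=1):
    """Count real-valued weights in a k x k convolution layer with c_in input
    and c_out output real channels, when channels are grouped into n-tuples
    and each tuple-to-tuple weight is one ring element (n real numbers)."""
    assert c_in % n == 0 and c_out % n == 0
    return (c_in // n) * (c_out // n) * k * k * n

def sparsity(n):
    """Fraction of real-valued weights removed relative to the dense layer."""
    return 1.0 - 1.0 / n

def complex_mul_as_matrix(w, x):
    """Multiply the 2-tuples w and x as complex numbers, written as a 2x2 real
    matrix acting on x. The matrix has 4 entries but only 2 free parameters."""
    a, b = w
    matrix = [[a, -b], [b, a]]
    return tuple(row[0] * x[0] + row[1] * x[1] for row in matrix)

dense = real_weight_count(64, 64, 3)        # ordinary real-valued layer
ring2 = real_weight_count(64, 64, 3, n=2)   # 2-tuples: half the weights
ring4 = real_weight_count(64, 64, 3, n=4)   # 4-tuples: a quarter of the weights

print(dense, ring2, ring4)
print(sparsity(2), sparsity(4))
print(complex_mul_as_matrix((1, 2), (3, 4)))
```

Under these assumptions, the counts shrink by exactly n, which corresponds to the 50% and 75% sparsity quoted for n=2 and n=4.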
Publication
C.-T. Huang,
"RingCNN: Exploiting Algebraically-Sparse Ring Tensors for
Energy-Efficient CNN-Based Computational Imaging," in ACM/IEEE
International Symposium on Computer Architecture (ISCA), 2021. [arXiv:2104.09056]
ISCA Presentation (20 minutes)
Acknowledgement
This work was
supported by the Ministry of Science and Technology, Taiwan,
R.O.C., under Grant no. MOST 109-2218-E-007-034.