BASER: Bit-Wise Approximate Compressor Configurable In-SRAM-Computing for Energy-Efficient Neural Network Acceleration With Data-Aware Weight Remapping Method

  • SHUNQIN CAI 1 ,
  • LIUKAI XU 1 ,
  • DENGFENG WANG 1 ,
  • ZHI LI 1 ,
  • WEIKANG QIAN 2 (Senior Member, IEEE),
  • LIANG CHANG 3 (Member, IEEE),
  • YANAN SUN 1 (Senior Member, IEEE)

  • 1 Department of Micro-Nano Electronics, Shanghai Jiao Tong University, Shanghai 200240, China
  • 2 University of Michigan-Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
  • 3 University of Electronic Science and Technology of China, Chengdu 610056, China

# Shunqin Cai and Liukai Xu contributed equally to this work

Received date: 2024-02-28

Accepted date: 2024-05-15

Online published: 2024-11-27

Supported by

National Key R&D Program of China under Grant 2023YFB4502200

National Natural Science Foundation of China under Grants 62174110 and 62104025

Natural Science Foundation of Shanghai under Grant 23ZR1433200

Abstract

SRAM-based computing-in-memory (SRAM-CIM) is expected to solve the “Memory Wall” problem. In the digital domain, SRAM-CIM macros have used full-precision digital logic to achieve high computational accuracy. However, for error-resilient neural networks (NNs) at a given quantization bit-width, full-precision logic leaves the energy- and area-efficiency advantages of CIM underexploited. Therefore, this paper proposes BASER, an all-digital Bit-wise Approximate compressor configurable in-SRAM-computing macro for Energy-efficient NN acceleration with a data-aware weight Remapping method. Leveraging the error resilience of NNs, six energy-efficient bit-wise compressor configurations are presented for 4b/4b and 3b/3b NN quantization, respectively. In addition, a data-aware weight remapping approach is proposed to further enhance NN accuracy without additional retraining. Evaluations of VGG-9 and ResNet-18 on the CIFAR-10 and CIFAR-100 datasets show that BASER achieves 1.35x and 1.29x improvements in energy efficiency over previous full-precision and approximate SRAM-CIM designs, respectively, with limited accuracy loss relative to the former and improved accuracy relative to the latter.
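As background on the approximate-compressor idea underlying the abstract, the sketch below contrasts an exact 4:2 compressor with one generic approximate 4:2 compressor from the literature (carry-in/carry-out dropped, carry formed from per-pair ANDs). This is an illustrative assumption for exposition only; it is not one of BASER's six bit-wise configurations.

```python
from itertools import product

def exact_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor: x1+x2+x3+x4+cin == s + 2*(carry + cout)."""
    total = x1 + x2 + x3 + x4 + cin
    s = total & 1
    pairs = total >> 1            # 0, 1, or 2 units of weight 2
    carry = 1 if pairs >= 1 else 0
    cout = 1 if pairs == 2 else 0
    return s, carry, cout

def approx_4_2(x1, x2, x3, x4):
    """A generic approximate 4:2 compressor (hypothetical choice here):
    cin and cout are dropped, and the value is taken as s + 2*carry."""
    s = x1 ^ x2 ^ x3 ^ x4
    carry = (x1 & x2) | (x3 & x4)
    return s, carry

# Exhaustively compare the approximate value against the true bit sum.
errors = []
for x1, x2, x3, x4 in product((0, 1), repeat=4):
    s, carry = approx_4_2(x1, x2, x3, x4)
    errors.append((s + 2 * carry) - (x1 + x2 + x3 + x4))

print("error rate:", sum(e != 0 for e in errors) / 16)  # 5 of 16 inputs err
print("mean error:", sum(errors) / 16)                  # bias is always negative
```

The exhaustive check shows why such compressors suit error-resilient NN inference: errors occur on a minority of input patterns, are bounded in magnitude, and have a fixed sign, so their effect on a quantized multiply-accumulate can be anticipated (e.g., by a weight remapping step that steers frequent weight patterns away from the error cases).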

Cite this article

SHUNQIN CAI, LIUKAI XU, DENGFENG WANG, ZHI LI, WEIKANG QIAN, LIANG CHANG, YANAN SUN. BASER: Bit-Wise Approximate Compressor Configurable In-SRAM-Computing for Energy-Efficient Neural Network Acceleration With Data-Aware Weight Remapping Method[J]. Integrated Circuits and Systems, 2024, 1(2): 80-91. DOI: 10.23919/ICS.2024.3419630

