Special Section on Selected Papers from ASICON2023

Linearity Performance of Charge Domain In-Memory Computing: Analysis and Calibration

  • HENG ZHANG,
  • YICHUAN BAI,
  • JUNJIE SHEN,
  • YUAN DU,
  • LI DU
  • School of Electronic Science and Engineering, Nanjing University, Nanjing 210023, China

HENG ZHANG received the B.S. and M.S. degrees in microelectromechanical systems from Northwestern Polytechnical University, Xi'an, China, in 2018 and 2021, respectively. He is currently working toward the Ph.D. degree with the School of Electronic Science and Engineering, Nanjing University, Nanjing, China. His research interests include analog and mixed-signal circuit design for deep learning accelerators, and optical packet switching networks for data centers.

YICHUAN BAI (Graduate Student Member, IEEE) received the B.S. and M.S. degrees from the School of Electronic Science and Engineering, Nanjing University, Nanjing, China, in 2019 and 2022, respectively. He is currently working toward the Ph.D. degree in electronic science and technology. His research interests include machine-learning hardware accelerators and neural network compilers.

JUNJIE SHEN is currently majoring in electronic information engineering with the Nanjing University of Posts and Telecommunications, Nanjing, China. He will soon pursue a graduate degree with the School of Electronic Science and Engineering, Nanjing University, Nanjing. His research interests include in-memory computing architecture, digital integrated circuits, signal processing, and FPGA implementations.

YUAN DU (Senior Member, IEEE) received the B.S. degree from Southeast University, Nanjing, China, in 2009, and the M.S. and Ph.D. degrees from the Electrical Engineering Department, University of California, Los Angeles, Los Angeles, CA, USA, in 2012 and 2016, respectively. Since 2019, he has been an Associate Professor with Nanjing University, Nanjing. From 2016 to 2019, he was a leading hardware architect with Kneron Inc., San Diego, CA. His research interests include the designs of machine-learning hardware accelerators, high-speed inter-chip/intra-chip interconnects, and RFICs. He was the recipient of the Microsoft Research Asia Young Fellow in 2008, the Southeast University Chancellor's Award in 2009, the Broadcom Young Fellow in 2015, and the IEEE Circuits and Systems Society Darlington Best Paper Award in 2021.

LI DU (Member, IEEE) received the B.S. degree from Southeast University, Nanjing, China, in 2011, and the Ph.D. degree in electrical engineering from the University of California, Los Angeles, Los Angeles, CA, USA, in 2016. From June 2013 to September 2016, he was with Qualcomm Inc., San Diego, CA, designing mixed-signal circuits for cellular communications. From September 2016 to October 2018, he was a Hardware Architect Research Scientist with Kneron Inc., San Diego, CA, designing high-performance artificial intelligence hardware accelerators. After that, he joined Xin Yun Tech Inc., Westlake, CA, in charge of high-speed analog circuit design for 100G/400G optical communication. He is currently an Associate Professor with the Department of Electrical Science and Engineering, Nanjing University. His research includes analog sensing circuit design, in-memory computing design, and high-performance AI processors for edge sensing.

Received date: 2024-03-04

  Revised date: 2024-04-11

  Accepted date: 2024-05-10

  Online published: 2024-11-27

Supported by

National Key Research and Development Program of China under Grant 2022YFB4400900

Natural Science Foundation of China under Grant 62371223

Abstract

Deep learning has recently gained significant prominence in real-world applications such as image recognition, natural language processing, and autonomous vehicles. Although deep neural networks have diverse architectures, the dominant operations within these models are matrix-vector multiplications (MVM). Compute-in-memory (CIM) architectures are promising solutions for accelerating these massive MVM operations because they alleviate the frequent data movement of traditional processors. Analog CIM macros leverage current-accumulating or charge-sharing mechanisms to perform multiply-and-accumulate (MAC) computations. Although they achieve high throughput and efficiency, computing accuracy is sacrificed due to analog nonidealities. To ensure precise MAC calculations, it is crucial to analyze the sources of these nonidealities and identify their impacts, along with corresponding solutions. In this paper, a comprehensive linearity analysis and dedicated calibration methods for charge domain static random-access memory (SRAM) based in-memory computing circuits are proposed. We analyze nonidealities from three areas based on the mechanism of charge domain computing: the charge injection effect, temperature variations, and ADC reference voltage mismatch. By designing a 256 × 256 CIM macro and conducting investigations via post-layout simulation, we conclude that these nonidealities do not deteriorate the computing linearity, but only cause scaling and bias drift. To mitigate the identified scaling and bias drift, we propose three calibration methods ranging from the circuit level to the algorithm level, all of which exhibit promising results. The comprehensive analysis and calibration methods can assist in designing CIM macros with more accurate MAC computations, thereby supporting more robust deep learning inference.
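The abstract's central finding is that the analyzed nonidealities preserve linearity and manifest only as a scale factor and a bias drift on the MAC output, which is exactly the kind of error an algorithm-level calibration can remove. The sketch below illustrates that idea, not the paper's actual circuit-level methods: it assumes a hypothetical affine distortion (the gain and offset values are invented for illustration) on ideal MAC results, estimates both terms with a least-squares fit over known test vectors, and inverts the distortion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ideal digital MAC results for a hypothetical CIM column (8-bit range).
ideal = rng.integers(0, 256, size=1000).astype(float)

# Model the abstract's finding: charge injection, temperature variation,
# and ADC reference mismatch keep the transfer linear but add a scale
# factor and a bias drift. The values below are assumed for illustration.
a_true, b_true = 0.93, 4.7
measured = a_true * ideal + b_true + rng.normal(0.0, 0.5, size=ideal.shape)

# Algorithm-level calibration: fit the scale and bias from test vectors
# with known ideal outputs, then undo the affine distortion.
a_est, b_est = np.polyfit(ideal, measured, deg=1)
calibrated = (measured - b_est) / a_est

rms = lambda e: float(np.sqrt(np.mean(e**2)))
print(f"estimated scale {a_est:.3f}, estimated bias {b_est:.3f}")
print(f"RMS error before calibration: {rms(measured - ideal):.2f}")
print(f"RMS error after calibration:  {rms(calibrated - ideal):.2f}")
```

Because the distortion is affine, two parameters suffice; if the nonidealities had bent the transfer curve (lost linearity), a per-code lookup or nonlinear fit would be needed instead, which is why the linearity conclusion matters.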

Cite this article

HENG ZHANG, YICHUAN BAI, JUNJIE SHEN, YUAN DU, LI DU. Linearity Performance of Charge Domain In-Memory Computing: Analysis and Calibration[J]. Integrated Circuits and Systems, 2024, 1(1): 43-53. DOI: 10.23919/ICS.2024.3422968

