Integrated Circuits and Systems
Edge-Optimized AI Architecture: MRAM-Based Near Memory Computing Macro Balancing Between Memory Capacity and Computation
XI YANG (Student Member, IEEE); LIANG CHANG (Member, IEEE); SHOUZHONG PENG (Member, IEEE); JUN ZHOU (Senior Member, IEEE)
Received date: 2025-01-10
Revised date: 2025-03-11
Accepted date: 2025-03-16
Online published: 2025-10-22
Supported in part by the Open Project Program of Anhui Province Key Laboratory of Spintronic Chip Research and Manufacturing under Grant WNKFKT-25-01, in part by the National Science Foundation of China under Grant 62104025, and in part by the State Key Laboratory of Computer Architecture (ICT, CAS) under Grant CLQ202305.
General-purpose edge neural networks need a lightweight architecture that balances storage and computing resources. However, SRAM-based computing-in-memory (CIM) architectures struggle to deliver adequate on-chip storage while also meeting computing requirements. To address this, we introduce a new MRAM-based near-memory computing (NMC) architecture. It retains the cost-effective data access of CIM while separating storage and computing at the macro level, which improves deployment adaptability. We refine the NMC macro by incorporating a small temporary storage and adopting a layer-fusion approach to improve data-transfer efficiency. By integrating a high-capacity MRAM into the macro, we attain a storage density of 0.532 μm²/bit. Moreover, we enhance the adder tree with a shift module that supports multiply-and-accumulate (MAC) operations at five distinct depths (8, 9, 16, 32, and 64), raising resource utilization efficiency to 88.3%. Our architecture achieves an on-chip storage density of 1.49 Mb/mm² and an energy efficiency of 6.164 TOPS/W.
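The abstract reports the macro-level figure as area per bit and the architecture-level figure as bits per area, so the two are not directly comparable as written. The short sketch below converts the macro figure into Mb/mm². It assumes 1 Mb = 10^6 bits, which the abstract does not state. The function name is hypothetical and used only for illustration.

```python
# Unit-conversion sketch for the densities quoted in the abstract.
# Assumption (not stated in the source): 1 Mb = 10**6 bits.

UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2


def area_per_bit_to_mb_per_mm2(um2_per_bit: float, bits_per_mb: int = 10**6) -> float:
    """Convert a bit-cell footprint (um^2/bit) into a density in Mb/mm^2."""
    bits_per_mm2 = UM2_PER_MM2 / um2_per_bit
    return bits_per_mm2 / bits_per_mb


# Macro-level figure from the abstract: 0.532 um^2/bit.
macro_density = area_per_bit_to_mb_per_mm2(0.532)
print(f"macro-level density: {macro_density:.2f} Mb/mm^2")
```

Under that assumption the macro figure corresponds to roughly 1.88 Mb/mm², which can be read alongside the 1.49 Mb/mm² reported for the whole architecture. The abstract does not say how the two figures relate, so the conversion only puts them in the same unit.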
XI YANG, YAMIN MAO, LIANG CHANG, HAOJIE WEI, YUANBO WANG, JINGKE WANG, CHAO FAN, ZHONGMOU WU, SHOUZHONG PENG, JUN ZHOU. Edge-Optimized AI Architecture: MRAM-Based Near Memory Computing Macro Balancing Between Memory Capacity and Computation[J]. Integrated Circuits and Systems, 2025, 2(1): 4-12. DOI: 10.23919/ICS.2025.3553460