Most Viewed

  • Special Issue on Selected Papers from ICTA2023
    SHIJIE LI, RUICHANG MA, MINGXING DENG, JIAMIN XUE, WEI DENG, BAOYONG CHI, HAIKUN JIA
    Integrated Circuits and Systems. 2024, 1(2): 109-118. https://doi.org/10.23919/ICS.2024.3423852

    This paper presents a 32 Gbps wireline transceiver that not only supports the JESD204C standard but also maintains backward compatibility with JESD204B with minimal additional circuitry. Additionally, a pattern-filtered phase detector (PFPD) is proposed to circumvent the ambiguous sampling clock phase caused by the loop-unrolled 1st post-cursor tap equalization scheme in the decision-feedback equalizer (DFE). A 16 GHz external half-rate clock is injected into an on-chip injection-locked ring oscillator to distribute the 16 GHz clock to both the receiver and the transmitter. Multiple on-chip adaptation engines and calibration loops are also added to help the whole system work properly, such as the tap-weight and desired-level adaptation engine integrated into the DFE, duty cycle distortion correction, and IQ-mismatch correction. Fabricated in a 28 nm CMOS process, the proposed transceiver operates over a signaling range from 312.5 Mbps to 32 Gbps, achieving a BER below $10^{-12}$ over a channel with 14.9 dB loss at the Nyquist frequency. It occupies an aggregate area of 1.4 mm² and consumes 203 mW at 32 Gbps, of which 50 mW is for the transmitter (TX) and 153 mW for the receiver (RX), yielding a power efficiency of 6.34 pJ/bit at 32 Gbps.
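
    The power-efficiency figure quoted in this abstract is total power divided by data rate. The short check below is only an illustrative sketch; the function name `energy_per_bit_pj` is hypothetical and not taken from the paper.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ/bit; mW divided by Gbit/s is numerically pJ/bit."""
    return power_mw / rate_gbps

# Figures quoted in the abstract: 50 mW (TX) + 153 mW (RX) at 32 Gbps.
tx_mw, rx_mw, rate_gbps = 50.0, 153.0, 32.0
total_mw = tx_mw + rx_mw
print(f"{total_mw:.0f} mW at {rate_gbps:.0f} Gbps -> "
      f"{energy_per_bit_pj(total_mw, rate_gbps):.2f} pJ/bit")
```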

  • Special Issue on Selected Papers from ICTA2023
    WEIYI ZHANG, CHAOYANG DING, XIAORUI MO, FEI SHAO, YIYANG WANG, YUSHI GUO, LITING NIU, CHENG NIAN, FASIH UD DIN FARRUKH, CHUN ZHANG
    Integrated Circuits and Systems. 2024, 1(2): 66-79. https://doi.org/10.23919/ICS.2024.3449791

    Simultaneous Localization and Mapping (SLAM) is the process by which a mobile robot builds a map of the surrounding environment and computes its own location. Feature point extraction is one of the key components of a SLAM system. The extraction accuracy and efficiency of corner detection directly affect the overall accuracy and throughput of the system. However, the complexity of corner detection algorithms makes it challenging to achieve real-time implementation and efficient, low-cost hardware design, especially for mobile robots. Harris-class corner detection algorithms, including Harris and GFTT (Good Feature to Track), have improved accuracy. However, those algorithms incur high resource consumption and latency when implemented on hardware platforms. GFTT achieves higher accuracy than Harris at the cost of higher computational complexity. To address the throughput problem, SFTT (Simple Feature to Track), a new Harris-class detection algorithm, is proposed, and the corresponding hardware accelerator is designed. The proposed SFTT significantly reduces the computational complexity compared with the Harris algorithm and GFTT. Experiments show that SFTT also achieves slightly higher accuracy than the two algorithms. Furthermore, the GFTT accelerator is designed, reaching up to 325 fps at a frequency of 100 MHz. The proposed design improves throughput by 1.3× and power efficiency by 1.7× compared with the state-of-the-art design.

  • SHUNQIN CAI, LIUKAI XU, DENGFENG WANG, ZHI LI, WEIKANG QIAN, LIANG CHANG, YANAN SUN
    Integrated Circuits and Systems. 2024, 1(2): 80-91. https://doi.org/10.23919/ICS.2024.3419630

    SRAM-based computing-in-memory (SRAM-CIM) is expected to solve the “Memory Wall” problem. For digital-domain SRAM-CIM, full-precision digital logic has been utilized to achieve high computational accuracy. However, the energy and area efficiency advantages of CIM cannot be fully utilized under error-resilient neural networks (NNs) with a given quantization bit-width. Therefore, an all-digital Bit-wise Approximate compressor configurable In-SRAM-computing macro for Energy-efficient NN acceleration, with a data-aware weight Remapping method (BASER), is proposed in this paper. Leveraging the NN error resilience property, six energy-efficient bit-wise compressor configurations are presented under 4b/4b and 3b/3b NN quantization, respectively. Concurrently, a data-aware weight remapping approach is proposed to further enhance the NN accuracy without supplementary retraining. Evaluations of VGG-9 and ResNet-18 on the CIFAR-10 and CIFAR-100 datasets show that the proposed BASER achieves 1.35× and 1.29× improvement in energy efficiency, as well as limited accuracy loss and improved NN accuracy, compared to the previous full-precision and approximate SRAM-CIM designs, respectively.

  • Regular Papers
    FUPING LI, YING WANG, MEIXUAN LU, YUTONG ZHU, HAORAN WANG, ZHUN ZHAO, JUNPEI HUANG, XIAOTONG WEI, XIHAO LIANG, YUJIE WANG, HAOBO XU, HUAWEI LI, XIAOWEI LI, QI LIU, MING LIU, NINGHUI SUN, YINHE HAN
    Integrated Circuits and Systems. 2024, 1(1): 18-30. https://doi.org/10.23919/ICS.2024.3451428

    Due to the waning of Moore’s Law, conventional monolithic chip architectural design is confronting hurdles such as increasing die size and skyrocketing cost. In this post-Moore era, the integrated chip has emerged as a pivotal technology, gaining substantial interest from both academia and industry. Compared with monolithic chips, chiplet-based integrated chips can significantly enhance system scalability, curtail costs, and accelerate design cycles. However, integrated chips introduce vast design spaces encompassing chiplets, inter-chiplet connections, and packaging parameters, thereby amplifying the complexity of the design process. This paper introduces the Optimal Decomposition-Combination Theory, a novel methodology to guide the decomposition and combination processes in integrated chip design. Furthermore, it offers a thorough examination of existing integrated chip design methodologies to showcase the application of this theory.

  • Special Issue on Selected Papers from ICTA2023
    GUOQING WANG, ZHAO ZHANG
    Integrated Circuits and Systems. 2024, 1(2): 103-108. https://doi.org/10.23919/ICS.2024.3456043

    This work presents a PAM4 receiver analog frontend (AFE) operating up to 64 Gb/s. The electronic integrated circuit (EIC) is fabricated in 40-nm CMOS technology. The AFE is composed of a single-stage Continuous-Time Linear Equalizer (CTLE), a Variable Gain Amplifier (VGA), an input impedance matching network, a buffer stage, and an output buffer. The proposed single-stage triple-peaking CTLE employs a current-reuse technique and a multi-feedback structure, enabling the adjustment of peaking in the low-, mid-, and high-frequency bands. Thus, a single CTLE stage is sufficient to achieve a boost of over 20 dB at the Nyquist frequency, which saves power. The VGA adopts an enhanced structure based on the Gilbert cell, where the gain is set by controlling the gate voltage of MOS transistors. The DC gain of the CTLE varies as it is adjusted to equalize channel losses. The role of the VGA is to stabilize the DC gain changes induced by the adjustment of the CTLE. The output buffer adopts two stages to ensure that the gain does not attenuate excessively while maintaining output impedance matching. The AFE consumes 21.1 mW with a supply voltage of 1.5/1 V. It provides a maximum boost of 22.5 dB, and the data rate reaches up to 64 Gb/s. Finally, measurements demonstrate its ability to effectively equalize a channel with a 12-dB loss at the Nyquist frequency of 16 GHz.
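
    The 16 GHz Nyquist frequency follows from the 64 Gb/s data rate because PAM4 carries 2 bits per symbol. The sketch below shows that arithmetic; the helper name `nyquist_ghz` is hypothetical and used only for illustration.

```python
def nyquist_ghz(bit_rate_gbps: float, bits_per_symbol: int) -> float:
    """Nyquist frequency (GHz) = half the symbol rate (GBaud)."""
    symbol_rate_gbaud = bit_rate_gbps / bits_per_symbol
    return symbol_rate_gbaud / 2

# PAM4 has four levels, i.e. 2 bits per symbol.
print(nyquist_ghz(64.0, 2))  # 64 Gb/s -> 32 GBaud -> 16.0 GHz
```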

  • Special Section on Selected Papers from ASICON2023
    XIANGCHEN WAN, SIQING WU, XINWEI YU, XINGTAO ZHU, FAN YE
    Integrated Circuits and Systems. 2024, 1(1): 33-42. https://doi.org/10.23919/ICS.2024.3422708

    This paper presents an AC-coupled ultrasound analog front-end (AFE) architecture with a three-stage DC offset correction (DCOC) circuit. In ultrasound systems, the low noise amplifier (LNA), time gain control (TGC), and low pass filter (LPF) constitute the AFE, which provides low-noise amplification, time-varying gain compensation, and filtering for the received ultrasound signal. The inherent asymmetry in the LNA, layout asymmetry, and process variation introduce DC offset, which the TGC turns into low-frequency offset drift. The proposed DCOC circuit for the LNA is composed of a transconductance amplifier and an off-chip capacitor, while a fully differential operational amplifier and a pseudo resistor are used for the other amplification stages. The AC coupling scheme is also used to reduce the offset and drift. Simulation results show that when the DCOC and the AC coupling are adopted, the offset and drift are almost perfectly suppressed. The proposed AFE has been fabricated in a 28-nm CMOS process. It achieves an 85 dB gain range with a low input-referred noise of 2.43 nV/$\sqrt{\text{Hz}}$ at 5 MHz, and it has a tunable bandwidth of 15/30 MHz and a switchable input impedance of 50/100 ohms.

  • Special Issue on Selected Papers from ICTA2023
    JUNYAN SUN, XUEFEI BAI
    Integrated Circuits and Systems. 2024, 1(2): 92-102. https://doi.org/10.23919/ICS.2024.3419562

    CRYSTALS-Kyber has emerged as a notable lattice-based post-quantum cryptography (PQC) scheme. As one of the four finalists in NIST’s PQC standardization round three, CRYSTALS-Kyber is the only encryption algorithm demonstrating superior performance compared to other algorithms. The number theoretic transform (NTT) is employed to optimize polynomial multiplication, which constitutes the most complex operation within CRYSTALS-Kyber. This study introduces a high-speed NTT accelerator architecture, featuring a novel butterfly unit and an efficient modular polynomial multiplier. The proposed accelerator utilizes a radix-4-based configurable NTT design, which is capable of executing both forward and inverse NTT operations on a unified architecture. When implemented on the Xilinx Virtex-7 FPGA platform, the proposed architecture achieves an acceleration of 1.02-2.30 times in terms of latency, a throughput improvement of 1.02-2.30 times, and an area throughput improvement of up to 3.30 times, relative to the prior works.
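
    To illustrate why the NTT speeds up polynomial multiplication, the following minimal sketch multiplies two polynomials by transforming, multiplying pointwise, and transforming back. It is not the paper's radix-4 architecture: it uses a plain O(n²) transform, cyclic (not negacyclic) convolution, and toy parameters (modulus 17, length 4, root of unity 4) chosen only for readability. All names are hypothetical. It needs Python 3.8+ for the modular inverse via `pow(x, -1, q)`.

```python
Q = 17      # toy prime modulus (assumption, not Kyber's modulus)
OMEGA = 4   # primitive 4th root of unity mod 17: 4**4 % 17 == 1, 4**2 % 17 != 1

def ntt(a, omega=OMEGA, q=Q):
    """Forward transform: evaluate the polynomial at powers of omega."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def intt(a_hat, omega=OMEGA, q=Q):
    """Inverse transform: same sum with omega^-1, scaled by n^-1 mod q."""
    n = len(a_hat)
    inv_n = pow(n, -1, q)
    inv_omega = pow(omega, -1, q)
    return [inv_n * sum(a_hat[j] * pow(inv_omega, i * j, q) for j in range(n)) % q
            for i in range(n)]

def ntt_multiply(a, b, q=Q):
    """Cyclic product a*b mod (x^n - 1, q) via pointwise multiplication."""
    return intt([(x * y) % q for x, y in zip(ntt(a), ntt(b))])

def schoolbook_multiply(a, b, q=Q):
    """Reference O(n^2) cyclic convolution used to check the NTT result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(ntt_multiply(a, b) == schoolbook_multiply(a, b))
```

    A hardware design replaces the quadratic transform with butterfly stages, which is where a dedicated butterfly unit and modular multiplier matter.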

  • Regular Papers
    SHAO-CHUN HUNG, PARTHO BHOUMIK, ARJUN CHAUDHURI, SANMITRA BANERJEE, KRISHNENDU CHAKRABARTY
    Integrated Circuits and Systems. 2024, 1(1): 3-17. https://doi.org/10.23919/ICS.2024.3419629

    As Moore’s Law approaches its limits, 3-D integrated circuits (ICs) have emerged as promising alternatives to conventional scaling methodologies. However, the benefits of 3-D integration in terms of lower power consumption, higher performance, and reduced area are accompanied by testing challenges. The unique vertical stacking of components in 3-D ICs introduces concerns related to the robustness of bonding surfaces. Moreover, immature manufacturing processes during 3-D fabrication can lead to high defect rates in different tiers. Therefore, there is a need for design-for-test solutions to ensure the reliability and performance of 3-D-integrated architectures. In this paper, we provide a comprehensive survey of existing testing strategies for 3-D ICs. We describe recent advances, including research efforts and industry practice, that address concerns related to bonding defects, elevated power supply noise, fault diagnosis, and fault localization specific to the unique characteristics of 3-D ICs.

  • Special Section on Selected Papers from ASICON2023
    FRANCOIS RIVET, LIANG QI
    Integrated Circuits and Systems. 2024, 1(1): 31-32. https://doi.org/10.23919/ICS.2024.3484397
  • Special Issue on Selected Papers from ICTA2023
    XIAOYAN GUI, LIN CHENG
    Integrated Circuits and Systems. 2024, 1(2): 64-65. https://doi.org/10.23919/ICS.2024.3483732
  • Special Section on Selected Papers from ASICON2023
    JIAXIANG LI, MASAO YANAGISAWA, YOUHUA SHI
    Integrated Circuits and Systems. 2024, 1(1): 53-62. https://doi.org/10.23919/ICS.2024.3423850

    Large-scale neural networks have had an enormous impact on the world, changing people's lives and offering vast prospects. However, they also come with enormous demands for computational power and storage. The core of their computational requirements lies in the matrix multiplication units, which are dominated by multiplication operations. To address this issue, we propose an area- and power-efficient multiplier-less processing element (PE) design. Prior to implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model and confirm the effectiveness of this quantization method in preserving the accuracy of the original model. In hardware design, we present a standard architecture and one variant ‘bi-sign’ architecture of the PE. Our evaluation results demonstrate that the systolic array that implements our standard multiplier-less PE achieves approximately 38% lower power-delay product and 13% smaller core area compared to a conventional multiplication-and-accumulation PE, and the bi-sign PE design can save as much as 37% core area and 38% computation energy. Furthermore, the applied quantization reduces the model size and operand bit-width, leading to decreased on-chip memory usage and energy consumption for memory accesses. Additionally, the hardware schematic facilitates expansion to support other sparsity-aware, energy-efficient techniques.
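
    The idea behind a multiplier-less PE can be sketched in software: if each weight is restricted to a signed power of two, every multiplication becomes a bit shift. The sketch below is a generic illustration under that assumption, not the paper's dictionary-based scheme or its PE design. The function names and the exponent range are hypothetical.

```python
import math

def quantize_pow2(w: float, min_exp: int = -4, max_exp: int = 3):
    """Map a weight to (sign, exponent) so that w is approximated by sign * 2**exponent.

    Rounding is done in the log2 domain; the exponent range is an assumption.
    """
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    return sign, max(min_exp, min(max_exp, exp))

def shift_multiply(x: int, sign: int, exp: int) -> int:
    """Multiply an integer activation by sign * 2**exp using shifts only."""
    if sign == 0:
        return 0
    shifted = x << exp if exp >= 0 else x >> -exp
    return sign * shifted

def shift_mac(activations, weights) -> int:
    """Multiply-accumulate where every product is computed by shifting."""
    return sum(shift_multiply(x, *quantize_pow2(w))
               for x, w in zip(activations, weights))

print(shift_mac([16, 5, 8], [0.3, -3.9, 1.0]))
```

    In this example the weights 0.3, -3.9, and 1.0 quantize to 2^-2, -2^2, and 2^0, so the accumulation uses only shifts and additions.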

  • Special Section on Selected Papers from ASICON2023
    HENG ZHANG, YICHUAN BAI, JUNJIE SHEN, YUAN DU, LI DU
    Integrated Circuits and Systems. 2024, 1(1): 43-53. https://doi.org/10.23919/ICS.2024.3422968

    Deep learning has recently gained significant prominence in various real-world applications such as image recognition, natural language processing, and autonomous vehicles. While deep neural networks appear to have different architectures, the main operations within these models are matrix-vector multiplications (MVM). Compute-in-memory (CIM) architectures are promising solutions for accelerating the massive MVM operations by alleviating the frequent data movement issue in traditional processors. Analog CIM macros leverage current-accumulating or charge-sharing mechanisms to perform multiply-and-accumulate (MAC) computations. Even though they can achieve high throughput and efficiency, computing accuracy is sacrificed due to analog nonidealities. To ensure precise MAC calculations, it is crucial to analyze the sources of nonidealities and identify their impacts, along with corresponding solutions. In this paper, a comprehensive linearity analysis and dedicated calibration methods for charge-domain static random-access memory (SRAM) based in-memory computing circuits are proposed. We analyze nonidealities from three areas based on the mechanism of charge-domain computing: charge injection effect, temperature variations, and ADC reference voltage mismatch. By designing a 256 × 256 CIM macro and conducting investigations via post-layout simulation, we conclude that these nonidealities do not deteriorate the computing linearity, but only cause scaling and bias drift. To mitigate the scaling and bias drift identified, we propose three calibration methods ranging from the circuit level to the algorithm level, all of which exhibit promising results. The comprehensive analysis and calibration methods can assist in designing CIM macros with more accurate MAC computations, thereby supporting more robust deep learning inference.

  • Editorial
    MING LIU, PETER LIAN
    Integrated Circuits and Systems. 2024, 1(1): 2-2. https://doi.org/10.23919/ICS.2024.3483733
  • Regular Papers
    SHENGNAN ZHANG, YIFAN ZHAO, XINGLONG YU, JUN HAN
    Integrated Circuits and Systems. 2025, 2(3): 149-157. https://doi.org/10.23919/ICS.2025.3579338

    SPHINCS+ is a hash-based digital signature scheme that was selected for post-quantum cryptography (PQC) standardization, as announced by the U.S. National Institute of Standards and Technology (NIST) in 2022. Although SPHINCS+ offers significant security against quantum attacks, its relatively slow computation times present a major obstacle to its practical deployment. To address this challenge, improving the computational efficiency of SPHINCS+ becomes a critical task. The cryptographic operations in SPHINCS+ rely on tweakable hash functions, with various hash algorithms available for selection. Among these, SHA-3 stands out as a widely adopted and NIST-standardized hash function, making it a preferred choice for implementation in SPHINCS+. In this work, we propose a dedicated coprocessor that integrates a SHA-3 accelerator along with its associated peripheral structure. This coprocessor is designed to extend the RISC-V instruction set by incorporating seven custom instructions, enabling efficient software-hardware co-acceleration. Furthermore, we investigate the parallelizable components within SPHINCS+, specifically the FORS and WOTS+ algorithms, to identify means for optimization. By leveraging thread-level parallelism through multi-core programming, we achieve significant improvements in performance. To validate the design, synthesis is performed using TSMC 28-nm CMOS technology at 800 MHz. Compared to the benchmark results from the ARM Cortex-M4 processor, our approach achieves a 23.1× speedup in the overall single-core performance of SPHINCS+, with an additional 3.4× speedup for the verification process by utilizing multi-core acceleration.

  • Regular Papers
    JUNZHAN LIU, JINYAO MI, YANG LIU, LIANG ZHANG, HE ZHANG, WANG KANG
    Integrated Circuits and Systems. 2025, 2(3): 102-109. https://doi.org/10.23919/ICS.2025.3567939

    Computing-in-memory (CIM) offers a promising solution to the memory wall issue. Magnetoresistive random-access memory (MRAM) is a favored medium for CIM due to its non-volatility, high speed, low power, and technology maturity. However, MRAM has continuously encountered the challenge of an insufficient high-resistance state (HRS) to low-resistance state (LRS) ratio, which affects the result accuracy of CIM. In this paper, based on SOT devices, we propose a 5T2M bit-cell structure that increases the high-to-low current ratio by modulating the sub-threshold operation region. Besides, by jointly using high-resistance devices (MΩ level), the power consumption of the bit-cell array can be significantly reduced. Simultaneously, we have designed a compatible multi-bit implementation and macro architecture to support AI edge inference acceleration. This work was simulated under a 40-nm foundry process and a physically verified SOT-MTJ model. The results show that under the same high-to-low resistance ratio, a 52.6× high-to-low current ratio can be achieved, along with a 38.6%-98% bit-cell array power reduction.

  • Co-Optimization for Large Language Models: Advances in Algorithm and Hardware
    CHENG ZHANG, XINGYU ZHU, LONGHAO CHEN, TINGJIE YANG, EVENS PAN, GUOSHENG YU, YANG ZHAO, XIGUANG WU, BO LI, WEI MAO, GENQUAN HAN
    Integrated Circuits and Systems. 2025, 2(2): 49-57. https://doi.org/10.23919/ICS.2025.3568404

    Large language models (LLMs) have exhibited remarkable performance across a broad spectrum of tasks, yet their extensive computational and memory requirements present substantial challenges for deployment in resource-constrained scenarios. To address the challenges, this work introduces software and hardware co-optimization strategies aimed at enhancing the inference performance of LLMs on ARM CPU-based platforms. A mixed-precision quantization technique is employed, preserving the precision of critical weights to maintain model accuracy while quantizing non-essential weights to INT8, thereby reducing the model’s memory footprint. This work also capitalizes on the SIMD instruction set of ARM CPUs to efficiently process model data. Furthermore, the inference framework is optimized by fusing components of the attention computation and streamlining the dequantization process through modifications to the scaling factor. These enhancements result in a significant reduction in model memory usage and improved throughput during the prefill and decode stages. The efficacy of the proposed approach is demonstrated through the optimization of the Qwen-1.8B model on Armv9, with only a 0.66% decrease in accuracy and a reduction in memory usage to 58.8% of the baseline, while achieving a 4.09× and 15.23× increase in inference performance for the prefill and decode stages over the baseline, respectively.

  • Regular Papers
    SHUYI XIANG, KA’NAN WANG, RENJIE TANG, YUKUN HE, ZHENGYANG YE, XI’AN CHEN, XIAOYAN GUI
    Integrated Circuits and Systems. 2025, 2(2): 93-98. https://doi.org/10.23919/ICS.2025.3564576

    This paper presents a 130 GBaud four-to-one analog multiplexer (AMUX) with four-level pulse-amplitude modulation (PAM-4) in a 130-nm SiGe BiCMOS process. The architecture comprises two stages of two-to-one AMUXs. The four quarter-rate signals are fed into the first-stage AMUX circuit after equalization by continuous-time linear equalizers (CTLE) to produce two-way half-rate signals through time interleaving. The AMUX core circuit of the second stage is based on the Gilbert cell. Compared to the conventional sampling method, where the clock signal is centered within 1 UI of the data signal, the second-stage AMUX in this design aligns the rising edge of the clock signal with the transition edge of the data signal during sampling. This approach avoids the idle dummy branches in the conventional design, thereby significantly improving the energy efficiency. The AMUX generates two full-rate data signals spaced by 1 UI for subsequent feed-forward equalization (FFE). A two-tap FFE is designed with a transconductance (Gm) cell to compensate for the channel loss. As for the clock chain, the half-rate clock is provided by an external high-speed clock source. It passes through a voltage-controlled delay line (VCDL) to regulate the timing relationship between the clock and data signals in the second stage. The two-way quarter-rate clocks in quadrature phases are generated from the half-rate clock for the two AMUXs in the first stage. Finally, a 130 GBaud PAM-4 signal is generated with a power consumption of 1 W.

  • Co-Optimization for Large Language Models: Advances in Algorithm and Hardware
    ZHE WEN, LIANG XU, MEIQI WANG
    Integrated Circuits and Systems. 2025, 2(2): 58-66. https://doi.org/10.23919/ICS.2025.3575371

    In recent years, the exponential growth in Large Language Model (LLM) parameter sizes has significantly increased computational complexity, with inference latency emerging as a prominent challenge. The primary bottleneck lies in the token-by-token prediction process during autoregressive decoding, resulting in substantial delays. Therefore, enhancing decoding efficiency while maintaining accuracy has become a critical research objective. This paper proposes an Adaptive Parallel Layer-Skipping Speculative Decoding (APLS) method, which leverages speculative decoding techniques by employing a Small-Scale Model (SSM) for preliminary inference and validating the predictions using the original LLM. This approach effectively balances the high precision of LLMs with the efficiency of SSMs. Notably, our SSM does not require additional training but is instead derived through a simplification of the original large-scale model. By incorporating parallelization and a layer-skipping structure, the inference process dynamically bypasses certain redundant transformation layers, significantly improving GPU utilization and inference speed without compromising performance. Furthermore, to address challenges such as window size limitations and memory fragmentation in long-text processing, this paper introduces progressive layer reduction and key-value cache deletion techniques to further optimize the performance of SSMs. Experimental results demonstrate that the proposed method achieves a 2.51× improvement in efficiency during autoregressive decoding. As this approach eliminates the need for additional training of the SSM, it offers a significant competitive advantage in high-cost model compression environments.
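
    The draft-then-verify loop of speculative decoding can be sketched with toy stand-ins for the two models. This is a generic greedy variant for illustration only: it does not implement the APLS layer skipping, parallelization, or cache techniques, and the functions `target_next` and `draft_next` are hypothetical deterministic toys rather than real language models.

```python
def target_next(seq):
    """Toy stand-in for the large model: next token from the full context."""
    return (sum(seq) * 31 + len(seq)) % 7

def draft_next(seq):
    """Toy stand-in for the small model: deliberately wrong when len(seq) % 3 == 0."""
    t = target_next(seq)
    return t if len(seq) % 3 else (t + 1) % 7

def greedy_decode(next_fn, prompt, n_new):
    """Plain token-by-token decoding with a single model."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(next_fn(seq))
    return seq

def speculative_decode(prompt, n_new, k=4):
    """Draft k tokens with the small model, then verify them with the target."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    while len(seq) < goal:
        draft = list(seq)
        for _ in range(k):
            draft.append(draft_next(draft))
        # Verification: a real system scores all drafted positions in one
        # batched pass of the large model; here we check them one by one.
        for i in range(len(seq), len(draft)):
            expected = target_next(draft[:i])
            if draft[i] != expected:
                # Keep the accepted prefix and the target's correction.
                seq = draft[:i] + [expected]
                break
        else:
            seq = draft  # every drafted token was accepted
    return seq[:goal]

prompt = [1, 2, 3]
print(speculative_decode(prompt, 10) == greedy_decode(target_next, prompt, 10))
```

    Because every accepted token is checked against the target model, the output matches plain greedy decoding with the target. The speedup comes from verifying several drafted tokens per large-model pass.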

  • Co-Optimization for Large Language Models: Advances in Algorithm and Hardware
    SHAOBO LUO, ALBERT YU, ZHIYUAN XIE, HONG HUANG, MINGQIANG HUANG, KAI LI, YUK KAN PUN, ZHIRU GUO, SHUWEI LI, YIMING ZHU, CHANGHAI MAN, HUIYUAN SUN, TUNG-HAN CHANG, ZIYI GUAN, QIYUAN ZHANG, TINGTING WANG, GUANQI PENG, WENJUN CHEN, YAN SUN, GENGXIN CHEN, MEI YAN, HAO YU
    Integrated Circuits and Systems. 2025, 2(2): 67-80. https://doi.org/10.23919/ICS.2025.3552542

    Precision medicine is revolutionizing global healthcare by enabling personalized diagnostics, disease prediction, and tailored treatment strategies. While the integration of genomics and data science holds immense potential to optimize precision therapeutic outcomes, a critical challenge lies in translating gene sequencing data into actionable insights for in vitro diagnostics. This bottleneck is largely attributed to the limitations of edge-side intelligent processing and automation. Despite advancements in gene sequencing technologies and bioinformatics tools, the workflow from sample collection to diagnostic report generation remains fragmented, inefficient, and lacking in intelligence. To address these challenges, we introduce an embodied LLM NGS sequencer on the edge for real-time, on-site smart genetic diagnostics. This instrument integrates a streamlined and comprehensive pipeline with deep learning networks for primary data analysis, machine learning for secondary data processing, and a large language model (LLM) optimized for tertiary data interpretation. The LLM is enhanced through quantization and compression, facilitating deployment on FPGA/GPU to accelerate diagnostic workflows. Experimental results show superior performance: a 13.72% increase in throughput, a Q30 of 99.50%, and smart diagnostics on the edge at up to 75 tokens/s. This work enables immediate, on-site DNA analysis, dramatically improving the accessibility and efficiency of precision medicine. It also significantly advances diagnostic accuracy and automation, establishing a robust platform for AI-driven personalized medicine and setting a new benchmark for the future of healthcare delivery.

  • Regular Papers
    SHUAI XIAO, FUYI LI, TING HAO, LANXIANG XIAO, MANLIN XIAO, WEI MAO, GENQUAN HAN
    Integrated Circuits and Systems. 2025, 2(2): 81-92. https://doi.org/10.23919/ICS.2025.3571019

    Computing-in-Memory (CIM) architectures have emerged as a pivotal technology for next-generation artificial intelligence (AI) and edge computing applications. By enabling computations directly within memory cells, CIM architectures effectively minimize data movement and significantly enhance energy efficiency. In the CIM system, the analog-to-digital converter (ADC) bridges the gap between efficient analog computation and general digital processing, while influencing the overall accuracy, speed and energy efficiency of the system. This review presents theoretical analyses and practical case studies on the performance requirements of ADCs and their optimization methods in CIM systems, aiming to provide ideas and references for the design and optimization of CIM systems. The review comprehensively explores the relationship between the design of CIM architectures and ADC optimization, and raises the issue of design trade-offs between low power consumption, high speed operation and compact integration design. On this basis, novel customized ADC optimization methods are discussed in depth, and a large number of current CIM systems and their ADC optimization examples are reviewed, with optimization methods summarized and classified in terms of power consumption, speed, and area. In the final part, this review analyzes energy efficiency, ENOB, and frequency scaling trends, demonstrating how advanced processes enable ADCs to balance speed, power, and area trade-offs, guiding ADC optimization for next-gen CIM systems.

  • Regular Papers
    DAYAN ZHOU, YUGUO XIANG, FAN YE
    Integrated Circuits and Systems. 2025, 2(3): 131-138. https://doi.org/10.23919/ICS.2025.3571821

    This paper presents a fully digital foreground calibration method for pipeline-SAR analog-to-digital converters (ADCs) using sine-fit based on the Extended Kalman Filter (EKF). The sine-fit technique provides a reference output, while an adaptive Least Mean Square (LMS) algorithm iteratively adjusts the reconstruction weights to correct mismatches and nonlinearities. The EKF significantly reduces hardware complexity by enabling real-time estimation without requiring extensive data storage. A modeled 12-bit pipeline-SAR ADC is used to evaluate the method’s effectiveness. Simulation results demonstrate that the proposed calibration scheme improves the spurious-free dynamic range (SFDR) and signal-to-noise-and-distortion ratio (SNDR) by 33.6 dB and 18.8 dB, respectively.

  • Regular Papers
    FEI LIU, LANGYUAN WANG, SHUYU ZHANG, HANLU ZHANG, NA YAN
    Integrated Circuits and Systems. 2025, 2(3): 110-121. https://doi.org/10.23919/ICS.2025.3582894

    This paper presents a single-inductor-multiple-output (SIMO) buck/boost/buck-boost converter for wearable electronic devices. Aiming at high light-load efficiency and low ripple, the converter applies fully asynchronous burst mode control. The circuit enters sleep mode intermittently during light loads, significantly reducing static power consumption. The peak inductor current is fixed, effectively limiting the maximum output ripple. The converter features three conversion modes: buck, boost, and auto-gain buck-boost. DC analysis is conducted to derive expressions for output ripple and maximum load in relation to the peak inductor current. AC stability analysis is performed with small signal perturbation and linearization methods, proving the stability of all three modes. Measured results indicate that the converter achieves a peak efficiency of 91.0% at an output power of 77.5 mW. The maximum output ripple is 27.0 mV, and no overshoot or undershoot is observed during load transients. Compared with existing converters, it exhibits higher efficiency and lower ripple, along with a fast load transient response, offering a highly efficient power management solution for wearable devices.

  • Regular Papers
    LINA WANG, JIANZHENG LI, WEIMIN HU, YAJIE QIN
    Integrated Circuits and Systems. 2025, 2(3): 122-130. https://doi.org/10.23919/ICS.2025.3569486

    This paper presents a highly integrated wearable electrochemical sensor chip for sweat monitoring, incorporating both a current readout circuit and a programmable excitation waveform generator circuit. The chip is fabricated using a 0.11 μm standard CMOS process. The design utilizes a high-resolution and wide dynamic range current readout circuit for multimodality electrochemical sensing. A bidirectional current sensing potentiostat, based on a cascode current mirror, is presented. The circuit achieves bidirectional current sensing while isolating the sensing electrode from the subsequent circuitry, enhancing its versatility for various electrochemical measurement techniques. Additionally, the implementation of a current feedback loop, in conjunction with an automatic amplitude control method and a current-mode digital-to-analog converter, not only extends the dynamic range of the input current but also effectively eliminates the background currents. This design achieves 101 dB current dynamic range and 123 pA current resolution in the detection current range of ±15 μA with an R² linearity of 0.9999. It also attains a nonlinearity of 0.07%, ensuring minimal distortion. The current readout circuit consumes 12 μA of static current from a 1.5 V supply.

  • Special Issue Papers
    YAN ZHOU, JUN WANG
    Integrated Circuits and Systems. 2025, 2(1): 28-35. https://doi.org/10.23919/ICS.2025.3547674

    This study presents a comprehensive thermal stress analysis of critical components in an embedded multi-die interconnect bridge (EMIB) within a chiplet package using finite element analysis (FEA). We systematically evaluated key design parameters—including bump diameter-to-pitch ratios, bump distribution patterns, EMIB thickness, number of EMIBs, and aspect ratios—to assess their impact on stresses. An ABAQUS-based FEA model was used to simulate thermal loading with a 165 °C temperature increase. The results indicate that a bump diameter-to-pitch ratio of 0.3 optimizes stress distribution, while a peripheral bump arrangement is superior in stress reduction compared to other patterns. Thinner EMIBs linearly reduce maximum principal stress, whereas multiple EMIBs and aspect ratio variations have minimal effects. These findings offer practical guidelines for optimizing EMIB design in chiplet packages, emphasizing the importance of bump geometry, distribution patterns, and EMIB thickness for improved reliability.

  • Co-Optimization for Large Language Models: Advances in Algorithm and Hardware
    ZHONGFENG WANG
    Integrated Circuits and Systems. 2025, 2(2): 47-48. https://doi.org/10.23919/ICS.2025.3577274
  • Regular Papers
    TAO ZHONG, YUEKANG GUO, JING JIN, JIANJUN ZHOU
    Integrated Circuits and Systems. 2025, 2(3): 139-148. https://doi.org/10.23919/ICS.2025.3563318

    The evolution of 5G and beyond wireless networks has intensified the demand for millimeter-wave technology to support high-throughput applications. This paper introduces a novel energy-efficient digital beamforming receiver architecture that integrates multi-stage noise-shaping (MASH) delta-sigma modulators (DSMs) with bit-stream processing (BSP), effectively addressing the significant propagation losses and dynamic electromagnetic interference associated with millimeter-wave (mm-wave) systems. The novel architecture achieves enhanced dynamic range without increasing signal bit-width, thereby ensuring low power consumption and a compact design. Unlike traditional analog and hybrid beamforming methods, the proposed approach utilizes digital-domain processing for precise beamforming, simplified local oscillator networks, and improved integration. System-level simulations with a 9-antenna beamforming receiver array demonstrate the architecture’s capability for accurate beamforming across angles from 30° to 150° and effective dual-target detection. Furthermore, the P2S-BSP architecture reduces digital circuitry area by 50% compared to previous implementations while maintaining energy efficiency. These advancements highlight the proposed architecture as a scalable solution for future mm-wave applications, including intelligent transportation systems, radar, and high-density mobile networks.
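
    For readers unfamiliar with digital beamforming, the sketch below shows conventional phase-shift (delay-and-sum) steering for a 9-element uniform linear array. It is not the MASH DSM bit-stream architecture described above. The half-wavelength spacing, the angle convention (measured from the array axis, so broadside is 90°), and the function name `array_gain` are all assumptions made for illustration.

```python
import cmath
import math

N_ANTENNAS = 9  # array size quoted in the abstract

def array_gain(steer_deg: float, arrival_deg: float,
               n: int = N_ANTENNAS, spacing_wl: float = 0.5) -> float:
    """Normalized response of a uniform linear array steered to steer_deg.

    Angles are measured from the array axis (assumption); spacing_wl is the
    element spacing in wavelengths (0.5 is an assumed value).
    """
    # Per-element phase difference between the arriving wave and the steering weights.
    psi = 2.0 * math.pi * spacing_wl * (
        math.cos(math.radians(arrival_deg)) - math.cos(math.radians(steer_deg))
    )
    total = sum(cmath.exp(1j * k * psi) for k in range(n))
    return abs(total) / n

# The response peaks (gain 1.0) when the arrival angle matches the steering angle.
for angle in (30, 90, 150):
    print(angle, round(array_gain(angle, angle), 3))
```

    Signals arriving away from the steering angle add with mismatched phases and are attenuated, which is what lets a digital receiver separate two targets at different angles.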

  • Regular Papers
    FENGSHUO TIAN, KAIXUAN WANG, JUN HAN
    Integrated Circuits and Systems. 2025, 2(3): 167-173. https://doi.org/10.23919/ICS.2025.3583689

    Control flow integrity (CFI) plays an important role in defending against code reuse attacks (CRA). It protects the program’s control flow from being hijacked by restricting control flow transfers during execution. Specifically, backward-edge CFI safeguards return addresses to mitigate Return-Oriented Programming (ROP) attacks. In this work, we implement a backward-edge CFI mechanism that employs the Advanced Encryption Standard (AES) for cryptographic protection of return addresses. We utilize the gem5 simulator for architectural modeling and evaluation. Additionally, we design a dedicated AES hardware accelerator and integrate it into the system through gem5+RTL co-simulation. The AES accelerator, synthesized in TSMC 28 nm technology, can operate at 1 GHz with an area of 10045 μm² and a power consumption of 1.31 mW. Experimental results indicate that the performance overhead of the backward-edge CFI scheme is less than 0.1%.

  • Special Issue Papers
    KUN HE, HUANPENG WANG, YUE TANG, JIE LIU, MUSHENG LIANG, YUEHANG XU
    Integrated Circuits and Systems. 2025, 2(1): 36-45. https://doi.org/10.23919/ICS.2025.3547670

    Current thermoelectric models under electrical stress often neglect the critical impact of crack formation, limiting their predictive accuracy for solder ball reliability. To study the impact of cracks under electrical stress conditions, this study designed an electrical stress-induced failure experiment, applying stepwise current loading to the devices under test (DUTs) to obtain resistance-time curves, with computed tomography (CT) revealing cracks in the solder balls. Based on these experimental results, a thermoelectric coupling model was developed to predict the temperature-resistance relationship of heterogeneous interconnect structures, incorporating crack factors observed during the experiment. The thermoelectric coupling model demonstrated high accuracy, achieving a maximum error of less than 2.5%. By incorporating the effects of crack formation under high electrical stress, the model provides precise predictions of solder ball resistance evolution.

  • Special Issue Papers
    YANG ZHANG, LONGYUAN KANG, XIANGRUI WANG, ENYI YAO
    Integrated Circuits and Systems. 2025, 2(1): 22-27. https://doi.org/10.23919/ICS.2025.3553464

    Ising machines are potential solvers for combinatorial optimization problems (COPs), but their convergence speed and accuracy can still be improved at the level of algorithm and architecture design. In this paper, a novel parallel stochastic cellular automata tempering (PSCAT) algorithm is proposed, which combines the high parallel efficiency of stochastic cellular automata annealing (SCA) with the more balanced Monte Carlo sampling of parallel tempering (PT), to enhance the performance of fully-connected Ising machines. To achieve an area-efficient hardware design, a modified temperature exchange probability is applied to reduce the number of replicas, and the utilization of the spin update module is improved by reducing the flip decision block. Additionally, the proposed coefficient access pattern effectively reduces memory overhead by sharing the weight matrix. The design prototype with 2,048 spins and 8 replicas is validated on FPGA. Using the K2000 max-cut problem as a benchmark, our design achieves a solution accuracy of 98.94% within 0.5 ms, which is higher than that of two state-of-the-art works.
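
    As background for the terms used above, the sketch below shows how max-cut maps onto an Ising energy and gives the textbook parallel-tempering exchange probability. The abstract states that the paper uses a modified exchange probability, which is not reproduced here; the function names and the triangle-graph example are hypothetical.

```python
import math

def ising_energy(spins, couplings):
    """H = -sum_{i<j} J_ij * s_i * s_j for spins in {-1, +1}."""
    n = len(spins)
    return -sum(couplings[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def cut_value(spins, adjacency):
    """Total weight of edges whose endpoints fall in different partitions."""
    n = len(spins)
    return sum(adjacency[i][j]
               for i in range(n) for j in range(i + 1, n)
               if spins[i] != spins[j])

def swap_probability(beta_a, energy_a, beta_b, energy_b):
    """Textbook parallel-tempering exchange probability between two replicas."""
    exponent = (beta_a - beta_b) * (energy_a - energy_b)
    # Guard against overflow: any non-negative exponent means certain acceptance.
    return 1.0 if exponent >= 0 else math.exp(exponent)

# Hypothetical example: an unweighted triangle, with J_ij = -w_ij for max-cut.
adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
couplings = [[-w for w in row] for row in adjacency]
spins = [1, -1, 1]
total_weight = 3
energy = ising_energy(spins, couplings)
# With J = -w, the cut equals (total edge weight - energy) / 2.
print(cut_value(spins, adjacency), (total_weight - energy) / 2)
```

    With the antiferromagnetic mapping J = -w, lowering the Ising energy raises the cut value, so a machine that minimizes energy is also searching for a large cut.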

  • Editorial
    ZHIYI YU, MINGYU WANG
    Integrated Circuits and Systems. 2025, 2(3): 100-101. https://doi.org/10.23919/ICS.2025.3586794
  • Special Issue Papers
    XI YANG, YAMIN MAO, LIANG CHANG, HAOJIE WEI, YUANBO WANG, JINGKE WANG, CHAO FAN, ZHONGMOU WU, SHOUZHONG PENG, JUN ZHOU
    Integrated Circuits and Systems. 2025, 2(1): 4-12. https://doi.org/10.23919/ICS.2025.3553460

    General-purpose edge neural networks need a lightweight architecture that effectively balances storage and computing resources. However, SRAM-based computing-in-memory (CIM) architectures face challenges in delivering adequate on-chip storage while fulfilling computing requirements. To overcome this, we introduce a new MRAM-based near-memory computing (NMC) architecture. It retains the cost-effective data access benefits of CIM while separating storage and computing at the macro-level, improving deployment adaptability. We refine the NMC macro by incorporating small temporary storage and adopting a layer-fusion approach to enhance data-transfer efficiency. By integrating a high-capacity MRAM into the macro, we attain a storage density of 0.532 μm²/bit. Moreover, we enhance the adder tree with a shift module, supporting multiply-and-accumulate (MAC) operations at five distinct depths (8, 9, 16, 32, and 64), which raises resource utilization efficiency to 88.3%. Our architecture achieves an on-chip storage density of 1.49 Mb/mm² and an energy efficiency of 6.164 TOPS/W.

  • Regular Papers
    YUYANG LIU, RUNYE DING, YUJIE CHEN, PUJIN XIE, YAO LIU, ZHIYI YU
    Integrated Circuits and Systems. 2025, 2(3): 158-166. https://doi.org/10.23919/ICS.2025.3550116

    Since the discovery of speculative execution attacks based on side channels, there has been a long history of research on their attack mechanisms and defense principles. To explore TLB side channels, we constructed a System-on-Chip (SoC) centered around the XuanTie C910 processor on a Virtex UltraScale+ HBM VCU128 FPGA and ran the Linux operating system on this platform. We successfully implemented the Spectre-v1 attack targeting the multi-level TLB structure of the XuanTie C910 processor, identifying the second-level TLB as the primary target of the attack. In addition, we proposed a defense mechanism called TLBshield-v1, which employs a 50-percent block rate policy on the write-back channel from the Page Table Walker to the second-level TLB, thereby mitigating all attacks based on the second-level TLB. We tested a 50-percent block rate policy, which reduced the success rate of the Spectre-v1 attack from 100 percent to 55.7 percent, with a performance overhead of only 1.77 percent. Furthermore, we designed TLBshield-v2 with different second-level TLB block rates, tested their corresponding performance overheads and security implications, and introduced a normalized evaluation metric, Security-Versus-Performance, to determine the optimal design strategy that balances performance overhead and security under varying security requirements.

  • Special Issue Papers
    HANRU YANG, TIANRUI LYU, XIAO HUANG, JIANPING GUO
    Integrated Circuits and Systems. 2025, 2(1): 13-21. https://doi.org/10.23919/ICS.2025.3553458

    This work presents a bandgap voltage reference (BGR) with source-sink dual current compensation achieving a low temperature coefficient (TC) over the automotive temperature range from −40 to 125 °C. The two compensation currents are the inverted-V current (IinvV) and the high-low temperature linear current (IHLT), which appear in the form of sourcing and sinking currents, respectively. This design introduces an inverted-V current to mitigate the degradation of the compensation effect caused by temperature range drifts. By exploiting the characteristics of IinvV and IHLT exhibiting the same drift trend, the dual current compensation achieves the compensation performance over the entire automotive temperature range while mitigating the impact of temperature range drifts, thereby optimizing the overall compensation effect. The measured results show that it achieves the best TC of 2.0 ppm/°C and an average consumption current of 44 μA at room temperature. Moreover, the linear sensitivity (LS) is 0.04%/V and power supply rejection (PSR) is −60 dB at 1 Hz at room temperature.
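
    The temperature coefficient quoted above is commonly computed with the box method, sketched below. The voltage samples are hypothetical values chosen for illustration, not measurements from the paper, and normalizing by the mean output voltage is an assumption (some authors normalize by the room-temperature value instead).

```python
def box_method_tc_ppm(voltages, t_min_c, t_max_c):
    """Box-method temperature coefficient in ppm/degC.

    TC = (Vmax - Vmin) / (Vmean * (Tmax - Tmin)) * 1e6
    """
    v_mean = sum(voltages) / len(voltages)
    return (max(voltages) - min(voltages)) / (v_mean * (t_max_c - t_min_c)) * 1e6

# Hypothetical reference voltages sampled across the automotive range.
samples_v = [1.2000, 1.2004, 1.2002]
tc = box_method_tc_ppm(samples_v, -40.0, 125.0)
print(f"{tc:.2f} ppm/degC")
```

    Over the 165 °C span from −40 to 125 °C, a spread of about 0.4 mV on a 1.2 V output corresponds to roughly 2 ppm/°C.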

  • Editorial
    LIN CHENG, BO ZHAO
    Integrated Circuits and Systems. 2025, 2(1): 2-3. https://doi.org/10.23919/ICS.2025.3555024