Most Viewed

  • Special Issue on Selected Papers from ICTA2023
    WEIYI ZHANG, CHAOYANG DING, XIAORUI MO, FEI SHAO, YIYANG WANG, YUSHI GUO, LITING NIU, CHENG NIAN, FASIH UD DIN FARRUKH, CHUN ZHANG
    Integrated Circuits and Systems. 2024, 1(2): 66-79. https://doi.org/10.23919/ICS.2024.3449791

    Simultaneous Localization and Mapping (SLAM) is the process by which a mobile robot builds a map of the surrounding environment and computes its own location. Feature point extraction is one of the key components of a SLAM system. The accuracy and efficiency of corner detection directly affect the overall accuracy and throughput of the system. However, the complexity of corner detection algorithms makes it challenging to achieve real-time implementation and efficient, low-cost hardware design, especially for mobile robots. Harris-class corner detection algorithms, including Harris and GFTT (Good Feature to Track), offer improved accuracy, but they incur high resource consumption and latency when implemented on hardware platforms. GFTT achieves higher accuracy than Harris at the cost of higher computational complexity. To address the throughput problem, SFTT (Simple Feature to Track), a new Harris-class detection algorithm, is proposed, and the corresponding hardware accelerator is designed. The proposed SFTT significantly reduces computational complexity compared with Harris and GFTT. Experiments show that SFTT also achieves slightly higher accuracy than the two algorithms. Furthermore, a GFTT accelerator is designed that reaches up to 325 fps at a frequency of 100 MHz. The proposed design improves throughput by 1.3× and power efficiency by 1.7× compared with the state-of-the-art design.
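
To illustrate why GFTT costs more than Harris, the sketch below computes the two textbook corner scores from a 2×2 structure tensor: the Harris response det(M) − k·trace(M)², and the GFTT (Shi-Tomasi) score, which is the smaller eigenvalue of M and needs a square root. The function name, the tensor entries `a`, `b`, `c`, and the default `k = 0.04` are assumptions made for illustration. The abstract does not give the SFTT scoring function, so it is not shown.

```python
import math

def structure_tensor_scores(a, b, c, k=0.04):
    """Return (harris, gftt) scores for the structure tensor [[a, b], [b, c]].

    a = sum(Ix*Ix), b = sum(Ix*Iy), c = sum(Iy*Iy) over a local window.
    The name and the default k are illustrative assumptions.
    """
    det = a * c - b * b
    trace = a + c
    # Harris response: only multiplications and additions.
    harris = det - k * trace * trace
    # GFTT (Shi-Tomasi) score: smaller eigenvalue of the symmetric 2x2 matrix,
    # which requires a square root.
    gftt = 0.5 * (trace - math.sqrt((a - c) ** 2 + 4.0 * b * b))
    return harris, gftt

# Corner-like window: strong gradients in both directions.
corner = structure_tensor_scores(10.0, 0.0, 10.0)
# Edge-like window: gradient in one direction only.
edge = structure_tensor_scores(10.0, 0.0, 0.0)
```

Both scores are high for the corner-like tensor and low (zero or negative) for the edge-like one; the difference between the two methods lies in the arithmetic required per pixel.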

  • Special Issue on Selected Papers from ICTA2023
    SHIJIE LI, RUICHANG MA, MINGXING DENG, JIAMIN XUE, WEI DENG, BAOYONG CHI, HAIKUN JIA
    Integrated Circuits and Systems. 2024, 1(2): 109-118. https://doi.org/10.23919/ICS.2024.3423852

    This paper presents a 32 Gbps wireline transceiver that not only supports the JESD204C standard but also maintains backward compatibility with JESD204B with minimal additional circuitry. Additionally, a pattern-filtered phase detector (PFPD) is proposed to circumvent the ambiguous sampling clock phase caused by the loop-unrolled first post-cursor tap equalization scheme in the decision-feedback equalizer (DFE). A 16 GHz external half-rate clock is injected into an on-chip injection-locked ring oscillator to distribute the 16 GHz clock to both the receiver and the transmitter. Multiple on-chip adaptation engines and calibration loops are also added to help the whole system work properly, such as a tap-weight and desired-level adaptation engine integrated into the DFE, duty cycle distortion correction, and IQ-mismatch correction. Fabricated in a 28 nm CMOS process, the proposed transceiver operates over a signaling range from 312.5 Mbps to 32 Gbps, achieving a BER below $10^{-12}$ over a channel with 14.9 dB loss at the Nyquist frequency. It occupies an aggregate area of 1.4 mm² and consumes 203 mW at 32 Gbps, of which 50 mW is for the transmitter (TX) and 153 mW for the receiver (RX), resulting in a power efficiency of 6.34 pJ/bit at 32 Gbps.
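
The quoted power efficiency follows directly from the power and data-rate figures in the abstract. A minimal sketch of the arithmetic (the function name and the TX/RX split per bit are illustrative, not values reported by the authors):

```python
def energy_per_bit_pj(power_mw, data_rate_gbps):
    """Energy per bit in pJ/bit.

    mW / Gbps = 1e-3 W / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit.
    """
    return power_mw / data_rate_gbps

total_pj = energy_per_bit_pj(203, 32)  # whole transceiver
tx_pj = energy_per_bit_pj(50, 32)      # transmitter share
rx_pj = energy_per_bit_pj(153, 32)     # receiver share
print(f"total {total_pj:.2f} pJ/bit, TX {tx_pj:.2f}, RX {rx_pj:.2f}")
```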

  • SHUNQIN CAI, LIUKAI XU, DENGFENG WANG, ZHI LI, WEIKANG QIAN, LIANG CHANG, YANAN SUN
    Integrated Circuits and Systems. 2024, 1(2): 80-91. https://doi.org/10.23919/ICS.2024.3419630

    SRAM-based computing-in-memory (SRAM-CIM) is expected to solve the “Memory Wall” problem. In digital-domain SRAM-CIM, full-precision digital logic has been used to achieve high computational accuracy. However, the energy and area efficiency advantages of CIM cannot be fully exploited for error-resilient neural networks (NNs) with a given quantization bit-width. Therefore, an all-digital Bit-wise Approximate compressor configurable In-SRAM-computing macro for Energy-efficient NN acceleration, with a data-aware weight Remapping method (BASER), is proposed in this paper. Leveraging the error resilience of NNs, six energy-efficient bit-wise compressor configurations are presented under 4b/4b and 3b/3b NN quantization, respectively. In addition, a data-aware weight remapping approach is proposed to further enhance NN accuracy without supplementary retraining. Evaluations of VGG-9 and ResNet-18 on the CIFAR-10 and CIFAR-100 datasets show that the proposed BASER achieves 1.35× and 1.29× improvement in energy efficiency, with limited accuracy loss and improved NN accuracy, compared with previous full-precision and approximate SRAM-CIM designs, respectively.

  • Regular Papers
    FUPING LI, YING WANG, MEIXUAN LU, YUTONG ZHU, HAORAN WANG, ZHUN ZHAO, JUNPEI HUANG, XIAOTONG WEI, XIHAO LIANG, YUJIE WANG, HAOBO XU, HUAWEI LI, XIAOWEI LI, QI LIU, MING LIU, NINGHUI SUN, YINHE HAN
    Integrated Circuits and Systems. 2024, 1(1): 18-30. https://doi.org/10.23919/ICS.2024.3451428

    Due to the waning of Moore’s Law, conventional monolithic chip architectural design is confronting hurdles such as increasing die size and skyrocketing cost. In this post-Moore era, the integrated chip has emerged as a pivotal technology, gaining substantial interest from both academia and industry. Compared with monolithic chips, chiplet-based integrated chips can significantly enhance system scalability, curtail costs, and accelerate design cycles. However, integrated chips introduce vast design spaces encompassing chiplets, inter-chiplet connections, and packaging parameters, thereby amplifying the complexity of the design process. This paper introduces the Optimal Decomposition-Combination Theory, a novel methodology to guide the decomposition and combination processes in integrated chip design. Furthermore, it offers a thorough examination of existing integrated chip design methodologies to showcase the application of this theory.

  • Special Issue on Selected Papers from ICTA2023
    GUOQING WANG, ZHAO ZHANG
    Integrated Circuits and Systems. 2024, 1(2): 103-108. https://doi.org/10.23919/ICS.2024.3456043

    This work presents a PAM4 receiver analog frontend (AFE) operating up to 64 Gb/s. The electronic integrated circuit (EIC) is fabricated in 40-nm CMOS technology. The AFE is composed of a single-stage Continuous-Time Linear Equalizer (CTLE), a Variable Gain Amplifier (VGA), an input impedance matching network, a buffer stage, and an output buffer. The proposed single-stage triple-peaking CTLE employs a current-reuse technique and a multi-feedback structure, enabling the adjustment of peaking in the low-, mid-, and high-frequency bands. Thus, a single CTLE stage is sufficient to achieve a boost of over 20 dB at the Nyquist frequency, which saves power. The VGA adopts an enhanced structure based on the Gilbert cell, where the gain is set by controlling the gate voltage of MOS transistors. The DC gain of the CTLE varies as it is adjusted to equalize channel losses, and the role of the VGA is to stabilize the DC gain changes induced by this adjustment. The output buffer adopts two stages to ensure that the gain does not attenuate excessively while maintaining output impedance matching. The AFE consumes 21.1 mW with a supply voltage of 1.5/1 V. It provides a maximum boost of 22.5 dB, and the data rate reaches up to 64 Gb/s. Finally, measurements demonstrate its ability to effectively equalize a channel with a 12-dB loss at the Nyquist frequency of 16 GHz.
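
The 16 GHz Nyquist frequency quoted for the 64 Gb/s PAM4 link can be checked with a short calculation: PAM4 carries 2 bits per symbol, and the Nyquist frequency is half the symbol rate. The function name below is an illustrative assumption.

```python
def nyquist_ghz(bit_rate_gbps, bits_per_symbol):
    """Nyquist frequency in GHz: half the symbol (baud) rate."""
    baud_gbd = bit_rate_gbps / bits_per_symbol
    return baud_gbd / 2

pam4_nyquist = nyquist_ghz(64, 2)  # PAM4: 4 levels, 2 bits per symbol
nrz_nyquist = nyquist_ghz(64, 1)   # NRZ at the same bit rate, for comparison
print(pam4_nyquist, nrz_nyquist)
```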

  • Special Section on Selected Papers from ASICON2023
    XIANGCHEN WAN, SIQING WU, XINWEI YU, XINGTAO ZHU, FAN YE
    Integrated Circuits and Systems. 2024, 1(1): 33-42. https://doi.org/10.23919/ICS.2024.3422708

    This paper presents an AC-coupled ultrasound analog front-end (AFE) architecture with a three-stage DC offset correction (DCOC) circuit. In ultrasound systems, the low noise amplifier (LNA), time gain control (TGC), and low pass filter (LPF) constitute the AFE, which provides low-noise amplification, time-varying gain compensation, and filtering for the received ultrasound signal. The inherent asymmetry in the LNA, layout asymmetry, and process variation introduce a DC offset, and the TGC turns it into low-frequency offset drift. The proposed DCOC circuit for the LNA is composed of a transconductance amplifier and an off-chip capacitor, while a fully differential operational amplifier and a pseudo resistor are used for the other amplification stages. The AC coupling scheme is also used to reduce the offset and drift. Simulation results show that when the DCOC and the AC coupling are adopted, the offset and drift are almost perfectly suppressed. The proposed AFE has been fabricated in a 28-nm CMOS process. It achieves an 85 dB gain range with a low input-referred noise of 2.43 nV/$\sqrt{\mathrm{Hz}}$ at 5 MHz, and it has a tunable bandwidth of 15/30 MHz and a switchable input impedance of 50/100 ohms.

  • Regular Papers
    SHAO-CHUN HUNG, PARTHO BHOUMIK, ARJUN CHAUDHURI, SANMITRA BANERJEE, KRISHNENDU CHAKRABARTY
    Integrated Circuits and Systems. 2024, 1(1): 3-17. https://doi.org/10.23919/ICS.2024.3419629

    As Moore’s Law approaches its limits, 3-D integrated circuits (ICs) have emerged as promising alternatives to conventional scaling methodologies. However, the benefits of 3-D integration in terms of lower power consumption, higher performance, and reduced area are accompanied by testing challenges. The unique vertical stacking of components in 3-D ICs introduces concerns related to the robustness of bonding surfaces. Moreover, immature manufacturing processes during 3-D fabrication can lead to high defect rates in different tiers. Therefore, there is a need for design-for-test solutions to ensure the reliability and performance of 3-D-integrated architectures. In this paper, we provide a comprehensive survey of existing testing strategies for 3-D ICs. We describe recent advances, including research efforts and industry practice, that address concerns related to bonding defects, elevated power supply noise, fault diagnosis, and fault localization specific to the unique characteristics of 3-D ICs.

  • Special Issue on Selected Papers from ICTA2023
    JUNYAN SUN, XUEFEI BAI
    Integrated Circuits and Systems. 2024, 1(2): 92-102. https://doi.org/10.23919/ICS.2024.3419562

    CRYSTALS-Kyber has emerged as a notable lattice-based post-quantum cryptography (PQC) scheme. As one of the four finalists in NIST’s PQC standardization round three, CRYSTALS-Kyber is the only encryption algorithm demonstrating superior performance compared to other algorithms. The number theoretic transform (NTT) is employed to optimize polynomial multiplication, which constitutes the most complex operation within CRYSTALS-Kyber. This study introduces a high-speed NTT accelerator architecture, featuring a novel butterfly unit and an efficient modular polynomial multiplier. The proposed accelerator uses a radix-4-based configurable NTT design, which is capable of executing both forward and inverse NTT operations on a unified architecture. When implemented on the Xilinx Virtex-7 FPGA platform, the proposed architecture achieves a latency speedup of 1.02 to 2.30 times, a throughput improvement of 1.02 to 2.30 times, and an area-throughput improvement of up to 3.30 times relative to prior works.
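
As a rough illustration of how an NTT turns polynomial multiplication into pointwise multiplication, the toy sketch below works modulo the Kyber prime q = 3329 and derives an 8th root of unity from 17, a primitive 256th root of unity modulo q. It is a simplification under stated assumptions: it uses a naive O(n²) transform of size 8 and cyclic convolution (mod xⁿ − 1), whereas Kyber itself works with 256 coefficients modulo x²⁵⁶ + 1, and the paper's radix-4 butterfly architecture is not reproduced here. All function names are hypothetical.

```python
Q = 3329                  # Kyber modulus
N = 8                     # toy transform size (Kyber uses 256 coefficients)
W = pow(17, 256 // N, Q)  # 17 has order 256 mod Q, so W has order N

def ntt(a, w):
    """Naive O(n^2) number theoretic transform of list a with root w."""
    n = len(a)
    return [sum(a[j] * pow(w, j * k, Q) for j in range(n)) % Q
            for k in range(n)]

def intt(A, w):
    """Inverse transform: use w^-1 and scale by n^-1 (Fermat inverses)."""
    n = len(A)
    n_inv = pow(n, Q - 2, Q)
    w_inv = pow(w, Q - 2, Q)
    return [(n_inv * x) % Q for x in ntt(A, w_inv)]

def cyclic_mul_ntt(a, b):
    """Multiply polynomials mod (x^N - 1, Q) via pointwise products."""
    A, B = ntt(a, W), ntt(b, W)
    return intt([(x * y) % Q for x, y in zip(A, B)], W)

def cyclic_mul_naive(a, b):
    """Schoolbook cyclic convolution, used only as a reference."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % Q
    return out

a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
assert cyclic_mul_ntt(a, b) == cyclic_mul_naive(a, b)
```

A hardware accelerator replaces the naive double loop with butterfly stages; the pointwise-product step in the middle is what makes the transform worthwhile.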

  • Special Issue on Selected Papers from ICTA2023
    XIAOYAN GUI, LIN CHENG
    Integrated Circuits and Systems. 2024, 1(2): 64-65. https://doi.org/10.23919/ICS.2024.3483732
  • Special Section on Selected Papers from ASICON2023
    JIAXIANG LI, MASAO YANAGISAWA, YOUHUA SH
    Integrated Circuits and Systems. 2024, 1(1): 53-62. https://doi.org/10.23919/ICS.2024.3423850

    Large-scale neural networks have had an enormous impact on the world, changing people's lives and offering vast prospects. However, they also come with enormous demands for computational power and storage. The core of their computational requirements lies in the matrix multiplication units, which are dominated by multiplication operations. To address this issue, we propose an area- and power-efficient multiplier-less processing element (PE) design. Prior to implementing the proposed PE, we apply a power-of-2 dictionary-based quantization to the model and confirm the effectiveness of this quantization method in preserving the accuracy of the original model. In hardware design, we present a standard architecture and one variant ‘bi-sign’ architecture of the PE. Our evaluation results demonstrate that a systolic array that implements our standard multiplier-less PE achieves approximately 38% lower power-delay product and 13% smaller core area compared to a conventional multiplication-and-accumulation PE, and the bi-sign PE design can even save 37% core area and 38% computation energy. Furthermore, the applied quantization reduces the model size and operand bit-width, leading to decreased on-chip memory usage and energy consumption for memory accesses. Additionally, the hardware schematic facilitates expansion to support other sparsity-aware, energy-efficient techniques.
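
The idea behind a multiplier-less PE can be sketched generically: if each weight is restricted to a signed power of two, multiplying an integer activation by it reduces to a bit shift and a sign change. The code below is a minimal illustration under assumptions, not the paper's dictionary-based scheme; the function names, the exponent range, and the fixed-point format (`frac_bits`) are hypothetical.

```python
import math

def quantize_pow2(w, min_exp=-7, max_exp=0):
    """Map a real weight to (sign, exp) representing sign * 2**exp.

    Rounds to the nearest exponent in the log domain and clamps it.
    A zero weight is encoded with sign 0.
    """
    if w == 0:
        return 0, min_exp
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp

def shift_multiply(x, sign, exp, frac_bits=7):
    """Multiply integer x by sign * 2**exp using a shift only.

    The result is fixed-point with frac_bits fractional bits, so exp in
    [-frac_bits, 0] always yields a non-negative left shift.
    """
    return sign * (x << (frac_bits + exp))

sign, exp = quantize_pow2(0.26)           # nearest power of two is 2**-2
product = shift_multiply(12, sign, exp)   # 12 * 0.25 in Q7 fixed point
print(product / 2 ** 7)                   # value as a real number
```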

  • Special Section on Selected Papers from ASICON2023
    HENG ZHANG, YICHUAN BAI, JUNJIE SHEN, YUAN DU, LI DU
    Integrated Circuits and Systems. 2024, 1(1): 43-53. https://doi.org/10.23919/ICS.2024.3422968

    Deep learning has recently gained significant prominence in various real-world applications such as image recognition, natural language processing, and autonomous vehicles. While deep neural networks appear to have different architectures, the main operations within these models are matrix-vector multiplications (MVM). Compute-in-memory (CIM) architectures are promising solutions for accelerating the massive MVM operations by alleviating the frequent data movement issue in traditional processors. Analog CIM macros leverage current-accumulating or charge-sharing mechanisms to perform multiply-and-accumulate (MAC) computations. Even though they can achieve high throughput and efficiency, computing accuracy is sacrificed due to analog nonidealities. To ensure precise MAC calculations, it is crucial to analyze the sources of nonidealities and identify their impacts, along with corresponding solutions. In this paper, a comprehensive linearity analysis and dedicated calibration methods for charge-domain static random-access memory (SRAM) based in-memory computing circuits are proposed. We analyze nonidealities from three areas based on the mechanism of charge-domain computing: charge injection effect, temperature variations, and ADC reference voltage mismatch. By designing a 256 × 256 CIM macro and conducting investigations via post-layout simulation, we conclude that these nonidealities do not deteriorate the computing linearity, but only cause scaling and bias drift. To mitigate the scaling and bias drift identified, we propose three calibration methods ranging from the circuit level to the algorithm level, all of which exhibit promising results. The comprehensive analysis and calibration methods can assist in designing CIM macros with more accurate MAC computations, thereby supporting more robust deep learning inference.
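
Because the nonidealities are reported to preserve linearity and only introduce scaling and bias drift, the measured MAC output can be modelled as measured ≈ gain × ideal + offset. The sketch below shows one generic way to undo such drift with a least-squares line fit. It is an assumption-laden illustration and not one of the three calibration methods proposed in the paper; the function names and the sample values are hypothetical.

```python
def fit_gain_offset(ideal, measured):
    """Least-squares fit of measured ≈ gain * ideal + offset."""
    n = len(ideal)
    mean_x = sum(ideal) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in ideal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ideal, measured))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset

def calibrate(measured_value, gain, offset):
    """Invert the linear drift model to recover the ideal MAC value."""
    return (measured_value - offset) / gain

# Hypothetical calibration points: known MAC results and drifted readings.
ideal = [0, 64, 128, 192, 255]
measured = [0.9 * x + 3.0 for x in ideal]  # synthetic scaling and bias drift
gain, offset = fit_gain_offset(ideal, measured)
recovered = calibrate(0.9 * 100 + 3.0, gain, offset)
```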

  • Special Section on Selected Papers from ASICON2023
    FRANCOIS RIVET, LIANG QI
    Integrated Circuits and Systems. 2024, 1(1): 31-32. https://doi.org/10.23919/ICS.2024.3484397
  • Editorial
    MING LIU, PETER LIAN
    Integrated Circuits and Systems. 2024, 1(1): 2-2. https://doi.org/10.23919/ICS.2024.3483733