Research Article

The future is frozen: cryogenic CMOS for high-performance computing

  • Saligram R.*
  • Raychowdhury A.
  • Datta Suman
  • School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
*E-mail: (R. Saligram)

Received date: 2023-05-31

Accepted date: 2023-12-20

Online published: 2023-12-29

Abstract

Low-temperature complementary metal oxide semiconductor (CMOS), or cryogenic CMOS, is a promising avenue for continuing Moore's law while serving the needs of high-performance computing. With temperature as a control "knob" to steepen the subthreshold slope of CMOS devices, the supply voltage can be reduced with no impact on operating speed. With optimal threshold voltage engineering, the device ON current can be further enhanced, translating to higher performance. In this article, experimentally calibrated device models are used to tune the threshold voltage and to investigate the power, performance and area of cryogenic CMOS at the device, circuit and system level. We also present measurement results and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator (FDSOI) operating at cryogenic temperature. Finally, we discuss the challenges and opportunities in the further development and deployment of such systems.

Cite this article

Saligram R., Raychowdhury A., Datta S. The future is frozen: cryogenic CMOS for high-performance computing[J]. Chip, 2024, 3(1): 100082. DOI: 10.1016/j.chip.2023.100082

INTRODUCTION

The compute demand in the high-performance computing (HPC) paradigm has grown over 100x in the last decade, well surpassing the rate of growth in transistor density per classical Moore's law. To make matters worse, the dimensional scaling that has traditionally enabled Moore's law has itself plateaued. A reduced rate of performance gain across technology nodes, the inability to scale down the threshold voltage (and consequently the supply voltage) without increasing leakage currents, and increasing power density that forces throttling of processor clock speeds have all contributed to the present situation. Data centers spend more than a third of their power budget on cooling, simply to prevent servers from shutting down due to overheating. While cooling systems, in particular immersion cooling, have been employed to overclock processors, the performance improvement is marginal (∼20%)1. Low-temperature or cryogenic CMOS operates at significantly lower temperatures than conventional cooling systems provide, thereby offering significantly larger performance improvements.
Cryogenic CMOS dates back to the early 1980s, when systems with as many as 2,000 packaged chips of up to 20,000 gates each were operated at 90 K, doubling performance2. In recent years, there has been renewed interest in cryogenic CMOS for high-performance computing3,4. Steep subthreshold switching characteristics, accompanied by other performance boosters such as improved mobility, improved reliability, lower wiring resistance and reduced self-heating, make cryogenic CMOS a promising option. With a steeper subthreshold slope (SS), the threshold voltage of the device can be reduced, which in turn allows a lower supply voltage and an enhanced energy-delay product. In the current work, we present a Berkeley short-channel IGFET model (BSIM) for cryogenic CMOS calibrated against measurement data from 14 nm node fin field-effect transistors (FinFETs). Furthermore, we investigate threshold voltage engineering and its consequences for device and circuit performance, as well as the effect of variation. A brief section on interconnect performance at low temperature is followed by a benchmarking study of a 64-bit Arm core designed through device-technology co-optimization. The paper then introduces three memories: 6T SRAM (based on 14 nm FinFET, a simulation analysis), 2T gain cell embedded dynamic random-access memory (DRAM, based on 28 nm bulk CMOS, a fabricated chip-level result) and 1T floating body random-access memory (RAM, based on 22 nm fully depleted silicon on insulator (FDSOI), a device measurement result). Much of the discussion is constrained to the 70 K to 100 K range, which is believed to provide the best device performance at reasonable cooling cost for classical high-performance computing, but some data at 4 K is also presented for completeness and for applicability to quantum computing.

DEVICES

Measurement and modeling

The 14 nm node FinFET devices were measured using a Lakeshore CPX-VF cryogenic probe station from 300 K down to 4 K. DC characterization was performed with a Keithley 4200 SCS parameter analyzer. The device transfer characteristics for N-type (NMOS) and P-type (PMOS) metal oxide semiconductor devices under linear and saturation conditions are shown in Fig. 1. The currents are plotted on both linear and logarithmic scales to highlight the increase in ON current and the steepening of the SS across temperatures. A key point of interest, the zero temperature coefficient (ZTC) point, is identified in all four plots. This gate voltage/drain current is critical in designing temperature-resilient circuits, and the gate voltage must be higher in magnitude than the ZTC value for the devices to show improvement in ON current. As established from the graphs, the device leakage current decreases exponentially while the ON current increases linearly with decreasing temperature. The saturation current increases by ∼22% and ∼9% for NMOS and PMOS, respectively, primarily due to the increase in carrier mobility resulting from reduced phonon scattering at low temperature. The extracted threshold voltages (Fig. 2) for the linear and saturation regions of NMOS (PMOS) shift by 91 mV (110 mV) and 80 mV (107 mV), respectively, going from 300 K to 4 K. The increase in threshold voltage results from the increase in the Si bandgap, the decrease in intrinsic carrier density and the increase in bulk Fermi potential5,6. The extracted SS for NMOS (PMOS) decreases linearly from ∼65 mV/dec (62 mV/dec) at 300 K to ∼22 mV/dec (22 mV/dec) at 77 K, below which it tends to saturate. This saturation is mainly attributed to the presence of band tail states7.
Fig. 1. Transistor transfer characteristics in linear and saturation regions for NMOS (a, b) and PMOS (c, d) showing linear increase in ON current, exponential decrease in subthreshold leakage current.
Fig. 2. Extracted subthreshold slope and threshold voltage in linear and saturation regimes for NMOS and PMOS across temperature.
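The SS extraction described above can be sketched numerically. A minimal example, assuming idealized exponential subthreshold curves with the slopes reported in Fig. 2 (synthetic data, not the measurements):

```python
import numpy as np

def subthreshold_slope(vg, id_current):
    """Extract subthreshold slope (mV/dec) from a transfer curve.

    Fits log10(Id) versus Vg over the supplied subthreshold points and
    returns the inverse slope in mV per decade of drain current.
    """
    slope, _ = np.polyfit(vg, np.log10(id_current), 1)  # decades per volt
    return 1e3 / slope                                   # mV per decade

# Synthetic subthreshold curves (illustrative only):
# ~65 mV/dec at 300 K and ~22 mV/dec at 77 K, as in Fig. 2.
vg = np.linspace(0.0, 0.2, 21)
id_300k = 1e-12 * 10 ** (vg / 0.065)   # 65 mV/dec
id_77k = 1e-12 * 10 ** (vg / 0.022)    # 22 mV/dec

print(round(subthreshold_slope(vg, id_300k), 1))  # 65.0
print(round(subthreshold_slope(vg, id_77k), 1))   # 22.0
```

The same fit applied to the measured log-scale curves of Fig. 1 over their linear subthreshold region yields the values plotted in Fig. 2.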

Threshold voltage engineering

Threshold voltage engineering can be employed to tune the Vth of the devices so as to increase the current gain across temperature and thereby obtain a performance gain. This allows us to cancel the Vth increase upon cooling, reduce the supply voltage and take advantage of the improved cryogenic SS. It also yields higher device current for a given gate overdrive voltage. Vth tuning can be accomplished with dipole engineering (Fig. 3) and can be furthered by boosting the intrinsic gate capacitance using a mixed-ferroic HfO2-ZrO2 superlattice (HZH) stack, which effectively reduces the equivalent oxide thickness through the negative capacitance effect without degrading carrier transport8,9. With Vth as a tunable knob, it can be reduced at low temperature to match the leakage current of the room-temperature devices, a scenario henceforth described as iso-IOFF. A more moderate approach tunes the Vth to match the Vth of the room-temperature devices, a scenario referred to as iso-Vth. The threshold voltage reduction required to achieve iso-IOFF is larger than that required for iso-Vth, so the final engineered Vth at a given temperature is lower in the iso-IOFF condition than in iso-Vth. Since the rationale is to extract the maximum performance at a given temperature, processor design benefits most from the iso-IOFF scenario. However, embedded memory designs may be better off with iso-Vth, as elucidated in a later section. BSIM-CMG models were calibrated to the measured data (the details of the model calibration are presented in ref.10 and not discussed here as they do not provide additional insights; interested readers can refer to refs.10-13 for details of the cryogenic phenomena and the device physics that govern them).
We further empirically tuned the PHIG (gate work function) parameter in the model to mimic threshold voltage tuning for the iso-IOFF or iso-Vth scenarios and used the resulting models to analyze circuits and systems. The normalized currents for the three key low temperatures under the iso-IOFF condition are displayed in Fig. 4.
Fig. 3. a, 3D structure depicting dipole engineering in MOSFET. b, Work function difference created by addition of dipole layer.
Fig. 4. Normalized increase in device ON current at nominal supply voltage across key temperature points.
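The size of the Vth retarget needed for iso-IOFF can be estimated to first order. Assuming simple exponential subthreshold conduction Ioff ∝ 10^(−Vth/SS) with a temperature-independent prefactor (a simplification of the calibrated BSIM-CMG model) and a hypothetical 0.35 V room-temperature Vth:

```python
def iso_ioff_vth(vth_300k, ss_300k_mv, ss_t_mv):
    """First-order Vth target for the iso-IOFF scenario.

    With Ioff ~ I0 * 10**(-Vth/SS) and a temperature-independent I0,
    matching the 300 K leakage requires Vth(T)/SS(T) = Vth(300)/SS(300).
    """
    return vth_300k * ss_t_mv / ss_300k_mv

# Illustrative numbers (hypothetical Vth; SS values from Fig. 2)
vth_300k = 0.35          # V, assumed room-temperature threshold
native_shift = 0.09      # V, Vth increase on cooling (Fig. 2: ~91 mV)
vth_77k_target = iso_ioff_vth(vth_300k, 65.0, 22.0)
retarget = (vth_300k + native_shift) - vth_77k_target
print(f"target Vth at 77 K: {vth_77k_target:.3f} V")
print(f"required Vth reduction: {retarget:.3f} V")
```

The steeper SS at 77 K permits a much lower Vth at the same leakage, which is the headroom that the iso-IOFF scenario converts into ON current.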
With the iso-IOFF models, we simulated an 11-stage NAND2-based ring oscillator across temperature. The frequency of oscillation was determined at multiple supply voltages and the corresponding energy was recorded. From the oscillation frequency, the stage delay was calculated, and the resulting delay versus energy plot is shown in Fig. 5. Three axes, α, β and γ, corresponding to iso-VDD, iso-energy and iso-performance, respectively, are identified: iso-VDD is the line joining the topmost points of each curve, iso-energy is a line parallel to the X-axis and iso-performance is a line parallel to the Y-axis. We observe (i) up to 8.3x performance improvement at iso-VDD, albeit with an energy increase, (ii) up to 5.3x energy improvement at iso-performance, mainly due to the ability to reduce the supply voltage, and (iii) up to 6x performance improvement at iso-energy as the temperature decreases.
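The stage delay follows directly from the ring-oscillator relation: one oscillation period corresponds to two traversals of the N-stage chain. A short sketch with hypothetical frequencies (the 8.3x factor is the iso-VDD improvement from Fig. 5; the absolute values are illustrative):

```python
def stage_delay(freq_hz, n_stages=11):
    """Per-stage delay of an N-stage ring oscillator.

    One period corresponds to 2*N stage transitions,
    so t_stage = 1 / (2 * N * f).
    """
    return 1.0 / (2 * n_stages * freq_hz)

# Hypothetical oscillation frequencies, for illustration only
f_300k, f_77k = 1.0e9, 8.3e9   # 8.3x iso-VDD speedup as in Fig. 5
print(stage_delay(f_300k))     # ~45.5 ps per stage
print(stage_delay(f_77k))
```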

Variation

By analyzing process-induced variations, the aim is to understand how they influence the operational characteristics of transistors. Threshold voltage variations can cause deviations in the device ON current, affecting its ability to conduct and switch effectively. These variations can also change the subthreshold leakage current, which can be detrimental to the overall power efficiency of digital circuits. We first examine device behavior under Vth variation and later its effect on memory cells, particularly SRAM cells.
In this simulation, we assumed a 3σ Vth variation of 30 mV, consistent with the study presented in ref.14, and performed Monte Carlo simulation over 2000 devices to obtain the device ON and OFF currents at varying supply voltages. The amount of Vth variation is considered constant across temperature: for a given doping concentration (with random dopant fluctuation, RDF, considered), the number of ionized dopants does not change with temperature for 70 K < T < 300 K. Dopant freezeout occurs when the ionization energy of the dopant atoms exceeds the available thermal energy ($kT$, where $k$ is the Boltzmann constant and $T$ denotes the absolute temperature); this does not happen in the considered range of 70 K < T < 300 K, as it has been shown that dopant freezeout occurs only below 50 K13. It has also been experimentally verified that Vth variations are temperature independent15. The study presented in ref.16 performs a similar analysis but with matched IOFF as well as matched ION, where ION matching is achieved through supply voltage reduction at cryogenic temperature. Since the effect of supply voltage on current cannot be neglected, especially when the SS is steeper at cryogenic temperatures, and since the performance and energy improvements of Vth-retargeted devices at cryogenic temperature depend greatly on supply voltage scaling, the device ON current was not fixed here.
Fig. 6 shows the variation of ION and IOFF under 3σ Vth variation for 2000 NMOS devices under Monte Carlo analysis at 300 K and 77 K. All devices are iso-IOFF devices obtained after Vth retargeting. This implies that the nominal Vth of the 77 K device is lower than that of the 300 K device, enabling its operation at a VDD of 0.2 V (yellow). At a VDD of 0.3 V, the variation in ION is ∼10x at 300 K, while it is less than 0.3x at 77 K. However, the IOFF variation increases from ∼10x at 300 K to ∼10^4x at 77 K at the same VDD of 0.3 V. This larger IOFF variation is mainly ascribed to the steeper SS at cryogenic temperatures. As VDD is increased to 0.7 V, the variation in ION becomes smaller at both 300 K and 77 K compared to the 0.3 V case. To probe further, we investigated the effect of Vth shift on IOFF across temperature via Monte Carlo analysis with 2000 samples of iso-IOFF devices (iso-IOFF at nominal Vth) at multiple temperatures. As evident from Fig. 7, the lower the temperature, the steeper the slope and the wider the spread of IOFF, again owing to the steeper SS at low temperature. This is further evidenced in Fig. 8, which shows ΔIOFF% (defined as (IOFF − IOFF,nom)/IOFF,nom · 100) as a function of ΔVth% (defined as (Vth − Vth,nom)/Vth,nom · 100) and VDD, where IOFF,nom and Vth,nom denote the nominal values of subthreshold leakage and threshold voltage at a given VDD (these percentage shifts are distinct from the 3σ Vth variation).
Fig. 6. Variation of ION and IOFF under 3σ Vth variation with 2000 Monte Carlo samples of threshold voltage retargeted NMOS at 300 K and 77 K showing lower IOFF spread at 300 K compared to 77 K and lower ION spread at higher VDD.
Fig. 7. Variation in IOFF across temperature for a given 3σ Vth variation showing the spread gets worse as temperature decreases.
Fig. 8. Variation (percentage change) in IOFF from its nominal value across supply voltage for varying percentage changes in Vth at 300 K and 77 K showing stronger dependence on VDD at higher ΔVth at 77 K.
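The amplification of Vth variation into IOFF spread by a steep SS can be sketched with a toy Monte Carlo run. Assuming a simple exponential subthreshold law (not the full BSIM-CMG model, so the spreads are indicative rather than the ∼10x/∼10^4x figures above):

```python
import numpy as np

rng = np.random.default_rng(0)

def ioff_spread(ss_mv_per_dec, sigma_vth_mv=10.0, n=2000):
    """Monte Carlo spread of IOFF under Gaussian Vth variation.

    Assumes IOFF ~ 10**(-Vth/SS); returns the ratio between the
    largest and smallest sampled (normalized) leakage currents.
    A 3-sigma variation of 30 mV corresponds to sigma = 10 mV.
    """
    dvth = rng.normal(0.0, sigma_vth_mv, n)      # mV around nominal
    ioff = 10.0 ** (-dvth / ss_mv_per_dec)       # normalized IOFF
    return ioff.max() / ioff.min()

print(f"300 K (65 mV/dec): {ioff_spread(65.0):.1f}x spread")
print(f" 77 K (22 mV/dec): {ioff_spread(22.0):.1f}x spread")
```

The same ±30 mV band translates into far more decades of leakage at 22 mV/dec than at 65 mV/dec, which is the mechanism behind the widening spread in Figs. 6 and 7.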

INTERCONNECTS

Back end of line (BEOL) interconnects are as important as devices at scaled nodes in determining performance bottlenecks. The Elmore delay distribution of the critical paths of a 64-bit Arm processor core, shown in Fig. 9, highlights the growing contribution of interconnect delay across technology nodes. Fortunately, interconnect resistance decreases with temperature owing to the reduction in bulk copper resistivity. The interconnect resistance of a commercial foundry 22 nm node was characterized and modelled using Fuchs-Sondheimer and Mayadas-Shatzkes (FS-MS) models17, and the resulting normalized resistance and effective resistivity across the BEOL stack are presented in Fig. 10 and Fig. 11. The resistance reduction for global metal layers is larger than that for local and intermediate metal layers owing to the higher fraction of bulk copper per unit volume. Additionally, due to the lower resistance (and hence lower RC delay) and lower signal attenuation, the number of repeaters needed to faithfully transmit signals across the chip, and their corresponding energy, is reduced at low temperature. However, at cryogenic temperature copper might not be the best interconnect material; other metals such as ruthenium and aluminum offer superior characteristics in terms of resistance, and in turn reduced IR drop and lower Joule heating18. Since calibrated data on alternative metal interconnects are not available, the remaining analysis in this work assumes copper interconnects.
Fig. 9. Elmore delay distribution of top critical paths of a 64-bit Arm processor across technology nodes indicating the increasing contributions of interconnect creating performance bottlenecks.
Fig. 10. Normalized resistance of BEOL layers across temperature showing reduction at lower temperature.
Fig. 11. Extracted resistivities of BEOL metal layers and calibrated models which use FS-MS theory.
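The size-effect penalty captured by the FS-MS models can be illustrated with the textbook closed forms: the Fuchs-Sondheimer surface-scattering term and the Mayadas-Shatzkes grain-boundary term, combined Matthiessen-style. This is a sketch only, with assumed specularity, grain-boundary reflectivity and dimensions; the paper's calibrated models are more detailed:

```python
import math

def fs_ms_resistivity(rho_bulk, mfp, width, grain, p=0.0, R=0.3):
    """Approximate FS-MS effective resistivity of a narrow wire.

    fs: Fuchs-Sondheimer surface term, (3/8)(lambda/w)(1-p).
    ms: Mayadas-Shatzkes grain-boundary factor with reflectivity R,
        alpha = (lambda/g) * R / (1 - R).
    The two contributions are added Matthiessen-style.
    """
    fs = (3.0 / 8.0) * (mfp / width) * (1.0 - p)
    alpha = (mfp / grain) * R / (1.0 - R)
    ms = 3.0 * (1.0 / 3.0 - alpha / 2.0 + alpha**2
                - alpha**3 * math.log(1.0 + 1.0 / alpha))
    return rho_bulk * (1.0 / ms + fs)

# Illustrative 300 K copper: rho0 ~1.7e-8 ohm-m, mean free path ~39 nm;
# narrow local wire (20 nm) versus wider global wire (100 nm)
rho_local = fs_ms_resistivity(1.7e-8, 39e-9, 20e-9, 20e-9)
rho_global = fs_ms_resistivity(1.7e-8, 39e-9, 100e-9, 100e-9)
print(rho_local / 1.7e-8, rho_global / 1.7e-8)  # local wires suffer more
```

The same scaling explains why global layers, with more bulk-like copper per unit volume, recover a larger fraction of the bulk resistivity drop upon cooling.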

64 BIT ARM CORE AT CRYOGENIC TEMPERATURES

Standard cells characterization

To ensure high quality, production standard cell libraries were re-characterized at low temperatures. This process uses industry-standard tools, starting from the device models and the RC models of the BEOL stack. Using PDK-based schematics and layouts, parasitic extraction was carried out to obtain timing arcs and energy/power values, which were then used to construct the standard cell libraries. This procedure was repeated for supply voltages of 0.4, 0.6 and 0.8 V (nominal) and across four temperature points: 300, 200, 150 and 100 K. At T = 100 K, the logic gates in the library exhibit a delay improvement of up to 38% compared to room temperature (RT), where delay is calculated as the average of rise and fall delays across all cells in the library. This improvement is considerably larger than what technology node scaling achieves, even with advancements in FinFET structures and tighter metal and contact poly pitches. Fig. 12 compares two FinFET libraries at different technology nodes (14 and 7 nm) with the same fin count and shows that cooling to 100 K yields approximately 3x better delay improvement than technology scaling.
Fig. 12. Normalized cell delay improvement at cryogenic temperature averaged across the standard cell library showing ∼3x improvement compared to subsequent technology node.
Since the devices are tuned to a lower Vth at lower temperature (iso-IOFF retargeting) to maximize the ON current, the inversion charge in the channel at a given supply voltage is higher at cryogenic temperature than at RT. As a result, the input pin capacitance increases at lower temperatures. The average input pin capacitance across standard cells, categorized by drive strength and temperature, is shown in Fig. 13. The input capacitance increases by 12% to 15% at 100 K compared to RT; the average input pin capacitance of the standard cell library together with the temperature-adjusted threshold voltage is shown in Fig. 14.
Fig. 13. Normalized input pin capacitance across the iso-IOFF tuned standard cell library for different drive strengths showing increased value at lower temperature due to higher charge accumulation for the same gate overdrive voltage.
Fig. 14. Normalized input pin capacitance across the iso-IOFF tuned standard cell library and corresponding targeted Vth for different temperature points.
Similar effects can be observed with multi-Vth standard cell libraries at RT, wherein the ULVT (ultra-low-Vth) library tends to have a higher average input pin capacitance than the UHVT (ultra-high-Vth) library. Additionally, short-circuit energy increases at low temperature due to the increase in transistor peak current, and it is a strong function of the input slew: at a given temperature, the higher the signal slew, the higher the short-circuit energy.

Microprocessor benchmarking

An Arm Cortex-A53 CPU was implemented across temperatures and supply voltages using the aforementioned standard cell libraries4. An eleven-metal-layer BEOL stack was selected, with the top two layers dedicated to power and ground routing while the rest were used for both signal and power routing. A standard VLSI design flow was used: synthesis and floorplanning, power delivery network design, standard cell placement, clock tree synthesis, then signal routing and optimization. The design is said to meet an operating frequency when the timing violations are less than 5% of the clock period and the routing violations number fewer than 30 (so that the design is fixable with an Engineering Change Order). The signal slew is set at a fixed percentage of the clock period for each run.
At a nominal supply voltage of 0.8 V, the performance of the Cortex-A53 core increases by 56% going from 300 K to 100 K (Fig. 15); however, the power dissipation increases. The switching power increase stems from (i) the increase in clock frequency and (ii) the increase in the total switched capacitance (Csw). The Csw increase in turn comes from the higher input gate capacitance of the lower-Vth devices, the up-sizing of gates to achieve the higher target frequency and the additional buffering for hold fixing. As explained earlier, the internal (short-circuit) power also trends upward owing to the higher short-circuit current. The improvement at the processor level surpasses that at the logic-gate level (38% in Fig. 12), helped by the improved BEOL RC and by electronic design automation tool optimizations. Thus, at 100 K it is technically possible to achieve the performance of a high-performance Arm Class-A core. A number of other advantages show up in the 100 K implementation compared to RT (Fig. 16): the number of combinational gates is reduced by ∼5%, while the inverter and buffer counts are reduced by ∼20% and 22%, respectively. The buffer/inverter count reduction can be attributed to the 6% reduction in wire length and the lower interconnect RC delay at cryogenic temperatures, which also account for the reduction in via count. Similarly, the cell count, gate count and total cell area are reduced at cryogenic temperature.
Fig. 15. Performance benchmarking of Cortex-A53 core at nominal VDD at different temperature points indicating percentage improvements. At 100 K, we can achieve the performance of a Class-A Core using the iso-IOFF SC library.
Fig. 16. Improvements in physical design metrics viz., combinational gates, inverter/buffer counts, wire length, via count, cell and gate count, and total cell area at 100 K due to improvement in standard cell performance and reduction in interconnect resistance at cryogenic temperature.
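The two switching-power contributions above compose multiplicatively in the standard dynamic power relation P = α·Csw·VDD²·f. A small sketch with hypothetical activity factor and capacitance (the 56% frequency gain is from Fig. 15; the ∼13% Csw increase is an assumed value in the 12% to 15% pin-capacitance range):

```python
def switching_power(alpha, c_sw, vdd, freq):
    """Dynamic switching power: P = alpha * Csw * VDD^2 * f."""
    return alpha * c_sw * vdd**2 * freq

# Illustrative numbers (hypothetical, not from the benchmarked core)
p_300k = switching_power(0.1, 1.00e-9, 0.8, 1.00e9)
p_100k = switching_power(0.1, 1.13e-9, 0.8, 1.56e9)  # +56% f, +13% Csw
print(p_100k / p_300k)  # ~1.76x: both factors raise power at iso-VDD
```

This is why the supply voltage reduction discussed next is the lever that converts the cryogenic speed gain into a power gain.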
One advantage of cryogenic operation with low-Vth devices is that the supply voltage can be lowered to reduce power while maintaining the same performance. Thus, as a pragmatic next step, we implemented the core at reduced supply voltages of 0.6 V and 0.4 V across all temperatures. At 0.4 V, however, the RT devices cannot reliably switch due to the low gate overdrive, and the implementation fails. At 0.6 V, up to 87% performance boost over the 0.6 V RT frequency is observed (which is evidently lower than the 0.8 V RT frequency); and at 0.4 V, the 100 K design can meet the target performance of the RT design operating at nominal VDD. The normalized performance across temperature for different supply voltages, and the corresponding improvement with respect to the RT design, is shown in Fig. 17. The plot of performance/watt versus performance across temperatures and supply voltages is depicted in Fig. 18, showing iso-power performance improvement and more than 4x power improvement at iso-performance by taking advantage of the low VDD.
Fig. 17. Normalized performance of 64 bit Arm Cortex-A53 across multiple supply voltages at different temperatures.
Fig. 18. Performance per watt versus performance of 64 bit Arm Cortex-A53 indicating up to 4x improvement at iso-frequency by scaling down the temperature from 300 K to 150 K and corresponding supply voltages from 0.6 V to 0.4 V and up to 3.7x by going from 300 K to 100 K and reducing supply voltage from 0.8 V to 0.4 V.

Thermal benefits

Cryogenic computing also mitigates self-heating effects thanks to the nearly 10x increase in the thermal conductivity of bulk silicon16 (Fig. 19). The thick silicon substrate plays a major role in dissipating heat and controlling junction temperatures. Low-temperature effects on thermal conductivity were investigated at the system level by assuming a single high-performance Arm core, benchmarked with the maximum-power workload at ambient temperatures of 298 K and 100 K, with no changes to the design and only the material thermal conductivity changed. A simple workflow for the thermal analysis is shown in Fig. 20. The chip design along with the switching activity files for a given workload (here Dhrystone, a maximum-power workload) are input to Cadence Voltus™ to generate the die model (which contains the current signatures of the different components) and the power map. These are fed into Cadence® Sigrity Celsius™, an industry-standard thermal analysis tool, along with thermal tech files containing material properties such as metal and substrate conductivities. From this, the heat map is extracted for the two temperature points, as depicted in Fig. 21, showing the spread of junction temperatures as well as the peak die temperature. The results show a 4x reduction in the maximum temperature rise (ΔTmax) at an ambient temperature of 100 K, which results from the improved bulk silicon thermal conductivity. At room temperature, the number of cores on a die is often limited and performance is throttled to keep the junction temperature within the thermal design power (TDP) budget. Cryogenic computing thus shows the potential to pack more cores at the system level while adhering to the same TDP limit as RT designs.
Fig. 19. Improvement in bulk thermal conductivity of substrate silicon at cryogenic temperature showing more than 10x increase (recreated from ref.19).
Fig. 20. Flow diagram for thermal analysis starting from design database, switching activity files and material thermal property files to obtain heat map of the chip.
Fig. 21. Thermal heat map for the ARES core implemented in 7 nm at ambient temperatures of 298 K and 100 K.
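The effect of the conductivity increase on junction temperature can be sketched with a one-dimensional conduction estimate, ΔT = P·t/(k·A), a drastic simplification of the Voltus/Celsius analysis. The power, die area, substrate thickness and the ∼1100 W/(m·K) value for silicon near 100 K are illustrative assumptions:

```python
def delta_t(power_w, k_si_w_mk, area_m2=1e-4, thickness_m=7e-4):
    """1-D conduction estimate of junction temperature rise:
    dT = P * t / (k * A) for heat flowing through the substrate."""
    return power_w * thickness_m / (k_si_w_mk * area_m2)

# Bulk-Si thermal conductivity: ~148 W/(m K) at 300 K and roughly an
# order of magnitude higher near 100 K (illustrative, per Fig. 19)
dt_300k = delta_t(5.0, 148.0)
dt_100k = delta_t(5.0, 1100.0)
print(dt_300k / dt_100k)  # several-fold smaller temperature rise
```

Even this crude model shows a greater-than-4x reduction in ΔT from the conductivity change alone, consistent in direction with the full heat-map result.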

MEMORIES

6T SRAM (base device—FinFETs)

The conventional 6T SRAM cell comprises two back-to-back inverters that form the storage unit and two NMOS access devices. If the devices are made iso-IOFF, as in the processor analysis presented earlier, then the Vth of both the NMOS and the PMOS must be reduced at cryogenic temperature, which reduces the slope of the inverter voltage transfer characteristic (VTC), as explained in ref.5, since the PMOS turns OFF and the NMOS turns ON earlier. The ability of an SRAM cell to hold data, designated the static noise margin (SNM), is determined by the VTCs of the constituent inverters. The SNM is calculated as the side of the largest square that can be inscribed inside the butterfly curve of the SRAM20. If the inverters are asymmetric, the smaller of the two lobes dictates the SNM.
We performed Monte Carlo analysis on 1000 SRAM cells with the applied 3σ Vth variation and plotted the butterfly curves. First, the cells were evaluated at 300 K and 0.7 V VDD (Fig. 22a); since we intend to repeat this at different supply voltages, it is appropriate to normalize the SNM to VDD. The normalized mean SNM for this case is 0.413. The worst-case (WC) scenario, yielding a normalized WC-SNM of 0.399, was also considered. Next, the same analysis was conducted on the iso-IOFF cells at 77 K and 0.7 V VDD (Fig. 22b); due to the reduced slope of the VTC of the iso-IOFF inverters, the normalized mean and WC SNM drop to 0.287 and 0.268, respectively. Since the rationale for reducing the Vth at 77 K is to operate the devices at lower supply voltage, we also evaluated the SNM at 0.2 V VDD, giving mean and WC SNM of 0.41 and 0.3, making the cell inoperable (Fig. 22c). Thus, tuning the devices to be iso-IOFF might not be beneficial for memories.
Fig. 22. Butterfly diagram for SRAM cells under Vth variation with 1000 Monte Carlo samples at a, 300 K, VDD = 0.7 V, b, 77 K, VDD = 0.7 V, c, 77 K VDD = 0.2 V showing degradation in WC SNM. d, proposed iso-Vth solution at 77 K, VDD = 0.3 V, compared with e, iso-IOFF at 77 K, VDD = 0.3 V and f, iso-Vth VDD = 0.7 V for scalability.
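The largest-inscribed-square SNM extraction is commonly computed by rotating the butterfly plot 45 degrees (Seevinck's method): the side of the biggest square nested in a lobe equals the maximum along-diagonal separation of the two curves divided by √2. A sketch with an idealized tanh-shaped inverter VTC (illustrative, not the calibrated BSIM model):

```python
import numpy as np

def snm(vtc1, vtc2, vdd, n=2001):
    """SNM via 45-degree rotation: largest nested square per lobe.

    vtc1, vtc2 map Vin -> Vout of the two cross-coupled inverters;
    the second curve is mirrored across y = x to form the butterfly.
    """
    vin = np.linspace(0.0, vdd, n)
    ax, ay = vin, vtc1(vin)          # curve A: (Vin, Vout)
    bx, by = vtc2(vin), vin          # curve B mirrored across y = x
    # rotated frame: u along the diagonal y = x, w perpendicular to it
    ua, wa = (ax + ay) / np.sqrt(2), (ay - ax) / np.sqrt(2)
    ub, wb = (bx + by) / np.sqrt(2), (by - bx) / np.sqrt(2)
    w = np.linspace(max(wa.min(), wb.min()), min(wa.max(), wb.max()), n)
    ua_i = np.interp(w, np.sort(wa), ua[np.argsort(wa)])
    ub_i = np.interp(w, np.sort(wb), ub[np.argsort(wb)])
    diff = ua_i - ub_i               # diagonal separation = side*sqrt(2)
    hi = diff[w > 0].max(initial=0.0) / np.sqrt(2)   # upper-left lobe
    lo = -diff[w < 0].min(initial=0.0) / np.sqrt(2)  # lower-right lobe
    return min(hi, lo)               # smaller lobe dictates the SNM

# Idealized symmetric inverter VTC (hypothetical gain, not from the PDK)
vdd = 0.7
vtc = lambda v: vdd / 2 * (1 - np.tanh(8 * (v - vdd / 2) / vdd))
print(snm(vtc, vtc, vdd) / vdd)      # normalized SNM
```

Replacing the toy VTC with the Monte Carlo-sampled inverter curves and taking the minimum over the two lobes reproduces the per-sample SNM values summarized in Fig. 22.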
If the devices are instead tuned to have a constant Vth across temperature (the iso-Vth scenario mentioned in the Introduction), they still show an improvement in ON current while providing the advantage of reduced IOFF. The minimum VDD at which these devices can operate is 0.3 V, and the butterfly diagram for such an SRAM cell is shown in Fig. 22d. The normalized mean and WC SNM, at 0.465 and 0.416 respectively, are better than in the 300 K, 0.7 V case. To compare with iso-IOFF, we ran Monte Carlo at 0.3 V VDD on iso-IOFF devices (Fig. 22e), obtaining normalized mean and WC SNM of 0.378 and 0.337, lower than in the iso-Vth case. To verify supply voltage scalability, the iso-Vth SRAM cells were further analyzed at a VDD of 0.7 V (Fig. 22f), which also shows improvement (mean and WC of 0.391 and 0.375, respectively) over the iso-IOFF counterpart. Thus, iso-Vth devices are better suited for 6T SRAM.
A similar analysis was performed for the read and write margins, and the results are summarized in Fig. 23. For the same normalized hold margin (μ/VDD), the supply voltage can be reduced by 62% at 77 K compared to 300 K; similarly, the read and write reductions are 60% and 54%, thanks to the increased ON current and reduced leakage at cryogenic temperature. Thus, ideally, the energy can be reduced by more than 50% by taking advantage of the lower supply voltage swing.
Fig. 23. Supply voltage reduction for different scenarios: iso-μ/VDD for hold, iso-μ/VDD for read and iso-μ/VDD for write. The devices are Vth tuned at 77 K.

2T gain cell EDRAM (base device—bulk CMOS)

With the processor core operating at higher frequency at cryogenic temperature, the memory wall worsens and leads to performance bottlenecks. One way to mitigate the problem is to increase the size of the on-chip memory or cache. However, given limited die area this may be infeasible, and options such as the 2T gain cell (GC) embedded DRAM (EDRAM), which has a smaller area footprint (more than 2x denser than 6T SRAM), become viable. The operation of 2T GC-EDRAM at room temperature is constrained by leakage currents: since the charge is stored on the intrinsic gate capacitance of a transistor, the memory must be continuously refreshed. The ultra-low leakage at cryogenic temperature, however, enhances the retention time by more than six orders of magnitude21, as shown by the waterfall plot in Fig. 24, with the mean retention time increasing from 2.4 μs at 300 K to 6.5 s at 4 K. This in turn reduces the refresh power. The maximum operating frequency of the memory array increases at low temperature, and consequently the bandwidth also increases, mainly owing to the higher device ON currents and the reduced interconnect resistance, while the read/write energies at iso-performance are reduced (Fig. 25).
Fig. 24. Retention probability of failure versus retention time for 28 nm 2T EDRAM shows 3σ mean time increases by > six orders of magnitude due to ultra-low leakage at cryogenic temperature.
Fig. 25. Array power vs bandwidth for different temperatures and refresh power/read write energies across temperature for 1 kb 2T gain cell EDRAM array in 28 nm bulk CMOS.
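The refresh-power saving tracks the retention time directly, since every cell must be rewritten once per retention period. A back-of-the-envelope sketch using the retention times above and an assumed 1 fF storage node at 0.8 V (hypothetical cell parameters):

```python
def refresh_power(n_bits, c_cell_f, vdd, t_retention_s):
    """Average refresh power of a gain-cell array: each of n_bits cells
    is rewritten once per retention time at an energy of ~C*V^2."""
    return n_bits * c_cell_f * vdd**2 / t_retention_s

# 1 kb array; retention 2.4 us at 300 K vs 6.5 s at 4 K (Fig. 24)
p_300k = refresh_power(1024, 1e-15, 0.8, 2.4e-6)
p_4k = refresh_power(1024, 1e-15, 0.8, 6.5)
print(p_300k / p_4k)  # refresh power drops by over six orders of magnitude
```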

1T floating body RAM (base device—FDSOI)

Data is stored in FBRAM by injecting charge carriers into the body of the device through gate-induced drain leakage (GIDL) current. The presence or absence of charge carriers in the body modulates the drain current of the device under forward bias (Fig. 26). The retention loss, dictated by the Shockley-Read-Hall (SRH) recombination/generation rate, is exponentially reduced at cryogenic temperature, providing pseudo-static behavior with extrapolated retention times of the order of 10^5 s at iso-current sense margin22 (Fig. 27a). Owing to the higher GIDL currents, multiple bits per cell can also be programmed, as demonstrated using SiGe PMOS devices with four distinct current levels23 (Fig. 27b). With FBRAM cells having a single-transistor footprint (8x denser than 6T SRAM), the cache miss rate for single and double bit/cell is reduced by 57% and 66%, respectively, compared to 6T SRAM (Fig. 28), making them ideal candidates for the last level cache (LLC) in cryogenic processors.
Fig. 26. Operation principle of FBRAM—the presence or absence of charge carriers in the body modulates the Vth of the device and consequently the ON current.
Fig. 27. ΔIREAD at 77 K with Si FBRAM showing two retention states and SiGe FBRAM showing four retention states.
Fig. 28. Cache miss per 1000 instructions at 77 K at iso-silicon footprint showing 38%, 57%, and 66% reduction for 2T EDRAM, 1bit/cell FBRAM and 2 bit/cell FBRAM compared to 6T SRAM.
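The multi-bit read operation can be illustrated with a minimal sense sketch: a 2 bit/cell FBRAM read reduces to mapping the sensed current to the nearest of four levels. The level values below are hypothetical placeholders for illustration; only the existence of four distinguishable current levels is taken from the SiGe FBRAM result above.

```python
# Sketch: decoding 2 bits/cell from a sensed read current, assuming four
# well-separated current levels as in the SiGe FBRAM result (Fig. 27b).
# The level values are illustrative, not the measured data.

LEVELS_UA = [0.0, 1.0, 2.0, 3.0]  # hypothetical delta-I_READ levels (uA)


def decode_cell(i_read_ua: float) -> int:
    """Map a sensed current to the nearest stored 2-bit symbol (0..3)."""
    return min(range(len(LEVELS_UA)),
               key=lambda s: abs(i_read_ua - LEVELS_UA[s]))


print(decode_cell(0.2), decode_cell(1.6), decode_cell(2.9))
```

In practice the sense margin between adjacent levels, rather than the nominal level spacing, limits how many bits per cell can be stored reliably, which is why the enhanced cryogenic GIDL current matters.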

OPPORTUNITIES, OBSTACLES, AND OUTLOOK

The main advantage of low temperature CMOS is that the performance scales predictably with decreasing temperature from well-understood room-temperature reference points. Moreover, the technology is area scalable, unlike other low temperature candidates such as Josephson junctions. With ultralow leakage and higher carrier mobility leading to steep-SS devices, there is a plethora of opportunities for innovation, from the material level (e.g., interconnects, interlayer dielectrics) to devices, circuit design techniques (e.g., keeperless domino logic, subthreshold circuits) and systems (e.g., wave-pipelined, latch-based designs), which have traditionally been harder to engineer and optimize at room temperature. The ability to pack more chips (or chiplets) into a given area without exceeding the TDP limit will help build compact systems.
However, there also remain some challenges. One factor that has not been fully discussed here is the cooling cost, i.e., the wall power required to bring the system down to and maintain it at, for example, 77 K. Analyzing the cooling efficiency requires a system-oriented approach, since the cooling power depends on the physical size of the system, its operating wattage and other environmental factors. It is generally more beneficial to cool larger systems than smaller ones, owing to economies of scale (lower cost per unit of cooling), infrastructure complexity, and maintenance and operating costs. The ideal cooling cost $Q$ to cool a power $P$ operating at a cryogenic temperature $T_c$ given the nominal temperature $T_{\text{nom}}$ is given by the Carnot efficiency: $Q=P\left(T_{\text {nom}} / T_{c}-1\right)$. However, a non-ideality factor $\eta$, $0 \leq \eta \leq 1$, increases the cooling cost to $Q^{\prime}=Q / \eta$. Fig. 29 shows the cooling cost $\left(Q^{\prime} / P\right)$ for various values of $\eta$ (left Y-axis). The normalized power reduction obtained by supply voltage scaling across temperature is shown on the right Y-axis. The intersection of the red line with the family of curves for different $\eta$ gives the minimum efficiency required for the cryogenic system, at which the total power consumption breaks even.
Fig. 29. Cooling cost for various non-ideality factors and normalized power reduction obtained by supply voltage scaling across temperature.
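The break-even condition implied by Fig. 29 can be made explicit. If supply-voltage scaling reduces the chip power to a fraction r of its room-temperature value, the total wall power at cryogenic temperature is r·P·(1 + (T_nom/T_c − 1)/η), and break-even against the room-temperature power P requires η ≥ r(T_nom/T_c − 1)/(1 − r). A minimal sketch, with r = 0.25 taken as an assumed power reduction (not a figure from the article):

```python
# Sketch of the break-even analysis behind Fig. 29: Carnot cooling
# overhead Q'/P = (T_nom/T_c - 1)/eta versus the chip-power reduction
# from supply voltage scaling. The 0.25 power-reduction factor is an
# assumption for illustration.

def cooling_overhead(t_nom: float, t_c: float, eta: float) -> float:
    """Cooling wall power per watt of chip power: Q'/P = (T_nom/T_c - 1)/eta."""
    return (t_nom / t_c - 1.0) / eta


def break_even_eta(t_nom: float, t_c: float, power_reduction: float) -> float:
    """Minimum cooler efficiency eta such that total cryogenic wall power
    r*P*(1 + (T_nom/T_c - 1)/eta) does not exceed the room-temperature P."""
    r = power_reduction
    return r * (t_nom / t_c - 1.0) / (1.0 - r)


# Example: 300 K -> 77 K, chip power reduced 4x by V_DD scaling (assumed)
eta_min = break_even_eta(300.0, 77.0, 0.25)
print(f"break-even eta: {eta_min:.2f}")
```

Under these assumptions the required cooler efficiency comes out close to the Carnot limit (η ≈ 0.97), which illustrates why the economics favor large cryogenic installations where practical cooler efficiencies are highest.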
Unlike conventional CMOS designers, who have the convenience of standard CMOS models offered by a foundry, designers targeting temperatures below −55 °C face a different scenario. There are currently no readily accessible commercial-grade device models for such extreme temperatures, and the development of models for cryogenic temperatures as low as 77 K or even 4 K remains at the stage of academic research.
A multitude of efforts have been made, including the ones presented here, to model and explain device behavior. However, more unifying theories are still needed.
Cryogenic environments pose unique thermal management challenges. Managing heat dissipation becomes crucial, as the temperature difference between the cryogenic environment and the circuit itself can create temperature gradients that affect device performance and reliability. Another problem at low temperature is the increased variability in the device OFF currents, which tends to grow at advanced technology nodes, as investigated in the current work. Current circuit design techniques are not equipped to account for such pronounced parameter variations, and statistical analyses like Monte Carlo need to become an integral part of the design cycle.
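A minimal Monte Carlo sketch of the kind of variability analysis suggested above: OFF currents are often modeled as log-normally distributed, and sampling exposes the heavy upper tail that a nominal-corner analysis misses. The median, spread, and distribution choice here are assumptions for illustration, not fitted to the measured data in this work.

```python
# Sketch: Monte Carlo on OFF-current variability. The log-normal model,
# median I_OFF, and sigma are assumed placeholders for illustration.
import random
import statistics

random.seed(0)                 # reproducible sampling
I_OFF_MEDIAN = 1e-12           # A, assumed median OFF current
SIGMA_LN = 1.0                 # assumed log-normal spread (widens at low T)

# Draw 100k device samples: I_OFF = median * exp(N(0, sigma))
samples = [random.lognormvariate(0.0, SIGMA_LN) * I_OFF_MEDIAN
           for _ in range(100_000)]

mean_ioff = statistics.fmean(samples)
p99 = sorted(samples)[int(0.99 * len(samples))]
print(f"mean I_OFF: {mean_ioff:.2e} A, 99th percentile: {p99:.2e} A")
```

The sample mean sits well above the median and the 99th percentile roughly an order of magnitude higher still, illustrating why total array leakage and worst-case retention must be budgeted statistically rather than from a typical device.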

CONCLUSION

Cryogenic CMOS has immense potential for applications in high performance computing, brought about by improvements in device characteristics: increased ON current resulting from higher carrier mobility, exponentially lower leakage current, steeper SS, reduced intrinsic resistances, decreased contact resistance, and so on. The BEOL interconnects improve as well, thanks to the lower bulk resistivity at low temperature and improved reliability due to lower Joule heating. All these advantages, backed by the engineering capability to tune the threshold voltage to boost the ON current, will help achieve higher operating frequencies with logic devices. Furthermore, memory technologies that are not feasible at room temperature, mainly due to leakage, such as pseudo-static gain cell EDRAM and floating body RAM, become possible at cryogenic temperature. Besides, the higher thermal conductivity of bulk silicon at cryogenic temperature will allow chips to be densely packed under a given TDP limit, which can be greatly exploited in 3D chip stacking and monolithic 3D integration technologies.

MISCELLANEA

Acknowledgements This work is sponsored and funded by the Defense Advanced Research Projects Agency (DARPA) Low Temperature Logic Technology (LTLT) program.
Declaration of competing interest The authors declare no competing interests.
1.
Jalili M. et al. Cost-efficient overclocking in immersion-cooled datacenters. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 623-636 (IEEE, 2021). https://doi.org/10.1109/ISCA52012.2021.00055.

2.
Carlson D. M., Sullivan D. C., Bach R. E. & Resnick D. R. The ETA 10 liquid-nitrogen-cooled supercomputer system. IEEE Trans. Electron Devices 36, 1404-1413 (1989). https://doi.org/10.1109/16.30952.

3.
Chiang H. L. et al. Cold CMOS as a power-performance-reliability booster for advanced FinFETs. In 2020 IEEE Symposium on VLSI Technology, 1-2 (IEEE, 2020). https://doi.org/10.1109/VLSITechnology18217.2020.9265065.

4.
Saligram R., Prasad D., Pietromonaco D., Raychowdhury A. & Cline B. A 64-bit arm CPU at cryogenic temperatures: design technology co-optimization for power and performance. In 2021 IEEE Custom Integrated Circuits Conference (CICC), 1-2 (IEEE, 2021). https://doi.org/10.1109/CICC51472.2021.9431559.

5.
Saligram R. et al. Power performance analysis of digital standard cells for 28 nm bulk CMOS at cryogenic temperature using BSIM models. IEEE J. Explor. Solid-State Comput. Devices Circuits 7, 193-200 (2021). https://doi.org/10.1109/JXCDC.2021.3131100.

6.
Beckers A., Jazaeri F. & Enz C. Cryogenic MOSFET threshold voltage model. In ESSDERC 2019 - 49th European Solid-State Device Research Conference (ESSDERC), 94-97 (IEEE, 2019). https://doi.org/10.1109/ESSDERC.2019.8901806.

7.
Beckers A., Jazaeri F. & Enz C. Theoretical limit of low temperature subthreshold swing in field-effect transistors. IEEE Electron Device Lett. 41, 276-279 (2020). https://doi.org/10.1109/LED.2019.2963379.

8.
Cheema S. S. et al. Ultrathin ferroic HfO2-ZrO2 superlattice gate stack for advanced transistors. Nature 604, 65-71 (2022). https://doi.org/10.1038/s41586-022-04425-6.

9.
Li W. et al. Enhancement in capacitance and transconductance in 90 nm nFETs with HfO2-ZrO2 superlattice gate stack for energy-efficient cryo-CMOS. In 2022 International Electron Devices Meeting (IEDM), 22.3.1-22.3.4 (IEEE, 2022). https://doi.org/10.1109/IEDM45625.2022.10019496.

10.
Gaidhane A. D. et al. Design exploration of 14 nm FinFET for energy-efficient cryogenic computing. IEEE J. Explor. Solid-State Comput. Devices Circuits 9, 108-115 (2023). https://doi.org/10.1109/JXCDC.2023.3330767.

11.
Pahwa G., Kushwaha P., Dasgupta A., Salahuddin S. & Hu C. Compact modeling of temperature effects in FDSOI and FinFET devices down to cryogenic temperatures. IEEE Trans. Electron Devices 68, 4223-4230 (2021). https://doi.org/10.1109/TED.2021.3097971.

12.
Jazaeri F., Beckers A., Tajalli A. & Sallese J.-M. A review on quantum computing: from qubits to front-end electronics and cryogenic MOSFET physics. In 2019 MIXDES-26th International Conference “Mixed Design of Integrated Circuits and Systems”, 15-25 (IEEE, 2019). https://doi.org/10.23919/MIXDES.2019.8787164.

13.
Beckers A., Jazaeri F. & Enz C. Cryogenic MOS transistor model. IEEE Trans. Electron Devices 65, 3617-3625 (2018). https://doi.org/10.1109/TED.2018.2854701.

14.
Chabane A. et al. Cryogenic characterization and modelling of 14 nm bulk FinFET technology. In ESSCIRC 2021 - IEEE 47th European Solid State Circuits Conference (ESSCIRC), 67-70 (IEEE, 2021). https://doi.org/10.1109/ESSCIRC53450.2021.9567802.

15.
Grill A. et al. Temperature dependent mismatch and variability in a cryo-CMOS array with 30k transistors. In 2022 IEEE International Reliability Physics Symposium (IRPS), 10A.1-1-10A.1-6 (IEEE, 2022). https://doi.org/10.1109/IRPS48227.2022.9764594.

16.
Moroz V. et al. Challenges in design and modeling of cold CMOS HPC technology. In 2021 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD), 107-110 (IEEE, 2021). https://doi.org/10.1109/SISPAD54002.2021.9592537.

17.
Saligram R., Datta S. & Raychowdhury A. Scaled back end of line interconnects at cryogenic temperatures. IEEE Electron Device Lett. 42, 1674-1677 (2021). https://doi.org/10.1109/LED.2021.3117277.

18.
Saligram R., Datta S. & Raychowdhury A. Design space exploration of interconnect materials for cryogenic operation: electrical and thermal analyses. IEEE Trans. Circuits Syst. I: Regul. Pap. 69, 4610-4618 (2022). https://doi.org/10.1109/TCSI.2022.3195636.

19.
Glassbrenner C. J. & Slack G. A. Thermal conductivity of silicon and germanium from 3 K to the melting point. Phys. Rev. 134, A1058 (1964). https://doi.org/10.1103/PhysRev.134.A1058.

20.
Seevinck E., List F. J. & Lohstroh J. Static-noise margin analysis of MOS SRAM cells. IEEE J. Solid-State Circuits 22, 748-754 (1987). https://doi.org/10.1109/JSSC.1987.1052809.

21.
Saligram R., Datta S. & Raychowdhury A. CryoMem: a 4 K-300 K 1.3 GHz eDRAM macro with hybrid 2T-gain-cell in a 28 nm logic process for cryogenic applications. In 2021 IEEE Custom Integrated Circuits Conference (CICC), 1-2 (IEEE, 2021). https://doi.org/10.1109/CICC51472.2021.9431527.

22.
Chakraborty W. et al. Pseudo-static 1T capacitorless DRAM using 22 nm FDSOI for cryogenic cache memory. In 2021 IEEE International Electron Devices Meeting (IEDM), 40.1.1-40.1.4 (IEEE, 2021). https://doi.org/10.1109/IEDM19574.2021.9720578.

23.
Chakraborty W. et al. Multi-bit per-cell 1T SiGe floating body RAM for cache memory in cryogenic computing. In 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 302-303 (IEEE, 2022). https://doi.org/10.1109/VLSITechnologyandCir46769.2022.9830483.
