Research Article

The future is frozen: cryogenic CMOS for high-performance computing

  • Saligram R.*
  • Raychowdhury A.
  • Datta Suman
  • School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA
*E-mail: (R. Saligram)

Received date: 2023-05-31

Accepted date: 2023-12-20

Online published: 2023-12-29

Abstract

Low-temperature complementary metal oxide semiconductor (CMOS), or cryogenic CMOS, is a promising avenue for continuing Moore's law while serving the needs of high-performance computing. With temperature as a control "knob" to steepen the subthreshold slope of CMOS devices, the supply voltage can be reduced with no impact on operating speed. With optimal threshold voltage engineering, the device ON current can be further enhanced, translating to higher performance. In this article, experimentally calibrated device models are used to tune the threshold voltage and to investigate the power, performance and area of cryogenic CMOS at the device, circuit and system level. We also present measurement results and analysis of functional memory chips fabricated in 28 nm bulk CMOS and 22 nm fully depleted silicon on insulator (FDSOI) operating at cryogenic temperature. Finally, we discuss the challenges and opportunities in the further development and deployment of such systems.

Cite this article

Saligram R., Raychowdhury A., Datta S. The future is frozen: cryogenic CMOS for high-performance computing[J]. Chip, 2024, 3(1): 100082. DOI: 10.1016/j.chip.2023.100082

INTRODUCTION

The compute demand in the high-performance computing (HPC) paradigm has grown over 100x in the last decade, well surpassing the rate of growth in transistor density per classical Moore's law. To make matters worse, the dimensional scaling that has traditionally enabled Moore's law has itself plateaued. A reduced rate of performance gain across technology nodes, the inability to scale down the threshold voltage (and consequently the supply voltage) without increasing leakage currents, and increasing power density that forces throttling of processor clock speeds have all contributed to the present situation. Data centers spend more than a third of their power budget on cooling, simply to prevent servers from shutting down due to overheating. While cooling systems, in particular immersion cooling, have been employed to overclock processors, the performance improvement is marginal (∼20%)1. Low-temperature or cryogenic CMOS operates at significantly lower temperatures than conventional cooling systems provide, thereby offering significantly larger performance improvements.
Cryogenic CMOS dates back to the early 1980s, when systems with as many as 2,000 packaged chips of up to 20,000 gates each were operated at 90 K, doubling performance2. In recent years, there has been renewed interest in cryogenic CMOS for high-performance computing3,4. Steep subthreshold switching characteristics, accompanied by other performance boosters such as improved mobility, improved reliability, lower wiring resistance and reduced self-heating, make cryogenic CMOS a promising option. With a steeper subthreshold slope (SS), the threshold voltage of the device can be reduced, which in turn allows a lower supply voltage and an enhanced energy-delay product. In the current work, we present a Berkeley short-channel IGFET model (BSIM) for cryogenic CMOS calibrated against measurement data from 14 nm node fin field-effect transistors (FinFETs). Furthermore, we investigate threshold voltage engineering and its consequences for device and circuit performance, as well as the effect of variation. A brief section on interconnect performance at low temperature is followed by a benchmarking study of a 64-bit Arm core designed through device-technology co-optimization. The paper then introduces three memories: 6T SRAM (based on 14 nm FinFET, a simulation analysis), 2T gain cell embedded dynamic random-access memory (DRAM, based on 28 nm bulk CMOS, a fabricated chip-level result) and 1T floating body random-access memory (RAM, based on 22 nm fully depleted silicon on insulator (FDSOI), a device measurement result). Much of the discussion is constrained to the 70 K to 100 K range, which is believed to provide the best device performance at reasonable cooling cost for classical high-performance computing, but some data at 4 K is also presented for completeness and for applicability to quantum computing.

DEVICES

Measurement and modeling

The 14 nm node FinFET devices were measured using a Lakeshore CPX-VF cryogenic probe station from 300 K down to 4 K. DC characterization was performed with a Keithley 4200 SCS parameter analyzer. The device transfer characteristics for N-type (NMOS) and P-type (PMOS) metal oxide semiconductor devices under linear and saturation conditions are shown in Fig. 1. The currents are plotted on both linear and logarithmic scales to highlight the increase in ON current and the steepening of the SS across temperatures. A key point of interest, the zero temperature coefficient (ZTC) point, is identified in all four plots. This gate voltage/drain current is critical in designing temperature-resilient circuits, and the gate voltage must be higher in magnitude than the ZTC value for the devices to show improvement in ON current. As established from the graphs, the device leakage current decreases exponentially while the ON current increases linearly with decreasing temperature. The saturation current increases by ∼22% and ∼9% for NMOS and PMOS, respectively, primarily due to the increase in carrier mobility resulting from reduced phonon scattering at low temperature. The extracted threshold voltages (Fig. 2) for the linear and saturation regions of NMOS (PMOS) shift by 91 mV (110 mV) and 80 mV (107 mV), respectively, going from 300 K to 4 K. The increase in threshold voltage results from the increase in the Si bandgap, the decrease in intrinsic carrier density and the increase in bulk Fermi potential5,6. The extracted SS for NMOS (PMOS) decreases linearly from ∼65 mV/dec (62 mV/dec) at 300 K to ∼22 mV/dec (22 mV/dec) at 77 K, below which it tends to saturate. This saturation is mainly attributed to the presence of band tail states7.
Fig. 1. Transistor transfer characteristics in linear and saturation regions for NMOS (a, b) and PMOS (c, d) showing linear increase in ON current, exponential decrease in subthreshold leakage current.
Fig. 2. Extracted subthreshold slope and threshold voltage in linear and saturation regimes for NMOS and PMOS across temperature.
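The SS extraction described above can be sketched numerically. A minimal example, assuming idealized exponential subthreshold curves with the slopes reported in Fig. 2 (synthetic data, not the measurements):

```python
import numpy as np

def subthreshold_slope(vg, id_current):
    """Extract subthreshold slope (mV/dec) from a transfer curve.

    Fits log10(Id) versus Vg over the supplied subthreshold points and
    returns the inverse slope in mV per decade of drain current.
    """
    slope, _ = np.polyfit(vg, np.log10(id_current), 1)  # decades per volt
    return 1e3 / slope                                   # mV per decade

# Synthetic subthreshold curves (illustrative only):
# ~65 mV/dec at 300 K and ~22 mV/dec at 77 K, as in Fig. 2.
vg = np.linspace(0.0, 0.2, 21)
id_300k = 1e-12 * 10 ** (vg / 0.065)   # 65 mV/dec
id_77k = 1e-12 * 10 ** (vg / 0.022)    # 22 mV/dec

print(round(subthreshold_slope(vg, id_300k), 1))  # 65.0
print(round(subthreshold_slope(vg, id_77k), 1))   # 22.0
```

The same fit applied to the measured log-scale curves of Fig. 1 over their linear subthreshold region yields the values plotted in Fig. 2.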

Threshold voltage engineering

Threshold voltage engineering can be employed to tune the Vth of the devices so as to increase the current gain across temperature and thereby obtain a performance gain. This allows us to cancel the Vth increase upon cooling, reduce the supply voltage and take advantage of the improved cryogenic SS. It also yields higher device current for a given gate overdrive voltage. Vth tuning can be accomplished with dipole engineering (Fig. 3) and can be furthered by boosting the intrinsic gate capacitance using a mixed-ferroic HfO2-ZrO2 superlattice (HZH) stack, which effectively reduces the equivalent oxide thickness through the negative capacitance effect without degrading carrier transport8,9. With Vth as a tunable knob, it can be reduced at low temperature to match the leakage current of the room-temperature devices, a scenario henceforth described as iso-IOFF. A more moderate approach tunes the Vth to match the Vth of the room-temperature devices, a scenario referred to as iso-Vth. The threshold voltage reduction required to achieve iso-IOFF is larger than that required for iso-Vth, so the final engineered Vth at a given temperature is lower in the iso-IOFF condition than in iso-Vth. Since the rationale is to extract the maximum performance at a given temperature, processor design benefits most from the iso-IOFF scenario. However, embedded memory designs may be better off with iso-Vth, as elucidated in a later section. BSIM-CMG models were calibrated to the measured data (the details of the model calibration are presented in ref.10 and not discussed here as they do not provide additional insights; interested readers can refer to refs.10-13 for details of the cryogenic phenomena and the device physics that govern them).
We further empirically tuned the PHIG (gate work function) parameter in the model to mimic threshold voltage tuning for the iso-IOFF or iso-Vth scenarios and used the resulting models to analyze circuits and systems. The normalized currents for the three key low temperatures under the iso-IOFF condition are displayed in Fig. 4.
Fig. 3. a, 3D structure depicting dipole engineering in MOSFET. b, Work function difference created by addition of dipole layer.
Fig. 4. Normalized increase in device ON current at nominal supply voltage across key temperature points.
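The size of the Vth retarget needed for iso-IOFF can be estimated to first order. Assuming simple exponential subthreshold conduction Ioff ∝ 10^(−Vth/SS) with a temperature-independent prefactor (a simplification of the calibrated BSIM-CMG model) and a hypothetical 0.35 V room-temperature Vth:

```python
def iso_ioff_vth(vth_300k, ss_300k_mv, ss_t_mv):
    """First-order Vth target for the iso-IOFF scenario.

    With Ioff ~ I0 * 10**(-Vth/SS) and a temperature-independent I0,
    matching the 300 K leakage requires Vth(T)/SS(T) = Vth(300)/SS(300).
    """
    return vth_300k * ss_t_mv / ss_300k_mv

# Illustrative numbers (hypothetical Vth; SS values from Fig. 2)
vth_300k = 0.35          # V, assumed room-temperature threshold
native_shift = 0.09      # V, Vth increase on cooling (Fig. 2: ~91 mV)
vth_77k_target = iso_ioff_vth(vth_300k, 65.0, 22.0)
retarget = (vth_300k + native_shift) - vth_77k_target
print(f"target Vth at 77 K: {vth_77k_target:.3f} V")
print(f"required Vth reduction: {retarget:.3f} V")
```

The steeper SS at 77 K permits a much lower Vth at the same leakage, which is the headroom that the iso-IOFF scenario converts into ON current.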
With the iso-IOFF models, we simulated an 11-stage NAND2-based ring oscillator across temperature. The frequency of oscillation was determined at multiple supply voltages and the corresponding energy was recorded. From the oscillation frequency, the stage delay was calculated, and the resulting delay versus energy plot is shown in Fig. 5. Three axes, α, β and γ, corresponding to iso-VDD, iso-energy and iso-performance, respectively, are identified: iso-VDD is the line joining the topmost points of each curve, iso-energy is a line parallel to the X-axis and iso-performance is a line parallel to the Y-axis. We observe (i) up to 8.3x performance improvement at iso-VDD, albeit with an energy increase, (ii) up to 5.3x energy improvement at iso-performance, mainly due to the ability to reduce the supply voltage, and (iii) up to 6x performance improvement at iso-energy as the temperature decreases.
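The stage delay follows directly from the ring-oscillator relation: one oscillation period corresponds to two traversals of the N-stage chain. A short sketch with hypothetical frequencies (the 8.3x factor is the iso-VDD improvement from Fig. 5; the absolute values are illustrative):

```python
def stage_delay(freq_hz, n_stages=11):
    """Per-stage delay of an N-stage ring oscillator.

    One period corresponds to 2*N stage transitions,
    so t_stage = 1 / (2 * N * f).
    """
    return 1.0 / (2 * n_stages * freq_hz)

# Hypothetical oscillation frequencies, for illustration only
f_300k, f_77k = 1.0e9, 8.3e9   # 8.3x iso-VDD speedup as in Fig. 5
print(stage_delay(f_300k))     # ~45.5 ps per stage
print(stage_delay(f_77k))
```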

Variation

By analyzing process-induced variations, the aim is to understand how they influence the operational characteristics of transistors. Threshold voltage variations can cause deviations in the device ON current, affecting its ability to conduct and switch effectively. These variations can also change the subthreshold leakage current, which can be detrimental to the overall power efficiency of digital circuits. We first examine device behavior under Vth variation and later its effect on memory cells, particularly SRAM cells.
In this simulation, we assumed a 3σ Vth variation of 30 mV, consistent with the study presented in ref.14, and performed Monte Carlo simulation over 2000 devices to obtain the device ON and OFF currents at varying supply voltages. The amount of Vth variation is considered constant across temperature: for a given doping concentration (with random dopant fluctuation, RDF, considered), the number of ionized dopants does not change with temperature for 70 K < T < 300 K. Dopant freezeout occurs when the ionization energy of the dopant atoms exceeds the available thermal energy ($kT$, where $k$ is the Boltzmann constant and $T$ denotes the absolute temperature); this does not happen in the considered range of 70 K < T < 300 K, as it has been shown that dopant freezeout occurs only below 50 K13. It has also been experimentally verified that Vth variations are temperature independent15. The study presented in ref.16 performs a similar analysis but with matched IOFF as well as matched ION, where ION matching is achieved through supply voltage reduction at cryogenic temperature. Since the effect of supply voltage on current cannot be neglected, especially when the SS is steeper at cryogenic temperatures, and since the performance and energy improvements of Vth-retargeted devices at cryogenic temperature depend greatly on supply voltage scaling, the device ON current was not fixed here.
Fig. 6 shows the variation of ION and IOFF under 3σ Vth variation for 2000 NMOS devices under Monte Carlo analysis at 300 K and 77 K. All devices are iso-IOFF devices obtained after Vth retargeting. This implies that the nominal Vth of the 77 K device is lower than that of the 300 K device, enabling its operation at a VDD of 0.2 V (yellow). At a VDD of 0.3 V, the variation in ION is ∼10x at 300 K, while it is less than 0.3x at 77 K. However, the IOFF variation increases from ∼10x at 300 K to ∼10^4x at 77 K at the same VDD of 0.3 V. This larger IOFF variation is mainly ascribed to the steeper SS at cryogenic temperatures. As VDD is increased to 0.7 V, the variation in ION becomes smaller at both 300 K and 77 K compared to the 0.3 V case. To probe further, we investigated the effect of Vth shift on IOFF across temperature via Monte Carlo analysis with 2000 samples of iso-IOFF devices (iso-IOFF at nominal Vth) at multiple temperatures. As evident from Fig. 7, the lower the temperature, the steeper the slope and the wider the spread of IOFF, again owing to the steeper SS at low temperature. This is further evidenced in Fig. 8, which shows ΔIOFF% (defined as (IOFF − IOFF,nom)/IOFF,nom · 100) as a function of ΔVth% (defined as (Vth − Vth,nom)/Vth,nom · 100) and VDD, where IOFF,nom and Vth,nom denote the nominal values of subthreshold leakage and threshold voltage at a given VDD (these percentage shifts are distinct from the 3σ Vth variation).
Fig. 6. Variation of ION and IOFF under 3σ Vth variation with 2000 Monte Carlo samples of threshold voltage retargeted NMOS at 300 K and 77 K showing lower IOFF spread at 300 K compared to 77 K and lower ION spread at higher VDD.
Fig. 7. Variation in IOFF across temperature for a given 3σ Vth variation showing the spread gets worse as temperature decreases.
Fig. 8. Variation (percentage change) in IOFF from its nominal value across supply voltage for varying percentage changes in Vth at 300 K and 77 K showing stronger dependence on VDD at higher ΔVth at 77 K.
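The amplification of Vth variation into IOFF spread by a steep SS can be sketched with a toy Monte Carlo run. Assuming a simple exponential subthreshold law (not the full BSIM-CMG model, so the spreads are indicative rather than the ∼10x/∼10^4x figures above):

```python
import numpy as np

rng = np.random.default_rng(0)

def ioff_spread(ss_mv_per_dec, sigma_vth_mv=10.0, n=2000):
    """Monte Carlo spread of IOFF under Gaussian Vth variation.

    Assumes IOFF ~ 10**(-Vth/SS); returns the ratio between the
    largest and smallest sampled (normalized) leakage currents.
    A 3-sigma variation of 30 mV corresponds to sigma = 10 mV.
    """
    dvth = rng.normal(0.0, sigma_vth_mv, n)      # mV around nominal
    ioff = 10.0 ** (-dvth / ss_mv_per_dec)       # normalized IOFF
    return ioff.max() / ioff.min()

print(f"300 K (65 mV/dec): {ioff_spread(65.0):.1f}x spread")
print(f" 77 K (22 mV/dec): {ioff_spread(22.0):.1f}x spread")
```

The same ±30 mV band translates into far more decades of leakage at 22 mV/dec than at 65 mV/dec, which is the mechanism behind the widening spread in Figs. 6 and 7.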

INTERCONNECTS

Back end of line (BEOL) interconnects are as important as devices at scaled nodes in determining performance bottlenecks. The Elmore delay distribution of the critical paths of a 64-bit Arm processor core, shown in Fig. 9, highlights the growing contribution of interconnect delay across technology nodes. Fortunately, interconnect resistance decreases with temperature owing to the reduction in bulk copper resistivity. The interconnect resistance of a commercial foundry 22 nm node was characterized and modelled using Fuchs-Sondheimer and Mayadas-Shatzkes (FS-MS) models17, and the resulting normalized resistance and effective resistivity across the BEOL stack are presented in Fig. 10 and Fig. 11. The resistance reduction for global metal layers is larger than that for local and intermediate metal layers owing to the higher fraction of bulk copper per unit volume. Additionally, due to the lower resistance (and hence lower RC delay) and lower signal attenuation, the number of repeaters needed to faithfully transmit signals across the chip, and their corresponding energy, is reduced at low temperature. However, at cryogenic temperature copper might not be the best interconnect material; other metals such as ruthenium and aluminum offer superior characteristics in terms of resistance, and in turn reduced IR drop and lower Joule heating18. Since calibrated data on alternative metal interconnects are not available, the remaining analysis in this work assumes copper interconnects.
Fig. 9. Elmore delay distribution of top critical paths of a 64-bit Arm processor across technology nodes indicating the increasing contributions of interconnect creating performance bottlenecks.
Fig. 10. Normalized resistance of BEOL layers across temperature showing reduction at lower temperature.
Fig. 11. Extracted resistivities of BEOL metal layers and calibrated models which use FS-MS theory.
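The size-effect penalty captured by the FS-MS models can be illustrated with the textbook closed forms: the Fuchs-Sondheimer surface-scattering term and the Mayadas-Shatzkes grain-boundary term, combined Matthiessen-style. This is a sketch only, with assumed specularity, grain-boundary reflectivity and dimensions; the paper's calibrated models are more detailed:

```python
import math

def fs_ms_resistivity(rho_bulk, mfp, width, grain, p=0.0, R=0.3):
    """Approximate FS-MS effective resistivity of a narrow wire.

    fs: Fuchs-Sondheimer surface term, (3/8)(lambda/w)(1-p).
    ms: Mayadas-Shatzkes grain-boundary factor with reflectivity R,
        alpha = (lambda/g) * R / (1 - R).
    The two contributions are added Matthiessen-style.
    """
    fs = (3.0 / 8.0) * (mfp / width) * (1.0 - p)
    alpha = (mfp / grain) * R / (1.0 - R)
    ms = 3.0 * (1.0 / 3.0 - alpha / 2.0 + alpha**2
                - alpha**3 * math.log(1.0 + 1.0 / alpha))
    return rho_bulk * (1.0 / ms + fs)

# Illustrative 300 K copper: rho0 ~1.7e-8 ohm-m, mean free path ~39 nm;
# narrow local wire (20 nm) versus wider global wire (100 nm)
rho_local = fs_ms_resistivity(1.7e-8, 39e-9, 20e-9, 20e-9)
rho_global = fs_ms_resistivity(1.7e-8, 39e-9, 100e-9, 100e-9)
print(rho_local / 1.7e-8, rho_global / 1.7e-8)  # local wires suffer more
```

The same scaling explains why global layers, with more bulk-like copper per unit volume, recover a larger fraction of the bulk resistivity drop upon cooling.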

64 BIT ARM CORE AT CRYOGENIC TEMPERATURES

Standard cells characterization

To ensure high quality, production standard cell libraries were re-characterized at low temperatures. This process uses industry-standard tools, starting from the device models and the RC models of the BEOL stack. Using PDK-based schematics and layouts, parasitic extraction was carried out to obtain timing arcs and energy/power values, which were then used to construct the standard cell libraries. This procedure was repeated for supply voltages of 0.4, 0.6 and 0.8 V (nominal) and across four temperature points: 300, 200, 150 and 100 K. At T = 100 K, the logic gates in the library exhibit a delay improvement of up to 38% compared to room temperature (RT), where delay is calculated as the average of rise and fall delays across all cells in the library. This improvement is considerably larger than what technology node scaling achieves, even with advancements in FinFET structures and tighter metal and contact poly pitches. Fig. 12 compares two FinFET libraries at different technology nodes (14 and 7 nm) with the same fin count and shows that cooling to 100 K yields approximately 3x better delay improvement than technology scaling.
Fig. 12. Normalized cell delay improvement at cryogenic temperature averaged across the standard cell library showing ∼3x improvement compared to subsequent technology node.
Since the devices are tuned to a lower Vth at lower temperature (iso-IOFF retargeting) to maximize the ON current, the inversion charge in the channel at a given supply voltage is higher at cryogenic temperature than at RT. As a result, the input pin capacitance increases at lower temperatures. The average input pin capacitance across standard cells, categorized by drive strength and temperature, is shown in Fig. 13. The input capacitance increases by 12% to 15% at 100 K compared to RT; the average input pin capacitance of the standard cell library together with the temperature-adjusted threshold voltage is shown in Fig. 14.
Fig. 13. Normalized input pin capacitance across the iso-IOFF tuned standard cell library for different drive strengths showing increased value at lower temperature due to higher charge accumulation for the same gate overdrive voltage.
Fig. 14. Normalized input pin capacitance across the iso-IOFF tuned standard cell library and corresponding targeted Vth for different temperature points.
Similar effects can be observed with multi-Vth standard cell libraries at RT, wherein the ULVT (ultra-low-Vth) library tends to have a higher average input pin capacitance than the UHVT (ultra-high-Vth) library. Additionally, short-circuit energy increases at low temperature due to the increase in transistor peak current, and it is a strong function of the input slew: at a given temperature, the higher the signal slew, the higher the short-circuit energy.

Microprocessor benchmarking

An Arm Cortex-A53 CPU was implemented across temperatures and supply voltages using the aforementioned standard cell libraries4. An eleven-metal-layer BEOL stack was selected, with the top two layers dedicated to power and ground routing while the rest were used for both signal and power routing. A standard VLSI design flow was used: synthesis and floorplanning, power delivery network design, standard cell placement, clock tree synthesis, then signal routing and optimization. The design is said to meet an operating frequency when the timing violations are less than 5% of the clock period and the routing violations number fewer than 30 (so that the design is fixable with an Engineering Change Order). The signal slew is set at a fixed percentage of the clock period for each run.
At a nominal supply voltage of 0.8 V, the performance of the Cortex-A53 core increases by 56% going from 300 K to 100 K (Fig. 15); however, the power dissipation increases. The switching power increase stems from (i) the increase in clock frequency and (ii) the increase in the total switched capacitance (Csw). The Csw increase in turn comes from the higher input gate capacitance of the lower-Vth devices, the up-sizing of gates to achieve the higher target frequency and the additional buffering for hold fixing. As explained earlier, the internal (short-circuit) power also trends upward owing to the higher short-circuit current. The improvement at the processor level surpasses that at the logic-gate level (38% in Fig. 12), helped by the improved BEOL RC and by electronic design automation tool optimizations. Thus, at 100 K it is technically possible to achieve the performance of a high-performance Arm Class-A core. A number of other advantages show up in the 100 K implementation compared to RT (Fig. 16): the number of combinational gates is reduced by ∼5%, while the inverter and buffer counts are reduced by ∼20% and 22%, respectively. The buffer/inverter count reduction can be attributed to the 6% reduction in wire length and the lower interconnect RC delay at cryogenic temperatures, which also account for the reduction in via count. Similarly, the cell count, gate count and total cell area are reduced at cryogenic temperature.
Fig. 15. Performance benchmarking of Cortex-A53 core at nominal VDD at different temperature points indicating percentage improvements. At 100 K, we can achieve the performance of a Class-A Core using the iso-IOFF SC library.
Fig. 16. Improvements in physical design metrics viz., combinational gates, inverter/buffer counts, wire length, via count, cell and gate count, and total cell area at 100 K due to improvement in standard cell performance and reduction in interconnect resistance at cryogenic temperature.
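The two switching-power contributions above compose multiplicatively in the standard dynamic power relation P = α·Csw·VDD²·f. A small sketch with hypothetical activity factor and capacitance (the 56% frequency gain is from Fig. 15; the ∼13% Csw increase is an assumed value in the 12% to 15% pin-capacitance range):

```python
def switching_power(alpha, c_sw, vdd, freq):
    """Dynamic switching power: P = alpha * Csw * VDD^2 * f."""
    return alpha * c_sw * vdd**2 * freq

# Illustrative numbers (hypothetical, not from the benchmarked core)
p_300k = switching_power(0.1, 1.00e-9, 0.8, 1.00e9)
p_100k = switching_power(0.1, 1.13e-9, 0.8, 1.56e9)  # +56% f, +13% Csw
print(p_100k / p_300k)  # ~1.76x: both factors raise power at iso-VDD
```

This is why the supply voltage reduction discussed next is the lever that converts the cryogenic speed gain into a power gain.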
One advantage of cryogenic operation with low-Vth devices is that the supply voltage can be lowered to reduce power while maintaining the same performance. Thus, as a pragmatic next step, we implemented the core at reduced supply voltages of 0.6 V and 0.4 V across all temperatures. At 0.4 V, however, the RT devices cannot reliably switch due to the low gate overdrive, and the implementation fails. At 0.6 V, up to 87% performance boost over the 0.6 V RT frequency is observed (which is evidently lower than the 0.8 V RT frequency); and at 0.4 V, the 100 K design can meet the target performance of the RT design operating at nominal VDD. The normalized performance across temperature for different supply voltages, and the corresponding improvement with respect to the RT design, is shown in Fig. 17. The plot of performance/watt versus performance across temperatures and supply voltages is depicted in Fig. 18, showing iso-power performance improvement and more than 4x power improvement at iso-performance by taking advantage of the low VDD.
Fig. 17. Normalized performance of 64 bit Arm Cortex-A53 across multiple supply voltages at different temperatures.
Fig. 18. Performance per watt versus performance of 64 bit Arm Cortex-A53 indicating up to 4x improvement at iso-frequency by scaling down the temperature from 300 K to 150 K and corresponding supply voltages from 0.6 V to 0.4 V and up to 3.7x by going from 300 K to 100 K and reducing supply voltage from 0.8 V to 0.4 V.

Thermal benefits

Cryogenic computing also mitigates self-heating effects thanks to the nearly 10x increase in the thermal conductivity of bulk silicon16 (Fig. 19). The thick silicon substrate plays a major role in dissipating heat and controlling junction temperatures. Low-temperature effects on thermal conductivity were investigated at the system level by assuming a single high-performance Arm core, benchmarked with the maximum-power workload at ambient temperatures of 298 K and 100 K, with no changes to the design and only the material thermal conductivity changed. A simple workflow for the thermal analysis is shown in Fig. 20. The chip design along with the switching activity files for a given workload (here Dhrystone, a maximum-power workload) are input to Cadence Voltus™ to generate the die model (which contains the current signatures of the different components) and the power map. These are fed into Cadence® Sigrity Celsius™, an industry-standard thermal analysis tool, along with thermal tech files containing material properties such as metal and substrate conductivities. From this, the heat map is extracted for the two temperature points, as depicted in Fig. 21, showing the spread of junction temperatures as well as the peak die temperature. The results show a 4x reduction in the maximum temperature rise (ΔTmax) at an ambient temperature of 100 K, which results from the improved bulk silicon thermal conductivity. At room temperature, the number of cores on a die is often limited and performance is throttled to keep the junction temperature within the thermal design power (TDP) budget. Cryogenic computing thus shows the potential to pack more cores at the system level while adhering to the same TDP limit as RT designs.
Fig. 19. Improvement in bulk thermal conductivity of substrate silicon at cryogenic temperature showing more than 10x increase (recreated from ref.19).
Fig. 20. Flow diagram for thermal analysis starting from design database, switching activity files and material thermal property files to obtain heat map of the chip.
Fig. 21. Thermal heat map for the ARES core implemented in 7 nm at ambient temperatures of 298 K and 100 K.
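The effect of the conductivity increase on junction temperature can be sketched with a one-dimensional conduction estimate, ΔT = P·t/(k·A), a drastic simplification of the Voltus/Celsius analysis. The power, die area, substrate thickness and the ∼1100 W/(m·K) value for silicon near 100 K are illustrative assumptions:

```python
def delta_t(power_w, k_si_w_mk, area_m2=1e-4, thickness_m=7e-4):
    """1-D conduction estimate of junction temperature rise:
    dT = P * t / (k * A) for heat flowing through the substrate."""
    return power_w * thickness_m / (k_si_w_mk * area_m2)

# Bulk-Si thermal conductivity: ~148 W/(m K) at 300 K and roughly an
# order of magnitude higher near 100 K (illustrative, per Fig. 19)
dt_300k = delta_t(5.0, 148.0)
dt_100k = delta_t(5.0, 1100.0)
print(dt_300k / dt_100k)  # several-fold smaller temperature rise
```

Even this crude model shows a greater-than-4x reduction in ΔT from the conductivity change alone, consistent in direction with the full heat-map result.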

MEMORIES

6T SRAM (base device—FinFETs)

The conventional 6T SRAM cell comprises two back-to-back inverters that form the storage unit and two NMOS access devices. If the devices are made iso-IOFF, as in the processor analysis presented earlier, then the Vth of both the NMOS and the PMOS must be reduced at cryogenic temperature, which reduces the slope of the inverter voltage transfer characteristic (VTC), as explained in ref.5, since the PMOS turns OFF and the NMOS turns ON earlier. The ability of an SRAM cell to hold data, designated the static noise margin (SNM), is determined by the VTCs of the constituent inverters. The SNM is calculated as the side of the largest square that can be inscribed inside the butterfly curve of the SRAM20. If the inverters are asymmetric, the smaller of the two lobes dictates the SNM.
We performed Monte Carlo analysis on 1000 SRAM cells with the applied 3σ Vth variation and plotted the butterfly curves. First, the cells were evaluated at 300 K and 0.7 V VDD (Fig. 22a); since we intend to repeat this at different supply voltages, it is appropriate to normalize the SNM to VDD. The normalized mean SNM for this case is 0.413. The worst-case (WC) scenario, yielding a normalized WC-SNM of 0.399, was also considered. Next, the same analysis was conducted on the iso-IOFF cells at 77 K and 0.7 V VDD (Fig. 22b); due to the reduced slope of the VTC of the iso-IOFF inverters, the normalized mean and WC SNM drop to 0.287 and 0.268, respectively. Since the rationale for reducing the Vth at 77 K is to operate the devices at lower supply voltage, we also evaluated the SNM at 0.2 V VDD, giving mean and WC SNM of 0.41 and 0.3, making the cell inoperable (Fig. 22c). Thus, tuning the devices to be iso-IOFF might not be beneficial for memories.
Fig. 22. Butterfly diagram for SRAM cells under Vth variation with 1000 Monte Carlo samples at a, 300 K, VDD = 0.7 V, b, 77 K, VDD = 0.7 V, c, 77 K VDD = 0.2 V showing degradation in WC SNM. d, proposed iso-Vth solution at 77 K, VDD = 0.3 V, compared with e, iso-IOFF at 77 K, VDD = 0.3 V and f, iso-Vth VDD = 0.7 V for scalability.
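The largest-inscribed-square SNM extraction is commonly computed by rotating the butterfly plot 45 degrees (Seevinck's method): the side of the biggest square nested in a lobe equals the maximum along-diagonal separation of the two curves divided by √2. A sketch with an idealized tanh-shaped inverter VTC (illustrative, not the calibrated BSIM model):

```python
import numpy as np

def snm(vtc1, vtc2, vdd, n=2001):
    """SNM via 45-degree rotation: largest nested square per lobe.

    vtc1, vtc2 map Vin -> Vout of the two cross-coupled inverters;
    the second curve is mirrored across y = x to form the butterfly.
    """
    vin = np.linspace(0.0, vdd, n)
    ax, ay = vin, vtc1(vin)          # curve A: (Vin, Vout)
    bx, by = vtc2(vin), vin          # curve B mirrored across y = x
    # rotated frame: u along the diagonal y = x, w perpendicular to it
    ua, wa = (ax + ay) / np.sqrt(2), (ay - ax) / np.sqrt(2)
    ub, wb = (bx + by) / np.sqrt(2), (by - bx) / np.sqrt(2)
    w = np.linspace(max(wa.min(), wb.min()), min(wa.max(), wb.max()), n)
    ua_i = np.interp(w, np.sort(wa), ua[np.argsort(wa)])
    ub_i = np.interp(w, np.sort(wb), ub[np.argsort(wb)])
    diff = ua_i - ub_i               # diagonal separation = side*sqrt(2)
    hi = diff[w > 0].max(initial=0.0) / np.sqrt(2)   # upper-left lobe
    lo = -diff[w < 0].min(initial=0.0) / np.sqrt(2)  # lower-right lobe
    return min(hi, lo)               # smaller lobe dictates the SNM

# Idealized symmetric inverter VTC (hypothetical gain, not from the PDK)
vdd = 0.7
vtc = lambda v: vdd / 2 * (1 - np.tanh(8 * (v - vdd / 2) / vdd))
print(snm(vtc, vtc, vdd) / vdd)      # normalized SNM
```

Replacing the toy VTC with the Monte Carlo-sampled inverter curves and taking the minimum over the two lobes reproduces the per-sample SNM values summarized in Fig. 22.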
If the devices are instead tuned to have a constant Vth across temperature (the iso-Vth scenario mentioned in the Introduction), they still show an improvement in ON current while providing the advantage of reduced IOFF. The minimum VDD at which these devices can operate is 0.3 V, and the butterfly diagram for such an SRAM cell is shown in Fig. 22d. The normalized mean and WC SNM, at 0.465 and 0.416 respectively, are better than in the 300 K, 0.7 V case. To compare with iso-IOFF, we ran Monte Carlo at 0.3 V VDD on iso-IOFF devices (Fig. 22e), obtaining normalized mean and WC SNM of 0.378 and 0.337, lower than in the iso-Vth case. To verify supply voltage scalability, the iso-Vth SRAM cells were further analyzed at a VDD of 0.7 V (Fig. 22f), which also shows improvement (mean and WC of 0.391 and 0.375, respectively) over the iso-IOFF counterpart. Thus, iso-Vth devices are better suited for 6T SRAM.
A similar analysis was performed for the read and write margins, and the results are summarized in Fig. 23. For the same normalized hold margin (μ/VDD), the supply voltage can be reduced by 62% at 77 K compared to 300 K; similarly, the read and write reductions are 60% and 54%, thanks to the increased ON current and reduced leakage at cryogenic temperature. Thus, ideally, the energy can be reduced by more than 50% by taking advantage of the lower supply voltage swing.
Fig. 23. Supply voltage reduction for different scenarios: iso-μ/VDD for hold, iso-μ/VDD for read and iso-μ/VDD for write. The devices are Vth tuned at 77 K.

2T gain cell EDRAM (base device—bulk CMOS)

With the processor core operating at higher frequency at cryogenic temperature, the memory wall worsens and leads to performance bottlenecks. One way to mitigate the problem is to increase the size of the on-chip memory or cache. However, given limited die area this may be infeasible, and options such as the 2T gain cell (GC) embedded DRAM (EDRAM), which has a smaller area footprint (more than 2x denser than 6T SRAM), become viable. The operation of 2T GC-EDRAM at room temperature is constrained by leakage currents: since the charge is stored on the intrinsic gate capacitance of a transistor, the memory must be continuously refreshed. The ultra-low leakage at cryogenic temperature, however, enhances the retention time by more than six orders of magnitude21, as shown by the waterfall plot in Fig. 24, with the mean retention time increasing from 2.4 μs at 300 K to 6.5 s at 4 K. This in turn reduces the refresh power. The maximum operating frequency of the memory array increases at low temperature, and consequently the bandwidth also increases, mainly owing to the higher device ON currents and the reduced interconnect resistance, while the read/write energies at iso-performance are reduced (Fig. 25).
Fig. 24. Retention probability of failure versus retention time for 28 nm 2T EDRAM shows 3σ mean time increases by > six orders of magnitude due to ultra-low leakage at cryogenic temperature.
Fig. 25. Array power vs bandwidth for different temperatures and refresh power/read write energies across temperature for 1 kb 2T gain cell EDRAM array in 28 nm bulk CMOS.
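The refresh-power saving tracks the retention time directly, since every cell must be rewritten once per retention period. A back-of-the-envelope sketch using the retention times above and an assumed 1 fF storage node at 0.8 V (hypothetical cell parameters):

```python
def refresh_power(n_bits, c_cell_f, vdd, t_retention_s):
    """Average refresh power of a gain-cell array: each of n_bits cells
    is rewritten once per retention time at an energy of ~C*V^2."""
    return n_bits * c_cell_f * vdd**2 / t_retention_s

# 1 kb array; retention 2.4 us at 300 K vs 6.5 s at 4 K (Fig. 24)
p_300k = refresh_power(1024, 1e-15, 0.8, 2.4e-6)
p_4k = refresh_power(1024, 1e-15, 0.8, 6.5)
print(p_300k / p_4k)  # refresh power drops by over six orders of magnitude
```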

1T floating body RAM (base device—FDSOI)

Data is stored in FBRAM by injecting charge carriers into the body of the device through gate-induced drain leakage (GIDL) current. The presence or absence of charge carriers in the body modulates the drain current of the device under forward bias (Fig. 26). The retention loss, dictated by the Shockley-Read-Hall (SRH) recombination/generation rate, is exponentially reduced at cryogenic temperature, providing pseudo-static behavior with extrapolated retention times of the order of 10^5 s at iso-current sense margin22 (Fig. 27a). Owing to the higher GIDL currents, multiple bits per cell can also be programmed, as demonstrated using SiGe PMOS devices with four distinct current levels23 (Fig. 27b). With FBRAM cells having a single-transistor footprint (8x denser than 6T SRAM), the cache miss rate for single and double bit/cell is reduced by 57% and 66%, respectively, compared to 6T SRAM (Fig. 28), making them ideal candidates for the last level cache (LLC) in cryogenic processors.
Fig. 26. Operation principle of FBRAM—the presence or absence of charge carriers in the body modulates the Vth of the device and consequently the ON current.
Fig. 27. ΔIREAD at 77 K with Si FBRAM showing two retention states and SiGe FBRAM showing four retention states.
Fig. 28. Cache miss per 1000 instructions at 77 K at iso-silicon footprint showing 38%, 57%, and 66% reduction for 2T EDRAM, 1bit/cell FBRAM and 2 bit/cell FBRAM compared to 6T SRAM.
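The multi-bit read operation can be illustrated with a minimal sense sketch: a 2 bit/cell FBRAM read reduces to mapping the sensed current to the nearest of four levels. The level values below are hypothetical placeholders for illustration; only the existence of four distinguishable current levels is taken from the SiGe FBRAM result above.

```python
# Sketch: decoding 2 bits/cell from a sensed read current, assuming four
# well-separated current levels as in the SiGe FBRAM result (Fig. 27b).
# The level values are illustrative, not the measured data.

LEVELS_UA = [0.0, 1.0, 2.0, 3.0]  # hypothetical delta-I_READ levels (uA)


def decode_cell(i_read_ua: float) -> int:
    """Map a sensed current to the nearest stored 2-bit symbol (0..3)."""
    return min(range(len(LEVELS_UA)),
               key=lambda s: abs(i_read_ua - LEVELS_UA[s]))


print(decode_cell(0.2), decode_cell(1.6), decode_cell(2.9))
```

In practice the sense margin between adjacent levels, rather than the nominal level spacing, limits how many bits per cell can be stored reliably, which is why the enhanced cryogenic GIDL current matters.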

OPPORTUNITIES, OBSTACLES, AND OUTLOOK

The main advantage of low temperature CMOS is that the performance scales predictably with decreasing temperature from well-understood room-temperature reference points. Moreover, the technology is area scalable, unlike other low temperature candidates such as Josephson junctions. With ultralow leakage and higher carrier mobility leading to steep-SS devices, there is a plethora of opportunities for innovation, from the material level (e.g., interconnects, interlayer dielectrics) to devices, circuit design techniques (e.g., keeperless domino logic, subthreshold circuits) and systems (e.g., wave-pipelined, latch-based designs), which have traditionally been harder to engineer and optimize at room temperature. The ability to pack more chips (or chiplets) into a given area without exceeding the TDP limit will help build compact systems.
However, there also remain some challenges. One factor that has not been fully discussed here is the cooling cost, i.e., the wall power required to bring the system down to and maintain it at, for example, 77 K. Analyzing the cooling efficiency requires a system-oriented approach, since the cooling power depends on the physical size of the system, its operating wattage and other environmental factors. It is generally more beneficial to cool larger systems than smaller ones, owing to economies of scale (lower cost per unit of cooling), infrastructure complexity, and maintenance and operating costs. The ideal cooling cost $Q$ to cool a power $P$ operating at a cryogenic temperature $T_c$ given the nominal temperature $T_{\text{nom}}$ is given by the Carnot efficiency: $Q=P\left(T_{\text {nom}} / T_{c}-1\right)$. However, a non-ideality factor $\eta$, $0 \leq \eta \leq 1$, increases the cooling cost to $Q^{\prime}=Q / \eta$. Fig. 29 shows the cooling cost $\left(Q^{\prime} / P\right)$ for various values of $\eta$ (left Y-axis). The normalized power reduction obtained by supply voltage scaling across temperature is shown on the right Y-axis. The intersection of the red line with the family of curves for different $\eta$ gives the minimum efficiency required for the cryogenic system, at which the total power consumption breaks even.
Fig. 29. Cooling cost for various non-ideality factors and normalized power reduction obtained by supply voltage scaling across temperature.
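The break-even condition implied by Fig. 29 can be made explicit. If supply-voltage scaling reduces the chip power to a fraction r of its room-temperature value, the total wall power at cryogenic temperature is r·P·(1 + (T_nom/T_c − 1)/η), and break-even against the room-temperature power P requires η ≥ r(T_nom/T_c − 1)/(1 − r). A minimal sketch, with r = 0.25 taken as an assumed power reduction (not a figure from the article):

```python
# Sketch of the break-even analysis behind Fig. 29: Carnot cooling
# overhead Q'/P = (T_nom/T_c - 1)/eta versus the chip-power reduction
# from supply voltage scaling. The 0.25 power-reduction factor is an
# assumption for illustration.

def cooling_overhead(t_nom: float, t_c: float, eta: float) -> float:
    """Cooling wall power per watt of chip power: Q'/P = (T_nom/T_c - 1)/eta."""
    return (t_nom / t_c - 1.0) / eta


def break_even_eta(t_nom: float, t_c: float, power_reduction: float) -> float:
    """Minimum cooler efficiency eta such that total cryogenic wall power
    r*P*(1 + (T_nom/T_c - 1)/eta) does not exceed the room-temperature P."""
    r = power_reduction
    return r * (t_nom / t_c - 1.0) / (1.0 - r)


# Example: 300 K -> 77 K, chip power reduced 4x by V_DD scaling (assumed)
eta_min = break_even_eta(300.0, 77.0, 0.25)
print(f"break-even eta: {eta_min:.2f}")
```

Under these assumptions the required cooler efficiency comes out close to the Carnot limit (η ≈ 0.97), which illustrates why the economics favor large cryogenic installations where practical cooler efficiencies are highest.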
Unlike conventional CMOS designers, who have the convenience of standard CMOS models offered by a foundry, designers targeting temperatures below −55 °C face a different scenario. There are currently no readily accessible commercial-grade device models for such extreme temperatures, and the development of models for cryogenic temperatures as low as 77 K or even 4 K remains at the stage of academic research.
A multitude of efforts have been made, including the ones presented here, to model and explain device behavior. However, more unifying theories are still needed.
Cryogenic environments pose unique thermal management challenges. Managing heat dissipation becomes crucial, as the temperature difference between the cryogenic environment and the circuit itself can create temperature gradients that affect device performance and reliability. Another problem at low temperature is the increased variability in the device OFF currents, which tends to grow at advanced technology nodes, as investigated in the current work. Current circuit design techniques are not equipped to account for such pronounced parameter variations, and statistical analyses like Monte Carlo need to become an integral part of the design cycle.
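A minimal Monte Carlo sketch of the kind of variability analysis suggested above: OFF currents are often modeled as log-normally distributed, and sampling exposes the heavy upper tail that a nominal-corner analysis misses. The median, spread, and distribution choice here are assumptions for illustration, not fitted to the measured data in this work.

```python
# Sketch: Monte Carlo on OFF-current variability. The log-normal model,
# median I_OFF, and sigma are assumed placeholders for illustration.
import random
import statistics

random.seed(0)                 # reproducible sampling
I_OFF_MEDIAN = 1e-12           # A, assumed median OFF current
SIGMA_LN = 1.0                 # assumed log-normal spread (widens at low T)

# Draw 100k device samples: I_OFF = median * exp(N(0, sigma))
samples = [random.lognormvariate(0.0, SIGMA_LN) * I_OFF_MEDIAN
           for _ in range(100_000)]

mean_ioff = statistics.fmean(samples)
p99 = sorted(samples)[int(0.99 * len(samples))]
print(f"mean I_OFF: {mean_ioff:.2e} A, 99th percentile: {p99:.2e} A")
```

The sample mean sits well above the median and the 99th percentile roughly an order of magnitude higher still, illustrating why total array leakage and worst-case retention must be budgeted statistically rather than from a typical device.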

CONCLUSION

Cryogenic CMOS has immense potential for applications in high performance computing, brought about by improvements in device characteristics: increased ON current resulting from higher carrier mobility, exponentially lower leakage current, steeper SS, reduced intrinsic resistances, decreased contact resistance, and so on. The BEOL interconnects improve as well, thanks to the lower bulk resistivity at low temperature and improved reliability due to lower Joule heating. All these advantages, backed by the engineering capability to tune the threshold voltage to boost the ON current, will help achieve higher operating frequencies with logic devices. Furthermore, memory technologies that are not feasible at room temperature, mainly due to leakage, such as pseudo-static gain cell EDRAM and floating body RAM, become possible at cryogenic temperature. Besides, the higher thermal conductivity of bulk silicon at cryogenic temperature will allow chips to be densely packed under a given TDP limit, which can be greatly exploited in 3D chip stacking and monolithic 3D integration technologies.

MISCELLANEA

Acknowledgements This work is sponsored and funded by the Defense Advanced Research Projects Agency (DARPA) Low Temperature Logic Technology (LTLT) program.
Declaration of competing interest The authors declare no competing interests.
1.
Jalili M. et al. Cost-efficient overclocking in immersion-cooled datacenters. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 623-636 (IEEE, 2021). https://doi.org/10.1109/ISCA52012.2021.00055.

2.
Carlson D. M., Sullivan D. C., Bach R. E. & Resnick D. R. The ETA 10 liquid-nitrogen-cooled supercomputer system. IEEE Trans. Electron Devices 36, 1404-1413 (1989). https://doi.org/10.1109/16.30952.

3.
Chiang H. L. et al. Cold CMOS as a power-performance-reliability booster for advanced FinFETs. In 2020 IEEE Symposium on VLSI Technology, 1-2 (IEEE, 2020). https://doi.org/10.1109/VLSITechnology18217.2020.9265065.

4.
Saligram R., Prasad D., Pietromonaco D., Raychowdhury A. & Cline B. A 64-bit arm CPU at cryogenic temperatures: design technology co-optimization for power and performance. In 2021 IEEE Custom Integrated Circuits Conference (CICC), 1-2 (IEEE, 2021). https://doi.org/10.1109/CICC51472.2021.9431559.

5.
Saligram R. et al. Power performance analysis of digital standard cells for 28 nm bulk CMOS at cryogenic temperature using BSIM models. IEEE J. Explor. Solid-State Comput. Devices Circuits 7, 193-200 (2021). https://doi.org/10.1109/JXCDC.2021.3131100.

6.
Beckers A., Jazaeri F. & Enz C. Cryogenic MOSFET threshold voltage model. In ESSDERC 2019 - 49th European Solid-State Device Research Conference (ESSDERC), 94-97 (IEEE, 2019). https://doi.org/10.1109/ESSDERC.2019.8901806.

7.
Beckers A., Jazaeri F. & Enz C. Theoretical limit of low temperature subthreshold swing in field-effect transistors. IEEE Electron Device Lett. 41, 276-279 (2020). https://doi.org/10.1109/LED.2019.2963379.

8.
Cheema S. S. et al. Ultrathin ferroic HfO2-ZrO2 superlattice gate stack for advanced transistors. Nature 604, 65-71 (2022). https://doi.org/10.1038/s41586-022-04425-6.

9.
Li W. et al. Enhancement in capacitance and transconductance in 90 nm nFETs with HfO2-ZrO2 superlattice gate stack for energy-efficient cryo-CMOS. In 2022 International Electron Devices Meeting (IEDM), 22.3.1-22.3.4 (IEEE, 2022). https://doi.org/10.1109/IEDM45625.2022.10019496.

10.
Gaidhane A. D. et al. Design exploration of 14 nm FinFET for energy-efficient cryogenic computing. IEEE J. Explor. Solid-State Comput. Devices Circuits 9, 108-115 (2023). https://doi.org/10.1109/JXCDC.2023.3330767.

11.
Pahwa G., Kushwaha P., Dasgupta A., Salahuddin S. & Hu C. Compact modeling of temperature effects in FDSOI and FinFET devices down to cryogenic temperatures. IEEE Trans. Electron Devices 68, 4223-4230 (2021). https://doi.org/10.1109/TED.2021.3097971.

12.
Jazaeri F., Beckers A., Tajalli A. & Sallese J.-M. A review on quantum computing: from qubits to front-end electronics and cryogenic MOSFET physics. In 2019 MIXDES-26th International Conference “Mixed Design of Integrated Circuits and Systems”, 15-25 (IEEE, 2019). https://doi.org/10.23919/MIXDES.2019.8787164.

13.
Beckers A., Jazaeri F. & Enz C. Cryogenic MOS transistor model. IEEE Trans. Electron Devices 65, 3617-3625 (2018). https://doi.org/10.1109/TED.2018.2854701.

14.
Chabane A. et al. Cryogenic characterization and modelling of 14 nm bulk FinFET technology. In ESSCIRC 2021 - IEEE 47th European Solid State Circuits Conference (ESSCIRC), 67-70 (IEEE, 2021). https://doi.org/10.1109/ESSCIRC53450.2021.9567802.

15.
Grill A. et al. Temperature dependent mismatch and variability in a cryo-CMOS array with 30k transistors. In 2022 IEEE International Reliability Physics Symposium (IRPS), 10A.1-1-10A.1-6 (IEEE, 2022). https://doi.org/10.1109/IRPS48227.2022.9764594.

16.
Moroz V. et al. Challenges in design and modeling of cold CMOS HPC technology. In 2021 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD), 107-110 (IEEE, 2021). https://doi.org/10.1109/SISPAD54002.2021.9592537.

17.
Saligram R., Datta S. & Raychowdhury A. Scaled back end of line interconnects at cryogenic temperatures. IEEE Electron Device Lett. 42, 1674-1677 (2021). https://doi.org/10.1109/LED.2021.3117277.

18.
Saligram R., Datta S. & Raychowdhury A. Design space exploration of interconnect materials for cryogenic operation: electrical and thermal analyses. IEEE Trans. Circuits Syst. I: Regul. Pap. 69, 4610-4618 (2022). https://doi.org/10.1109/TCSI.2022.3195636.

19.
Glassbrenner C. J. & Slack G. A. Thermal conductivity of silicon and germanium from 3 K to the melting point. Phys. Rev. 134, A1058 (1964). https://doi.org/10.1103/PhysRev.134.A1058.

20.
Seevinck E., List F. J. & Lohstroh J. Static-noise margin analysis of MOS SRAM cells. IEEE J. Solid-State Circuits 22, 748-754 (1987). https://doi.org/10.1109/JSSC.1987.1052809.

21.
Saligram R., Datta S. & Raychowdhury A. CryoMem: a 4 K-300 K 1.3 GHz eDRAM macro with hybrid 2T-gain-cell in a 28 nm logic process for cryogenic applications. In 2021 IEEE Custom Integrated Circuits Conference (CICC), 1-2 (IEEE, 2021). https://doi.org/10.1109/CICC51472.2021.9431527.

22.
Chakraborty W. et al. Pseudo-static 1T capacitorless DRAM using 22 nm FDSOI for cryogenic cache memory. In 2021 IEEE International Electron Devices Meeting (IEDM), 40.1.1-40.1.4 (IEEE, 2021). https://doi.org/10.1109/IEDM19574.2021.9720578.

23.
Chakraborty W. et al. Multi-bit per-cell 1T SiGe floating body RAM for cache memory in cryogenic computing. In 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 302-303 (IEEE, 2022). https://doi.org/10.1109/VLSITechnologyandCir46769.2022.9830483.
