Nvidia’s vision for datacenter cooling asks the question: given the choice between liquid and immersion cooling, why should you have to choose?
This concept, which involves a combination of direct liquid cooling (DLC) to the silicon and immersion cooling for the rest of the components, is the subject of a $5 million grant from the US Department of Energy (DoE) under its COOLERCHIPS program. The program aims to reduce the amount of power expended on datacenter cooling to less than 5 percent of the energy expended on IT itself.
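To put that target in concrete terms, here is a rough back-of-the-envelope sketch in Python. The function names and the 100 kW rack figure are hypothetical and chosen purely for illustration. Only the 5 percent threshold comes from the program's stated goal.

```python
# Illustrative arithmetic only: the 5 percent threshold is the stated
# COOLERCHIPS goal; all other numbers and names here are hypothetical.

TARGET_FRACTION = 0.05  # cooling power as a fraction of IT power


def cooling_overhead(it_power_kw: float, cooling_power_kw: float) -> float:
    """Return cooling power as a fraction of IT power."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return cooling_power_kw / it_power_kw


def meets_target(it_power_kw: float, cooling_power_kw: float,
                 target: float = TARGET_FRACTION) -> bool:
    """True if cooling draws less than the target share of IT power."""
    return cooling_overhead(it_power_kw, cooling_power_kw) < target


def max_cooling_budget_kw(it_power_kw: float,
                          target: float = TARGET_FRACTION) -> float:
    """Upper bound on cooling power (kW) for a given IT load."""
    return it_power_kw * target


if __name__ == "__main__":
    # Hypothetical rack drawing 100 kW of IT power.
    print(max_cooling_budget_kw(100.0))  # cooling budget in kW
    print(meets_target(100.0, 4.0))      # 4 kW of cooling
    print(meets_target(100.0, 30.0))     # 30 kW of cooling
```

Under these assumptions, a rack drawing 100 kW for IT would need to keep its cooling draw below 5 kW to hit the goal.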
Nvidia’s server cooling concept bears little resemblance to the tank-style immersion cooling setups we’ve seen from the likes of Submer or LiquidStack. It retains the standard rack-mount form factor used in most air and direct liquid-cooled chassis.
At first blush, the concept looks a lot like one from another immersion cooling vendor: Iceotope. We looked at Iceotope’s designs back at Supercomputing 2022. The company offers a rack-mounted chassis in which a device’s motherboard sits beneath a thin layer of coolant. The chassis pumps coolant directly to the CPU, GPU, memory, and even hard disks.
This approach has a couple of advantages over traditional direct liquid cooling designs. One of the big ones is that it largely eliminates the need for fans. While you can liquid cool CPUs and GPUs fairly easily, memory, NICs, and storage are trickier. Submerging the motherboard in coolant takes care of those low-wattage components as well.
Nvidia’s liquid cooling concept calls for a combination of liquid and immersion cooling tech
Nvidia’s concept calls for phase-change refrigerants – similar to the substances used in fridges and air conditioners – rather than the single-phase fluids used in Iceotope’s designs. As the motherboard heats up, the liquid will essentially boil, condense and then drip back down. However, Nvidia’s concept also calls for traditional direct liquid cooling for the CPUs and GPUs.
In theory, this should allow Nvidia to achieve dual temperature zones: one for high-thermal design power (TDP) components like CPUs and GPUs, and one for cooler components like memory or network cards.
Nvidia’s interest in liquid cooling isn’t surprising: its products have produced more heat as they have evolved. Today, the chip shop’s SXM GPUs top out at 700W, while its Grace-Hopper Superchips – which combine an H100 with a 72-core Grace CPU and 512GB of LPDDR5 – are rated for 1kW apiece.
“Soon today’s air-cooled systems won’t be able to keep up. Current liquid-cooled technologies won’t be able to handle the more than 40 watts per square centimeter researchers expect future silicon in datacenters will need to dissipate,” Nvidia’s post explains.
Nvidia is no stranger to liquid cooling. The accelerator giant has offered liquid-cooled form factors for its SXM GPU modules for a few years. However, it has only recently introduced liquid cooling to the rest of its lineup. It wasn’t until the Computex conference in May 2022 that it began offering its popular A100 PCIe cards in a DLC form factor, with plans for a liquid-cooled H100 starting in 2023.
Nvidia plans to deliver a test system combining liquid and immersion cooling in 2026, and has promised updates on its progress towards that goal as soon as possible.
According to the blog post, during the first year engineers will focus on component testing before moving on to partial rack tests in 2025. For this, Nvidia has tapped liquid cooling specialist outfit BOYD Corp to help with cold plates; two-phase cooling champ Durbin Group to work on pumps; Honeywell for the refrigerant; and datacenter infrastructure player Vertiv for heat rejection.
Nvidia will also work with Binghamton and Villanova universities on analysis and testing, and with Sandia National Labs to evaluate the concept’s reliability.
Liquid and immersion cooling R&D takes off
Nvidia isn’t the only player working on datacenter cooling tech. Intel recently shared progress on its efforts to develop a variety of systems capable of dissipating kilowatts of heat from a single chip.
Many of these designs focus on similar concepts – like submerging whole systems in vats of dielectric fluids – but also explore the use of advanced manufacturing to embed 3D vapor chambers in “coral-shaped heat sinks”. Tiny jets that shoot cool water over hot spots on chips are another idea.
Despite continued work in this arena, Intel’s financial woes have led to some roadblocks, including the cancellation of a $700 million liquid and immersion cooling “mega-lab” in Oregon. ®