Nvidia Bakes Liquid Cooling into PCIe GPU Cards

2022-09-16 20:41:49 By : Ms. Laurel Zhang

Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Nvidia is bringing liquid cooling, which it typically pairs with GPUs in high-performance computing systems, to its mainstream server GPU portfolio.

The company will start shipping its A100 PCIe Liquid Cooled GPU, which is based on the Ampere architecture, for servers later this year. A liquid-cooled PCIe GPU based on the company’s new Hopper architecture will ship early next year.

The new A100 PCIe GPU has the same hardware as the conventional air-cooled A100 PCIe version, but it includes direct-to-chip liquid cooling. The GPU fits into a single PCIe slot, and is half the width of the air-cooled version, which requires a dual slot.

Like the air-cooled version, the new liquid-cooled form factor has 80GB of GPU memory, memory bandwidth of 2 terabytes per second and a TDP of 350 watts. It offers double-precision floating point performance of 9.7 teraflops, and single-precision floating point performance – which is more relevant to AI calculations – of 19.5 teraflops.
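To put those figures in per-watt terms, the short sketch below divides the peak throughput numbers quoted above by the quoted TDP. It is back-of-the-envelope arithmetic only: it assumes the card draws its full TDP at peak throughput, and the function name `gflops_per_watt` is purely illustrative.

```python
# Figures quoted in the article for the A100 PCIe (80GB) card.
TDP_WATTS = 350
FP64_TFLOPS = 9.7
FP32_TFLOPS = 19.5


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a peak teraflops rating into gigaflops per watt of TDP."""
    return tflops * 1000 / watts


# Assumption: the card draws its full TDP while running at peak throughput.
fp64_eff = gflops_per_watt(FP64_TFLOPS, TDP_WATTS)
fp32_eff = gflops_per_watt(FP32_TFLOPS, TDP_WATTS)

print(f"FP64: {fp64_eff:.1f} GFLOPS/W, FP32: {fp32_eff:.1f} GFLOPS/W")
```

Because the liquid-cooled and air-cooled cards share the same quoted specifications, this per-card ratio is identical for both; any efficiency gain from liquid cooling would show up at the facility level rather than in these numbers.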

About “40 percent of the energy used by datacenters goes to cooling and one of the directions the industry is moving [toward] for energy efficient cooling is liquid,” said Paresh Kharya, senior director of product management for accelerated computing at Nvidia, during a press briefing.

Kharya said datacenters consume 1 percent of the world’s electricity, and that energy efficiency was a big consideration for the company when introducing products. He said GPUs are more power efficient than CPUs for workloads such as AI that need acceleration.

“Even the mainstream enterprise datacenters are looking at options for liquid cooling in datacenter infrastructures,” Kharya said.

But GPUs in general draw a lot of power, and instead of focusing on power efficiency at the chip level, Nvidia is putting liquid cooling on top of the GPU.

Switching from air-cooled A100 PCIe to liquid-cooled A100 PCIe configurations can reduce rack space by up to 66 percent and lower power consumption by 30 percent, an Nvidia spokeswoman said.

Rack space is saved by obviating the need for a heatsink, enabling the A100 GPUs with liquid cooling to use just one PCIe slot, where air-cooled GPUs fill two, Kharya said.
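The slot arithmetic can be illustrated with a minimal sketch. The chassis size (8 PCIe slots) and the deployment size (64 GPUs) are hypothetical values chosen for illustration, not figures from Nvidia, and the helper names are likewise made up for this example.

```python
def gpus_per_chassis(total_slots: int, slots_per_gpu: int) -> int:
    """Number of GPUs that fit in a chassis with the given slot count."""
    return total_slots // slots_per_gpu


def chassis_needed(gpu_count: int, per_chassis: int) -> int:
    """Chassis required to host gpu_count GPUs (ceiling division)."""
    return -(-gpu_count // per_chassis)


CHASSIS_SLOTS = 8   # hypothetical server with 8 PCIe slots
TARGET_GPUS = 64    # hypothetical deployment size

air_cooled = gpus_per_chassis(CHASSIS_SLOTS, slots_per_gpu=2)     # dual-slot card
liquid_cooled = gpus_per_chassis(CHASSIS_SLOTS, slots_per_gpu=1)  # single-slot card

air_chassis = chassis_needed(TARGET_GPUS, air_cooled)
liquid_chassis = chassis_needed(TARGET_GPUS, liquid_cooled)
reduction = 1 - liquid_chassis / air_chassis

print(f"Air-cooled: {air_cooled} GPUs per chassis, {air_chassis} chassis")
print(f"Liquid-cooled: {liquid_cooled} GPUs per chassis, {liquid_chassis} chassis")
print(f"Chassis reduction from slot width alone: {reduction:.0%}")
```

In this toy setup, slot width alone halves the chassis count. The up-to-66-percent rack space figure cited by Nvidia is not derived here; how the company arrived at it is not detailed in the article.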

The ability to pack more GPUs at the rack level and a better cooling mechanism help servers run more workloads and achieve a more sustained output, said Kevin Krewell, principal analyst at Tirias Research.

“Water cooling is more efficient transferring heat away from the card. It also allows for a single slot card, while air cooled are double width. There is more plumbing required, but there are a number of rack solutions,” Krewell said.

About a dozen server makers, including Asus, Foxconn, Gigabyte, Inventec and Supermicro, plan to ship liquid-cooled PCIe GPU systems later this year.

Nvidia also said its liquid-cooled server based on the Hopper architecture, called HGX H100, will ship in the fourth quarter this year.

“Our HGX A100 with liquid-cooled options using the A100 SXM form factor have been in production from Nvidia partners for some time. The newly announced A100 PCIe Liquid Cooled GPU is sampling now and partners will be offering qualified mainstream servers later this year,” an Nvidia spokeswoman said.

The chipmaker, which is largely known for its GPUs, is now positioning itself as a one-stop-shop for AI products that include hardware, software and services. The company is offering software suites and programming frameworks targeted at verticals such as automotive, manufacturing, health care, security and robotics. The software is designed to work the fastest on its homegrown GPUs and servers.

Alternatively, Nvidia is offering what it calls “AI factories,” where its homegrown supercomputers solve AI problems and spit out customized products for companies.

Nvidia announced the liquid-cooled GPUs at the Computex trade show being held in Taipei. The company also announced that hardware companies will start shipping systems based on its Arm-based Grace CPU early next year.

Full-sized feature graphic: Nvidia’s liquid-cooled PCIe GPU card

© 2022 HPCwire. All Rights Reserved. A Tabor Communications Publication
