Last week at COMPUTEX, Nvidia unveiled the GH200 Grace Hopper “Superchip,” a combined CPU/GPU designed for large-scale AI applications, and announced that it has entered full production. It’s a formidable part, with 528 GPU tensor cores, support for up to 480GB of CPU RAM and 96GB of GPU RAM, and GPU memory bandwidth of up to 4TB per second.
Previously, we covered Nvidia’s H100 Hopper chip, currently the flagship of the company’s data center GPU lineup. The H100 powers AI models such as OpenAI’s ChatGPT and represents a significant leap over the 2020 A100 chip, which handled early training runs for many of the generative AI chatbots and image generators making headlines today.
The logic is straightforward: faster GPUs enable more capable generative AI models, because GPUs excel at the parallel matrix multiplications that dominate the workload of today’s artificial neural networks.
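To make that concrete, here is a minimal sketch of the kind of operation involved. The PyTorch framework and the matrix sizes are our own illustrative assumptions, not anything specific to the GH200; on recent Nvidia hardware, the half-precision product below is dispatched to the GPU’s tensor cores.

```python
# Minimal sketch of the parallel matrix multiplication at the core of
# neural-network workloads. Assumes PyTorch; uses a GPU if one is present.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# float16 matmuls engage tensor cores on Nvidia GPUs; fall back to
# float32 on CPU, where half-precision matmul support is spottier.
dtype = torch.float16 if device == "cuda" else torch.float32

# A transformer layer is dominated by products like this one: a batch of
# activations multiplied by a large weight matrix (sizes are illustrative).
x = torch.randn(1024, 4096, dtype=dtype, device=device)  # activations
w = torch.randn(4096, 4096, dtype=dtype, device=device)  # weights

y = x @ w  # thousands of GPU cores compute tiles of this product in parallel
print(y.shape)  # torch.Size([1024, 4096])
```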
The GH200 builds on the “Hopper” GPU line and pairs it with Nvidia’s “Grace” CPU platform (both named after computing pioneer Grace Hopper), linking the two through Nvidia’s NVLink chip-to-chip (C2C) interconnect. The expected payoff is a significant speedup for AI and machine-learning applications in both training (building a model) and inference (running one).
Ian Buck, Nvidia’s Vice President of Accelerated Computing, emphasized the transformative impact of generative AI across various industries. He stated, “Generative AI is rapidly transforming businesses, unlocking new opportunities and accelerating discovery in healthcare, finance, business services, and many more industries.” With Grace Hopper Superchips in full production, manufacturers worldwide are poised to provide the accelerated infrastructure necessary for enterprises to harness the potential of generative AI applications, leveraging their unique proprietary data.
The GH200’s key features include a 900GB/s coherent (shared) memory interface between CPU and GPU, seven times the bandwidth of PCIe Gen5, and 30 times the aggregate system memory bandwidth to the GPU of an Nvidia DGX A100 system. The GH200 also supports all of Nvidia’s software platforms, including the Nvidia HPC SDK, Nvidia AI, and Nvidia Omniverse.
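As a back-of-the-envelope check on that sevenfold figure: a PCIe Gen5 x16 link moves roughly 64GB/s in each direction, or about 128GB/s bidirectional. The baseline below is our assumption about what Nvidia is comparing against, not a figure the company has published.

```python
# Rough sanity check of the "sevenfold over PCIe Gen5" bandwidth claim.
nvlink_c2c_gb_s = 900   # NVLink-C2C bandwidth Nvidia quotes for the GH200
pcie5_x16_gb_s = 128    # assumed baseline: PCIe Gen5 x16, bidirectional

print(f"{nvlink_c2c_gb_s / pcie5_x16_gb_s:.1f}x")  # ~7.0x
```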
Nvidia also announced that it will build the CPU/GPU combo into a new supercomputer called the DGX GH200, which links 256 GH200 chips to act as a single giant GPU, delivering 1 exaflop of performance and a shared memory pool of 144 terabytes, nearly 500 times the memory of the previous-generation Nvidia DGX A100.
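The 144TB figure lines up with simple arithmetic over the per-chip memory quoted earlier, assuming binary terabytes and, for the memory comparison, a DGX A100 in its 320GB (8x 40GB) configuration; both assumptions are ours.

```python
# How the DGX GH200's 144TB shared memory pool plausibly adds up.
cpu_ram_gb = 480    # LPDDR5X attached to each Grace CPU
gpu_ram_gb = 96     # HBM3 attached to each Hopper GPU
chips = 256

total_gb = chips * (cpu_ram_gb + gpu_ram_gb)
print(total_gb, total_gb / 1024)  # 147456 GB, i.e. 144 binary TB

# Versus an assumed 320GB (8 x 40GB) DGX A100:
print(total_gb / 320)  # ~461x, consistent with "nearly 500 times"
```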
Nvidia aims the DGX GH200 at training huge next-generation AI models, perhaps paving the way for a GPT-6 or similar advances in generative language applications, recommender systems, and data analytics. Nvidia has not disclosed pricing for the GH200, but a single DGX GH200 will likely cost a substantial sum, possibly in the low eight figures.
In summary, continued hardware innovation from industry leaders like Nvidia and Cerebras promises a future in which high-end cloud AI models grow ever more capable, processing vast amounts of data at unprecedented speed. We can only hope the resulting models don’t end up in contentious debates with tech journalists.