NVIDIA GH200 Grace Hopper™ Superchip

The NVIDIA GH200 Grace Hopper™ Superchip is a breakthrough design with a high-bandwidth connection between the Grace CPU and the Hopper GPU, built to enable the era of accelerated computing and generative AI.

10x higher performance

The NVIDIA GH200 Grace Hopper™ Superchip delivers up to 10x higher performance for applications processing terabytes of data, enabling scientists and researchers to reach unprecedented solutions to the world's most complex problems.


Specifications

  • CPU: 72 Grace CPU cores (Arm Neoverse V2)
  • CPU memory: 480 GB LPDDR5X capacity, up to 500 GB/s bandwidth
  • GPU memory: 96 GB HBM3 capacity, 4 TB/s bandwidth
  • NVLink-C2C: 900 GB/s total bandwidth, 450 GB/s per direction
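As a rough illustration of what these figures mean in practice, the sketch below estimates how long ideal peak bandwidth would take to move the memory sizes listed above. These are theoretical peak numbers from the spec sheet; real workloads achieve less.

```python
# Back-of-the-envelope times implied by the GH200 peak figures above.
# All bandwidths are ideal peaks; sustained rates will be lower.

HBM3_CAPACITY_GB = 96        # GPU HBM3 capacity
HBM3_BW_GBS = 4000           # 4 TB/s GPU memory bandwidth
LPDDR5X_CAPACITY_GB = 480    # CPU LPDDR5X capacity
C2C_BW_PER_DIR_GBS = 450     # NVLink-C2C bandwidth, per direction

# Time for the GPU to stream through its entire HBM3 once.
hbm_sweep_s = HBM3_CAPACITY_GB / HBM3_BW_GBS

# Time to copy all 480 GB of CPU memory to the GPU over NVLink-C2C.
cpu_to_gpu_s = LPDDR5X_CAPACITY_GB / C2C_BW_PER_DIR_GBS

print(f"Full HBM3 sweep:      {hbm_sweep_s * 1000:.0f} ms")   # 24 ms
print(f"480 GB CPU->GPU copy: {cpu_to_gpu_s:.2f} s")          # ~1.07 s
```

The second number is the point of the design: the GPU can pull the entire 480 GB CPU memory pool into play in about a second, which is what makes the large-memory workloads described below practical.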

The most efficient large memory supercomputer

Designed for AI training, inference, and HPC

The NVIDIA GH200 empowers businesses to foster innovation and unearth new value by enhancing large language model training and inference. It further amplifies recommender systems through expanded fast-access memory, and facilitates deeper insights via advanced graph neural network analysis.

The power of coherent memory

The NVIDIA NVLink-C2C interconnect provides 900 GB/s of total bidirectional bandwidth between the CPU and GPU, roughly 7x the bandwidth of a PCIe Gen5 x16 link. The connection provides unified cache coherence with a single memory address space that combines system memory and GPU HBM for simplified programmability.
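The 7x figure can be sanity-checked directly from the bandwidths, assuming the commonly quoted ~128 GB/s bidirectional figure for a PCIe Gen5 x16 link (64 GB/s per direction):

```python
# Comparing NVLink-C2C against a conventional PCIe Gen5 x16 link.
# The 128 GB/s figure is the nominal bidirectional PCIe Gen5 x16 rate.
NVLINK_C2C_BIDIR_GBS = 900
PCIE_GEN5_X16_BIDIR_GBS = 128

speedup = NVLINK_C2C_BIDIR_GBS / PCIE_GEN5_X16_BIDIR_GBS
print(f"NVLink-C2C vs PCIe Gen5 x16: {speedup:.1f}x")  # prints "7.0x"
```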


Pricing

Starting at

$2.99 per hour

  • Starting at $2.99/hour with a 36-month contract, billed at 730 hours per month

Key features

  • Combines the performance of NVIDIA Hopper™ GPU with the versatility of NVIDIA Grace™ CPU in a single superchip.
  • Features high-bandwidth, memory-coherent NVIDIA® NVLink® Chip-2-Chip (C2C) interconnect.
  • Accelerates large-scale AI and HPC workloads using both GPUs and CPUs.