NVIDIA’s Grace Hopper Superchip integrates the Grace CPU and Hopper GPU, connected via the high-speed NVLink-C2C interconnect, to accelerate artificial intelligence (AI) and high-performance computing (HPC) workloads. The Grace CPU, based on the Arm Neoverse architecture, offers energy efficiency and high performance, while the Hopper GPU supports advanced AI tasks with features like the Transformer Engine for transformer models. This architecture aims to challenge traditional x86-based data center CPUs by providing a powerful, energy-efficient alternative for demanding workloads.
## Grace CPU vs. Other Processors
The NVIDIA Grace CPU departs from traditional x86-based processors in several ways that affect performance and efficiency in data center and high-performance computing environments:
### Architecture and Design
The Grace CPU is built on the Arm Neoverse V2 architecture, with 72 high-performance 64-bit Armv9 cores per chip. This design choice allows for:
• Higher core count: A Grace CPU Superchip pairs two 72-core Grace CPUs for up to 144 cores, providing substantial parallel processing capabilities.
• Energy efficiency: The Arm-based design delivers up to twice the performance per watt compared to conventional x86-64 platforms.
### Memory Subsystem
Grace introduces innovative memory technologies:
• LPDDR5X memory: It’s the first data center CPU to use server-class LPDDR5X memory, offering:
  • Up to 50% more memory bandwidth than an eight-channel DDR5 design at similar cost
  • One-eighth the power per GB/s of bandwidth compared with typical DDR5 server memory
  • Up to 960 GB of physical memory with 1 TB/s of available bandwidth in a Grace CPU Superchip (two Grace CPUs)
• Memory topology: Grace features a simpler memory topology with only two NUMA nodes, reducing bottlenecks and simplifying application optimization.
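To check how many NUMA nodes a given Linux machine actually exposes, the kernel's sysfs tree can be inspected directly. The sketch below is illustrative only: `list_numa_nodes` is a hypothetical helper name, it assumes a Linux host with sysfs mounted at the usual location, and the count it reports depends on the platform and firmware configuration rather than on anything stated in this article.

```python
import re
from pathlib import Path


def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted NUMA node IDs that Linux exposes under sysfs.

    Hypothetical helper for illustration; returns an empty list when the
    directory is missing (for example on a non-Linux host).
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    node_ids = []
    for entry in root.iterdir():
        # NUMA nodes appear as directories named node0, node1, ...
        match = re.fullmatch(r"node(\d+)", entry.name)
        if match and entry.is_dir():
            node_ids.append(int(match.group(1)))
    return sorted(node_ids)


if __name__ == "__main__":
    nodes = list_numa_nodes()
    print(f"NUMA nodes visible to the OS: {nodes}")
```

Fewer nodes generally means fewer placement decisions when pinning threads and memory, which is the optimization benefit the bullet above refers to.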
### Interconnect and Fabric
Grace incorporates advanced interconnect technologies:
• NVLink-C2C: This chip-to-chip interconnect provides 900 GB/s of bidirectional bandwidth, roughly seven times that of a PCIe 5.0 x16 link.
• NVIDIA Scalable Coherency Fabric (SCF): Offers 3.2 TB/s of bisection bandwidth, double that of traditional CPUs.
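The "seven times PCIe 5.0 x16" comparison can be reproduced with a short calculation. The sketch below uses the standard PCIe 5.0 signalling parameters (32 GT/s per lane, 128b/130b encoding) and ignores packet and protocol overhead, so it gives a theoretical peak rather than a measured figure; the function name is illustrative.

```python
# PCIe 5.0 signalling parameters.
TRANSFERS_PER_S_PER_LANE = 32e9   # 32 GT/s per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 16

# Bidirectional NVLink-C2C figure quoted in the text.
NVLINK_C2C_GB_S = 900.0


def pcie5_x16_bandwidth_gb_s(bidirectional=True):
    """Theoretical peak PCIe 5.0 x16 bandwidth in GB/s (no protocol overhead)."""
    bits_per_s = TRANSFERS_PER_S_PER_LANE * ENCODING_EFFICIENCY * LANES
    gb_per_s = bits_per_s / 8 / 1e9
    return 2 * gb_per_s if bidirectional else gb_per_s


pcie_bidir = pcie5_x16_bandwidth_gb_s()
ratio = NVLINK_C2C_GB_S / pcie_bidir
print(f"PCIe 5.0 x16: {pcie_bidir:.0f} GB/s bidirectional")
print(f"NVLink-C2C / PCIe 5.0 x16: {ratio:.1f}x")
```

Each direction of a PCIe 5.0 x16 link peaks at about 63 GB/s, or about 126 GB/s in both directions combined, so 900 GB/s works out to roughly 7.1 times as much.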
### Performance Characteristics
Compared to leading x86 CPUs in the same power envelope, NVIDIA reports that Grace delivers:
• 2.3x faster performance for microservices
• 2x faster performance in memory-intensive data processing
• 1.9x faster performance in computational fluid dynamics
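Because these speedups are quoted at the same power envelope, they also imply an energy saving for a fixed amount of work. The sketch below is a back-of-the-envelope estimate that assumes both systems draw the same average power for the whole run, which the phrase "same power envelope" suggests but does not guarantee; `relative_energy` is a hypothetical helper, not a published NVIDIA metric.

```python
def relative_energy(speedup, power_ratio=1.0):
    """Energy for a fixed job relative to the baseline system.

    energy = power * time, and run time shrinks by 1/speedup, so the
    relative energy is power_ratio / speedup. power_ratio=1.0 encodes the
    assumption that both systems draw the same average power.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return power_ratio / speedup


# Speedups quoted in the list above.
SPEEDUPS = {
    "microservices": 2.3,
    "memory-intensive data processing": 2.0,
    "computational fluid dynamics": 1.9,
}

for workload, speedup in SPEEDUPS.items():
    share = relative_energy(speedup)
    print(f"{workload}: about {share:.0%} of baseline energy per job")
```

Under that assumption, a 2.3x speedup corresponds to about 43% of the baseline energy per job, 2x to 50%, and 1.9x to about 53%.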
### Integration with GPU Acceleration
Grace is designed for an accelerated computing world:
• Optimized for NVIDIA’s APIs and seamless integration with NVIDIA GPUs
• Can be paired with NVIDIA Hopper GPUs in the Grace Hopper Superchip configuration for enhanced AI and HPC capabilities
### Power Efficiency
The Grace CPU Superchip has a thermal design power of around 500 watts, memory included, while delivering high performance, making it significantly more power-efficient than traditional x86 solutions.
By combining these design choices, the NVIDIA Grace CPU aims to provide a powerful, energy-efficient alternative to x86-based processors for data center and HPC workloads.