NVIDIA’s Grace Hopper Superchip integrates the Grace CPU and Hopper GPU, connected via the high-speed NVLink-C2C interconnect, to enhance AI and high-performance computing (HPC) capabilities. The Grace CPU, based on the Arm Neoverse architecture, offers energy efficiency and high performance, while the Hopper GPU supports advanced [[artificial intelligence (AI)]] tasks with features like the Transformer Engine for [[transformer models]]. This architecture aims to challenge traditional x86-based data center CPUs by providing a powerful, energy-efficient alternative for demanding workloads.

## Grace CPU vs. Other Processors

The NVIDIA Grace CPU represents a significant departure from traditional x86-based processors, offering several key differences that enhance performance and efficiency in data center and high-performance computing environments:

### Architecture and Design

The Grace CPU is built on the Arm Neoverse V2 architecture, utilizing 72 high-performance Armv9 64-bit cores per chip. This design choice allows for:

• Higher core count: A Grace CPU Superchip can contain up to 144 cores, providing substantial parallel processing capabilities.
• Energy efficiency: The Arm-based design delivers up to twice the performance per watt compared to conventional x86-64 platforms.

### Memory Subsystem

Grace introduces innovative memory technologies:

• LPDDR5X memory: It’s the first data center CPU to use server-class LPDDR5X memory, offering:
  • Up to 50% more memory bandwidth at similar cost
  • One-eighth the power consumption of typical server memory
  • Up to 960 GB of physical memory with 1 TB/sec of available bandwidth in a Grace-Grace superchip configuration
• Memory topology: Grace features a simpler memory topology with only two NUMA nodes, reducing bottlenecks and simplifying application optimization.
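
To put the core-count and memory figures above in proportion, the following minimal Python sketch divides the quoted Grace-Grace superchip totals evenly across its cores. It assumes "1 TB/sec" means 1000 GB/s and treats the quoted numbers as peak values rather than measurements; the constant names and the `per_core_share` helper are illustrative and do not come from any NVIDIA tool.

```python
# Figures quoted in the text for a Grace-Grace superchip.
# They are peak values from the article, not measurements.
CORES_PER_CHIP = 72
CHIPS_PER_SUPERCHIP = 2
MEMORY_GB = 960
BANDWIDTH_GB_PER_S = 1000  # assumption: "1 TB/sec" taken as 1000 GB/s


def per_core_share(total: float, cores: int) -> float:
    """Return an even per-core share of a superchip-wide resource."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total / cores


total_cores = CORES_PER_CHIP * CHIPS_PER_SUPERCHIP
memory_per_core_gb = per_core_share(MEMORY_GB, total_cores)
bandwidth_per_core_gbs = per_core_share(BANDWIDTH_GB_PER_S, total_cores)

print(f"cores: {total_cores}")
print(f"memory per core: {memory_per_core_gb:.2f} GB")
print(f"peak bandwidth per core: {bandwidth_per_core_gbs:.2f} GB/s")
```

An even split is only a rough guide: real workloads rarely use every core at once, and achievable bandwidth depends on access patterns and on which of the two NUMA nodes the data lives on.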

### Interconnect and Fabric

Grace incorporates advanced interconnect technologies:

• NVLink-C2C: This chip-to-chip interconnect provides 900 GB/s of bidirectional bandwidth, which is seven times the bandwidth of PCIe 5.0 x16.
• NVIDIA Scalable Coherency Fabric (SCF): Offers 3.2 TB/s of bisection bandwidth, double that of traditional CPUs.

### Performance Characteristics

Compared to leading x86 CPUs in the same power envelope, Grace demonstrates:

• 2.3x faster performance for microservices
• 2x faster performance in memory-intensive data processing
• 1.9x faster performance in computational fluid dynamics

### Integration with GPU Acceleration

Grace is designed for an accelerated computing world:

• Optimized for NVIDIA’s APIs and seamless integration with NVIDIA GPUs
• Can be paired with NVIDIA Hopper GPUs in the Grace Hopper Superchip configuration for enhanced AI and HPC capabilities

### Power Efficiency

The Grace CPU Superchip consumes only around 500 watts while delivering high performance, making it significantly more power-efficient than traditional x86 solutions.

By leveraging these innovations, the NVIDIA Grace CPU aims to provide a powerful, energy-efficient alternative to x86-based processors for data center and HPC workloads, potentially reshaping the landscape of enterprise and high-performance computing.
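
The "seven times PCIe 5.0 x16" comparison quoted for NVLink-C2C can be sanity-checked with a little arithmetic. The sketch below derives the raw bidirectional bandwidth of a PCIe 5.0 x16 link from its signalling rate (32 GT/s per lane with 128b/130b encoding) and compares it with the 900 GB/s figure from the text. It ignores packet and protocol overhead, so it is an upper bound on PCIe throughput rather than a benchmark; the function and constant names are illustrative.

```python
# PCIe 5.0 link parameters (signalling rate and line encoding).
GT_PER_S_PER_LANE = 32            # giga-transfers per second per lane
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line encoding
LANES = 16

# Bidirectional NVLink-C2C bandwidth quoted in the text.
NVLINK_C2C_GB_PER_S = 900


def pcie_bidirectional_gb_per_s(gt_per_s: float, lanes: int, efficiency: float) -> float:
    """Raw bidirectional link bandwidth in GB/s, ignoring protocol overhead."""
    per_direction = gt_per_s * efficiency * lanes / 8  # bits -> bytes
    return 2 * per_direction


pcie_bw = pcie_bidirectional_gb_per_s(GT_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)
ratio = NVLINK_C2C_GB_PER_S / pcie_bw

print(f"PCIe 5.0 x16 bidirectional: {pcie_bw:.1f} GB/s")
print(f"NVLink-C2C / PCIe 5.0 x16: {ratio:.1f}x")
```

Under these assumptions the PCIe link comes out at roughly 126 GB/s in both directions combined, which is consistent with the roughly sevenfold advantage stated above.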