Blog Summary:
1. Meta and Oracle are upgrading their AI data centres with NVIDIA’s Spectrum-X Ethernet networking switches to handle the demands of large-scale AI systems.
2. Jensen Huang, NVIDIA’s CEO, emphasized the importance of trillion-parameter models in transforming data centres into giga-scale AI factories connected by Spectrum-X.
3. Integrating Spectrum-X into Meta’s FBOSS and into Oracle’s deployment of NVIDIA’s Vera Rubin architecture will improve AI training efficiency and scalability across massive compute clusters.
Title: Enhancing AI Data Centres with NVIDIA’s Spectrum-X Ethernet Networking Switches
To meet the growing demands of large-scale AI systems, Meta and Oracle are upgrading their AI data centres with NVIDIA’s Spectrum-X Ethernet networking switches. The move is part of an open networking framework aimed at improving AI training efficiency and accelerating deployment across massive compute clusters.
Jensen Huang, NVIDIA’s founder and CEO, highlighted the role of trillion-parameter models in reshaping data centres into giga-scale AI factories. He described Spectrum-X as the vital “nervous system” connecting millions of GPUs to train the largest models ever built.
Oracle plans to pair Spectrum-X Ethernet with NVIDIA’s Vera Rubin architecture to build large-scale AI factories. Mahesh Thiagarajan, Executive Vice President of Oracle Cloud Infrastructure, said the enhanced setup will let Oracle connect millions of GPUs more efficiently, helping customers train and deploy new AI models faster.
Concurrently, Meta is expanding its AI infrastructure by integrating Spectrum-X Ethernet switches into its proprietary Facebook Open Switching System (FBOSS). Gaya Nagarajan, Meta’s Vice President of Networking Engineering, stressed the necessity for an open and efficient next-generation network to support increasingly larger AI models and deliver services to billions of users seamlessly.
NVIDIA’s focus on flexible AI systems is also evident in the modular design of its MGX system. Joe DeLaere, who leads NVIDIA’s accelerated computing solutions portfolio for data centres, emphasized that flexibility matters more as data centres become more intricate. The MGX system’s modular approach allows partners to combine different CPUs, GPUs, storage, and networking components as required, promoting interoperability and future readiness across multiple hardware generations.
As the scale of AI models continues to expand, power efficiency emerges as a critical challenge for data centres. DeLaere highlighted NVIDIA’s comprehensive approach, working from chip to grid to enhance energy efficiency and scalability. Collaborations with power and cooling vendors aim to maximize performance per watt, with innovations like 800-volt DC power delivery and power-smoothing technology contributing to significant reductions in power needs.
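The physics behind the 800-volt DC push can be sketched with a quick calculation: for a fixed power draw, current scales inversely with voltage, and resistive losses in the distribution path scale with the square of the current. The rack power and cable resistance below are illustrative assumptions, not NVIDIA figures.

```python
# Why higher distribution voltage helps: for the same power P,
# current I = P / V, and resistive loss in the cabling is I^2 * R.
# The 100 kW rack and 1 milliohm path resistance are assumptions
# chosen purely for illustration.

def cable_loss_watts(power_w: float, volts: float, resistance_ohm: float) -> float:
    """Resistive loss in the distribution path at a given voltage."""
    current = power_w / volts
    return current ** 2 * resistance_ohm

P, R = 100_000.0, 0.001  # 100 kW rack, 1 milliohm of path resistance
print(cable_loss_watts(P, 54, R))   # roughly 3.4 kW lost at 54 V
print(cable_loss_watts(P, 800, R))  # about 16 W lost at 800 V
```

The ratio of losses is (800/54)² ≈ 219×, which is why moving to high-voltage DC distribution can meaningfully cut power overhead in dense racks.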
NVIDIA’s MGX system plays a pivotal role in scaling data centres, with Gilad Shainer, the company’s Senior Vice President of Networking, underscoring its capability to connect compute and switching components within MGX racks. This design supports NVLink for scale-up connectivity and Spectrum-X Ethernet for scale-out growth, enabling the connection of multiple AI data centres into a unified system. The ability to link sites through dark fibre or additional MGX-based switches facilitates high-speed connections across regions, essential for Meta’s distributed AI training operations.
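The scale-up/scale-out arithmetic behind this design can be sketched with illustrative numbers (the rack size, rack count, and site count below are assumptions, not published NVIDIA configurations):

```python
# Back-of-the-envelope sizing of a tiered AI fabric: NVLink connects
# GPUs within a rack (scale-up), Spectrum-X Ethernet connects racks
# within a site (scale-out), and long-haul links join sites. All
# figures here are illustrative assumptions, not NVIDIA specs.

def total_gpus(gpus_per_nvlink_domain: int,
               domains_per_site: int,
               sites: int) -> int:
    """GPUs reachable when every tier is fully connected."""
    return gpus_per_nvlink_domain * domains_per_site * sites

# Example: 72-GPU NVLink racks, 1,000 racks per site, 4 linked sites.
print(total_gpus(72, 1_000, 4))  # 288000
```

The point of the tiering is that each layer multiplies the reach of the one below it, which is how a fabric gets from tens of GPUs in a rack to hundreds of thousands across regions.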
The collaboration between NVIDIA and industry leaders like Cisco, xAI, Meta, and Oracle Cloud Infrastructure is driving the expansion of the AI ecosystem. Spectrum-X Ethernet, purpose-built for AI workloads, offers unparalleled efficiency and scalability, outperforming traditional Ethernet in handling training and inference workloads. By partnering with key players, NVIDIA aims to make Spectrum-X accessible across various environments, from hyperscalers to enterprises.
Looking ahead, NVIDIA’s upcoming Vera Rubin architecture, set to debut in the second half of 2026, will work in tandem with Spectrum-X networking and MGX systems to support the next generation of AI factories. DeLaere highlighted the distinction between Spectrum-X and XGS, which share core hardware but utilize different algorithms for varying distances, ensuring minimal latency and enabling multiple sites to function collectively as a massive AI supercomputer.
In preparation for the transition to 800-volt DC power delivery, NVIDIA is collaborating extensively across the power chain, from chip level to grid. Partnerships with industry leaders like Onsemi, Infineon, Delta, Flex, Lite-On, Schneider Electric, and Siemens underscore a holistic design approach aimed at seamless integration within high-density AI environments operated by companies like Meta and Oracle.
The performance advantages offered by Spectrum-X Ethernet are tailored for hyperscalers, with features like adaptive routing and telemetry-based congestion control eliminating network hotspots and ensuring stable performance. This scalability and performance optimization support higher training and inference speeds, enabling multiple workloads to run concurrently without interference, crucial for managing the surging AI training demands effectively.
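The difference between static hashing and adaptive routing can be shown with a toy sketch; this is a simplified illustration of the idea, not NVIDIA's actual algorithm or telemetry format:

```python
# Toy sketch of adaptive routing: classic ECMP hashes a flow onto a
# fixed path regardless of load, which can create hotspots. Adaptive
# routing instead consults live queue telemetry and steers traffic to
# the least-congested path. Simplified illustration only; not
# NVIDIA's proprietary mechanism.

def static_hash_route(flow_id: int, queue_depths: list[int]) -> int:
    """ECMP-style routing: path depends only on the flow hash."""
    return flow_id % len(queue_depths)

def adaptive_route(queue_depths: list[int]) -> int:
    """Adaptive routing: choose the currently least-loaded path."""
    return queue_depths.index(min(queue_depths))

queues = [40, 5, 5, 50]              # per-path queue depth (telemetry)
print(static_hash_route(7, queues))  # 3: hash lands on the busiest path
print(adaptive_route(queues))        # 1: telemetry steers to an idle path
```

Because the adaptive choice reacts to congestion as it forms, bursty AI collectives spread across paths instead of piling onto whichever link the hash happened to pick.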
While NVIDIA’s hardware draws much of the attention, DeLaere emphasized that software optimization is just as important for maximizing AI system efficiency. The company co-designs hardware and software, investing in FP4 kernels, frameworks like Dynamo and TensorRT-LLM, and algorithms such as speculative decoding in a continuous drive for improved throughput and AI model performance. These enhancements help deliver consistent AI performance over time, which hyperscalers like Meta depend on.
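The speculative decoding idea mentioned above can be illustrated with a toy sketch: a cheap draft model proposes several tokens at once, and the expensive target model verifies them in a single pass, keeping the longest agreeing prefix. The "models" below are stand-in functions over characters, purely for illustration; real systems use neural language models.

```python
# Toy sketch of speculative decoding. The draft and target "models"
# here are hypothetical stand-ins that emit fixed character patterns,
# so the accept/reject logic is easy to follow.

def draft_model(prefix: str, k: int) -> str:
    # Hypothetical fast drafter: always proposes "abc" repeated.
    return ("abc" * k)[:k]

def target_model(prefix: str, k: int) -> str:
    # Hypothetical slow target: the ground-truth continuation.
    return ("abcz" * k)[:k]

def speculative_step(prefix: str, k: int = 4) -> str:
    draft = draft_model(prefix, k)     # k tokens guessed cheaply
    truth = target_model(prefix, k)    # verified in one target pass
    accepted = 0
    while accepted < k and draft[accepted] == truth[accepted]:
        accepted += 1
    # Keep the agreed prefix, plus one corrected token from the target.
    extra = truth[accepted] if accepted < k else ""
    return prefix + truth[:accepted] + extra

print(speculative_step(""))  # "abcz": 3 drafted tokens accepted + 1 fix
```

When the drafter is usually right, each expensive target pass yields several tokens instead of one, which is where the throughput gain comes from.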
The Spectrum-X platform, encompassing Ethernet switches and SuperNICs, is NVIDIA’s first Ethernet system purpose-built for AI workloads. Designed to connect millions of GPUs efficiently while maintaining predictable performance across AI data centres, it offers a significant advance over conventional Ethernet. With features like up to 95% data throughput and support for long-distance AI data centre links through XGS technology, Spectrum-X is positioned to underpin trillion-parameter models and the next wave of generative AI workloads.
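What a throughput figure like 95% means in practice: effective bandwidth is line rate times achieved utilisation. The 95% figure is from the article; the 800 Gb/s link speed and the ~60% baseline for conventional Ethernet under AI traffic are illustrative assumptions, not claims from this piece.

```python
# Effective bandwidth = line rate * achieved throughput efficiency.
# 800 Gb/s links and the 60% conventional-Ethernet baseline are
# illustrative assumptions for comparison.

def effective_bandwidth(link_gbps: float, efficiency: float) -> float:
    """Usable bandwidth on a link running at a given efficiency."""
    return link_gbps * efficiency

print(round(effective_bandwidth(800, 0.95), 1))  # 760.0 Gb/s effective
print(round(effective_bandwidth(800, 0.60), 1))  # 480.0 Gb/s effective
```

At cluster scale, that gap compounds: every percentage point of fabric efficiency is bandwidth that either feeds GPUs or sits idle.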
In conclusion, NVIDIA’s collaboration with industry leaders and its advances in Spectrum-X Ethernet technology point to a new phase of efficiency and scalability in AI infrastructure. Tight integration of hardware and software, together with a focus on power efficiency and performance optimization, underscores NVIDIA’s commitment to delivering high-performance AI solutions for hyperscalers and enterprises alike. With the Vera Rubin architecture arriving in the second half of 2026 and Spectrum-X continuing to evolve, AI data centres look set for further growth and innovation.