Summary:
1. Oracle and AMD have joined forces to offer high-performance AI infrastructure on Oracle Cloud Infrastructure, featuring AMD’s latest Instinct MI355X GPUs.
2. The collaboration will enable zettascale AI clusters with up to 131,072 GPUs, catering to enterprises developing complex AI models.
3. The partnership introduces technical innovations such as AMD’s MI355X GPUs with enhanced computing performance, supported by AMD Turin CPUs and the ROCm software stack.
Title: Oracle and AMD Team Up to Revolutionize AI Infrastructure on Oracle Cloud
Oracle and AMD have partnered to bring a new generation of AI infrastructure to Oracle Cloud. The collaboration pairs AMD’s newest accelerators with Oracle’s cloud platform and targets enterprises running high-intensity AI workloads.
Oracle Cloud Infrastructure (OCI) will host zettascale AI clusters powered by AMD’s latest Instinct MI355X GPUs. These GPUs are designed to improve price-performance for AI training and inference, which makes them a fit for organizations working with large language models and emerging agentic AI applications.
Mahesh Thiagarajan, Executive Vice President of Oracle Cloud Infrastructure, underscored Oracle’s commitment to expanding its AI infrastructure to serve customers effectively. According to Oracle, combining AMD’s Instinct GPUs with OCI’s capabilities will give customers strong performance, security, and scalability for their AI projects.
The collaboration also showcases AMD’s architectural advancements. According to the companies, the MI355X GPUs offer nearly three times the computing performance of the previous generation and a 50% increase in high-bandwidth memory (HBM) capacity. The added memory lets organizations keep more of a model on each GPU, so larger AI models can be trained and deployed efficiently and with reduced latency.
Furthermore, the new infrastructure uses a dense, liquid-cooled design that supports 64 GPUs per rack with a power envelope of up to 125 kW per rack. This design emphasizes high throughput and a lower time-to-first-token (the delay before a model returns its first output token), which is crucial for real-time AI applications.
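To put the rack figures in context, the short Python sketch below works out how many racks a maximum-size cluster would occupy and what that implies for power at the rack level. It uses only the numbers quoted in this article (131,072 GPUs, 64 GPUs per rack, up to 125 kW per rack); the function names are illustrative, and the result is a rough upper bound that leaves out networking, storage, and facility overhead.

```python
# Back-of-the-envelope sizing using only figures quoted in the article.
MAX_GPUS = 131_072      # maximum cluster size
GPUS_PER_RACK = 64      # rack density
RACK_POWER_KW = 125     # per-rack power figure, treated as an upper bound


def rack_count(gpus: int, gpus_per_rack: int) -> int:
    """Racks needed to hold `gpus`, rounding up to whole racks."""
    return -(-gpus // gpus_per_rack)  # ceiling division


def cluster_power_mw(racks: int, rack_power_kw: float) -> float:
    """Aggregate rack-level power in megawatts (no facility overhead)."""
    return racks * rack_power_kw / 1000


racks = rack_count(MAX_GPUS, GPUS_PER_RACK)
power_mw = cluster_power_mw(racks, RACK_POWER_KW)
print(f"{racks} racks, about {power_mw:.0f} MW at the rack level")
# prints "2048 racks, about 256 MW at the rack level"
```

In other words, a full-size cluster would span 2,048 racks and, if every rack ran at the quoted maximum, draw roughly 256 MW before any overhead is counted.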
Each OCI deployment will include a high-performance head node powered by AMD Turin CPUs, offering up to 3 TB of system memory for orchestration and data handling. The head node is intended to keep GPUs well utilized across large-scale deployments.
The partnership also emphasizes Oracle’s commitment to open-source software, with AMD’s ROCm software stack enabling migration of existing AI code to OCI’s infrastructure. Because ROCm supports widely adopted AI frameworks and libraries, developers can reuse existing code with fewer changes and shorten development cycles.
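As an illustration of what that migration path can look like in practice, the sketch below checks which accelerator backend a PyTorch installation sees. It relies on the fact that ROCm builds of PyTorch reuse the `torch.cuda` namespace and set `torch.version.hip`, so code written against `"cuda"` devices typically runs unchanged on AMD GPUs. The helper names `describe_accelerator` and `pick_device` are hypothetical, and nothing here is specific to OCI.

```python
def describe_accelerator() -> str:
    """Return a short label for the accelerator backend PyTorch can see."""
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"

    if not torch.cuda.is_available():
        return "cpu"
    # ROCm builds set torch.version.hip to a version string;
    # CUDA builds leave it as None.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


def pick_device() -> str:
    """Device string for torch.device(); "cuda" also addresses AMD GPUs on ROCm builds."""
    return "cuda" if describe_accelerator() in {"rocm", "cuda"} else "cpu"


print(describe_accelerator(), pick_device())
```

A training script that already calls `model.to(pick_device())` would not need a separate code path for AMD hardware under these assumptions.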
In addition, AMD’s Pollara AI NICs strengthen the network architecture, enabling advanced RoCE (RDMA over Converged Ethernet) capabilities that reduce network latency and increase throughput for hyperscale AI workloads. The integration reflects Oracle’s emphasis on open industry standards for networking.
The launch of MI355X GPUs on OCI is slated for the fall of 2025, marking a significant milestone in the evolution of AI infrastructure. As AI adoption continues to surge across industries, Oracle and AMD’s collaboration sets a new standard for cloud-based, high-performance computing solutions that will drive innovation and accelerate AI development.