Can AMD’s Instinct GPUs Outshine NVIDIA in MLPerf AI?

AMD’s MLPerf debut showcases Instinct MI325X and MI300X GPUs, leveraging ROCm for competitive AI training performance in Llama 2-70B fine-tuning.

Key Highlights

  • AMD’s first MLPerf Training submission demonstrates strong performance in fine-tuning Llama 2-70B-LoRA.
  • Instinct MI325X GPUs outperform NVIDIA’s H200 by up to 8% in Llama 2-70B-LoRA fine-tuning.
  • ROCm V6.5 software enhancements drive scalability and efficiency for AMD Instinct GPUs.
  • MangoBoost’s multi-node submission highlights AMD’s ecosystem strength.
  • Liquid-cooled Supermicro systems mark a first in MLPerf training runs.

The News

On June 4, 2025, AMD announced its inaugural MLPerf Training submission, showcasing the Instinct MI300 Series GPUs’ capabilities in fine-tuning the Llama 2-70B-LoRA model. The submission highlights competitive performance against NVIDIA’s H100 and H200 platforms, with the MI325X leading in specific workloads. Powered by the open-source ROCm V6.5 software stack, AMD aims to deliver scalable AI training solutions. For more details, visit AMD’s blog.

Analyst Take

AMD’s entry into the MLPerf Training v5.0 benchmarks signals a move by the company to expand its challenge to NVIDIA’s entrenched dominance in AI infrastructure. The debut, focused on fine-tuning the Llama 2-70B-LoRA model, aligns with a key industry demand: enterprises want to prioritize efficient model customization over resource-intensive pretraining of broader foundation models. Techniques like LoRA (Low-Rank Adaptation) deliver on this by adapting large language models with minimal computational overhead, a shift aimed at both cost efficiency and rapid deployment. AMD’s Instinct MI300 Series GPUs, particularly the MI325X, seek to capitalize on this trend, offering enterprises a compelling alternative for scalable AI training. The pairing of the open-source ROCm V6.5 software stack with liquid-cooled systems from Supermicro shows that this is not just about technical performance: AMD is building an ecosystem. These moves have broader implications for an AI market that is increasingly looking for alternatives to dominant players and vendor lock-in.

What Was Announced

AMD’s MLPerf Training v5.0 submission highlighted the Instinct MI325X and MI300X GPUs, designed to handle demanding AI workloads. The Llama 2-70B-LoRA benchmark involved a multi-node setup with eight MI325X GPUs, achieving near-linear scaling efficiency, as detailed in AMD’s technical blog. The MI325X, with its high memory bandwidth and advanced compute capabilities, outperformed NVIDIA’s H200 by up to 8% in fine-tuning the Llama 2-70B-LoRA model, a workload centered on customizing large language models. The MI300X, while trailing the MI325X, performed well against NVIDIA’s H100, particularly in multi-node configurations. Both AMD GPUs use the company’s ROCm V6.5 software stack, which includes optimizations such as Flash Attention, Transformer Engine support, and tuned optimizers that improve training efficiency. The submission also featured the first liquid-cooled training run, using Supermicro’s AS-8125GS-TNMR2 system, demonstrating thermal efficiency for high-performance computing. Additionally, AMD’s ecosystem partner, MangoBoost, submitted a multi-node training run, showcasing the scalability of Instinct hardware in distributed environments. Together, these features aim to deliver robust, scalable training for enterprise and cloud-scale AI applications.
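As a rough illustration of what "near-linear scaling efficiency" means, the sketch below divides the observed speedup by the factor by which hardware was scaled. The function name and the timing values are hypothetical and are not taken from AMD’s or MangoBoost’s submissions. Because time-to-train also depends on batch size and convergence behavior, this ratio is best read as an approximate indicator.

```python
def scaling_efficiency(base_time, scaled_time, scale_factor):
    """Fraction of ideal speedup achieved when hardware grows by scale_factor.

    1.0 means perfectly linear scaling; values close to 1.0 are "near-linear".
    """
    if base_time <= 0 or scaled_time <= 0 or scale_factor <= 0:
        raise ValueError("times and scale_factor must be positive")
    speedup = base_time / scaled_time
    return speedup / scale_factor


# Hypothetical time-to-train values in minutes (NOT from the submission).
one_node_minutes = 24.0
four_node_minutes = 6.4

eff = scaling_efficiency(one_node_minutes, four_node_minutes, scale_factor=4)
print(f"scaling efficiency: {eff:.3f}")
```

With these made-up numbers, four times the hardware yields a 3.75x speedup, or roughly 94% of the ideal.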

The performance edge of the MI325X, particularly in LoRA fine-tuning, signals AMD’s focus on workloads that are becoming central to enterprise AI strategies. LoRA allows pre-trained models to be adapted with minimal compute overhead, which matters where companies seek to deploy customized AI solutions without the high costs of training foundation models. Industry reports indicate that enterprises are increasingly adopting such techniques to balance performance and budget constraints.
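To make the "minimal compute overhead" point concrete, here is a minimal pure-Python sketch of the LoRA idea: the pretrained weight matrix W stays frozen, and only two small matrices A (r x d_in) and B (d_out x r) are trained, with the update scaled by alpha / r. The helper names, the rank of 16, and the 8192 x 8192 layer size are illustrative assumptions, not the configuration used in the benchmark.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where W is frozen.

    W is d_out x d_in, A is r x d_in, B is d_out x r.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_fraction(d_in, d_out, r):
    """Share of one layer's parameters that LoRA trains versus full fine-tuning."""
    full_params = d_in * d_out
    lora_params = r * (d_in + d_out)
    return lora_params / full_params


# Tiny worked example with rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [2.0]]
y = lora_forward(W, A, B, [2.0, 3.0], alpha=1.0)

# Illustrative layer size and rank (assumptions, not benchmark settings).
fraction = trainable_fraction(8192, 8192, 16)
print(y, f"{fraction:.4%}")
```

Under these assumptions, the adapter trains well under 1% of the parameters in the layer, which is the source of LoRA’s cost advantage over full fine-tuning.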

However, AMD’s results must be considered in a wider context. While the MI325X’s 8% performance advantage over NVIDIA’s H200 is notable, NVIDIA’s broader MLPerf Training v5.0 results, particularly with its Blackwell GB200 accelerators, demonstrate superior performance in pretraining large language models. Posts on X reflect sentiment that NVIDIA’s NVL72 system, combining 36 Grace CPUs and 72 Blackwell GPUs, remains unmatched for raw compute power in such workloads. AMD’s submission, while competitive, is narrower in scope, focusing on fine-tuning rather than the full spectrum of training tasks. This targeted approach is strategic but limits direct comparisons with NVIDIA’s comprehensive benchmark dominance.
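MLPerf Training reports time-to-train, where lower is better, so a percentage advantage can be expressed in two slightly different ways. The sketch below shows both using hypothetical timings. The function names and the minute values are assumptions for illustration and do not reproduce the published results or indicate which convention AMD used.

```python
def relative_speedup(baseline_minutes, candidate_minutes):
    """How much faster the candidate is, as a throughput-style ratio minus 1."""
    return baseline_minutes / candidate_minutes - 1.0


def time_reduction(baseline_minutes, candidate_minutes):
    """Fraction of the baseline's time-to-train that the candidate saves."""
    return 1.0 - candidate_minutes / baseline_minutes


# Hypothetical time-to-train values in minutes (NOT published results).
baseline = 27.0
candidate = 25.0

speedup = relative_speedup(baseline, candidate)
reduction = time_reduction(baseline, candidate)
print(f"speedup: {speedup:.3f}, time reduction: {reduction:.3f}")
```

The same pair of timings reads as about an 8% speedup but only about a 7.4% reduction in time, so it is worth checking which convention a vendor’s headline figure uses.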

AMD’s open-source ROCm platform has the potential to be a real differentiator. Unlike NVIDIA’s proprietary CUDA, which supports only NVIDIA hardware, ROCm’s accessibility encourages broader ecosystem collaboration, such as MangoBoost’s multi-node submission. This openness aligns with industry desires for flexible, interoperable AI stacks, a trend Deloitte has noted as critical for enterprises avoiding vendor lock-in. Yet ROCm’s maturity lags behind CUDA, which has a decade-long head start in optimization and developer adoption, including variants tuned for specific industries and functions. Sentiment on X suggests ROCm is gaining traction, with users noting its parity with CUDA in specific workloads, though broader developer adoption remains a challenge. My analysis suggests that AMD’s ability to close this software gap will be pivotal for its long-term competitiveness.

The liquid-cooled Supermicro system used in the submission is a forward-looking move. As AI workloads push thermal limits, efficient cooling solutions become key. This aligns with broader industry efforts to optimize data center energy efficiency. AMD’s adoption of liquid cooling in MLPerf submissions signals its desire for leadership in sustainable AI infrastructure.

Despite these strengths, challenges remain. NVIDIA’s H200 outperformed the MI300X by 43% in prior MLPerf inference benchmarks, suggesting AMD still trails in certain workloads. Additionally, AMD’s focus on fine-tuning workloads like Llama 2-70B-LoRA, while relevant, sidesteps more compute-intensive tasks like pretraining, where NVIDIA excels. The HyperFRAME team will be closely monitoring how AMD expands its benchmark submissions to address these gaps.

Looking Ahead

AMD’s MLPerf Training debut underscores its desire to disrupt the AI GPU market, but its success hinges on scaling its ecosystem and software capabilities with continued investment and alliances. The key trend that the HyperFRAME team is tracking is AMD’s ability to demonstrate the advantages of ROCm’s open-source model in order to attract developers and enterprises who are wary of proprietary ecosystems but need a realistic alternative. Based on my analysis of the market, AMD’s focus on cost-effective fine-tuning aligns with enterprise needs, but NVIDIA’s dominance in pretraining and broader benchmark coverage remains a formidable barrier.

Going forward, the HyperFRAME team will closely monitor how AMD expands its MLPerf submissions to include more diverse workloads, such as mixture-of-experts models. This announcement signals AMD’s continued intent to carve a niche in scalable, accessible AI training. Against a dominant competitor like NVIDIA, AMD’s open-source strategy and high-memory GPUs position it to appeal to enterprises prioritizing flexibility and cost. That said, it will take sustained investment in ROCm and broader benchmark performance to achieve any mid- to long-term impact.

Author Information

Stephen Sopko | Analyst-in-Residence – Semiconductors & Deep Tech

Stephen Sopko is an Analyst-in-Residence specializing in semiconductors and the deep technologies powering today’s innovation ecosystem. With decades of executive experience spanning Fortune 100, government, and startups, he provides actionable insights by connecting market trends and cutting-edge technologies to business outcomes.

Stephen’s expertise in analyzing the entire buyer’s journey, from technology acquisition to implementation, was refined during his tenure as co-founder and COO of Palisade Compliance, where he helped Fortune 500 clients optimize technology investments. His ability to identify opportunities at the intersection of semiconductors, emerging technologies, and enterprise needs makes him a sought-after advisor to stakeholders navigating complex decisions.