All‑AMD AI Training Goes Big: ZAYA1 and MoE at Scale

All‑AMD AI training, proven at scale

Researchers ran the first large-scale mixture-of-experts (MoE) pretraining entirely on AMD hardware (Instinct MI300X GPUs networked over the Pensando Pollara interconnect) and distilled practical playbooks for both systems and model design.

  • Systems: Full-cluster networking benchmarks for all-reduce, reduce-scatter, all-gather, and broadcast across message sizes and GPU counts; MI300X kernel-sizing and memory-bandwidth insights; and a production-ready training stack with fault tolerance and checkpoint reshaping (a minimal benchmarking sketch follows this list).
  • Modeling: MI300X‑aware sizing rules for attention and MLP blocks, plus MoE width choices that balance training throughput with inference latency.
  • Results: Introduces ZAYA1, an MoE with 760M active and 8.3B total parameters. At this scale it matches Qwen3‑4B and Gemma3‑12B, and outperforms Llama‑3‑8B and OLMoE on reasoning, math, and coding benchmarks.
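
For context on the Systems bullet, here is a minimal sketch of the kind of collective-communication benchmark reported in the paper. It is not the paper's harness: it assumes a standard torchrun launch and uses PyTorch's distributed API (which dispatches to RCCL on AMD GPUs); message sizes, iteration counts, and the bandwidth formula are illustrative.

```python
# Minimal all-reduce timing sketch (illustrative, not the paper's harness).
# Launch with e.g.: torchrun --nproc_per_node=8 bench_allreduce.py
import os
import time
import torch
import torch.distributed as dist

def benchmark_all_reduce(sizes_mb=(1, 16, 256), iters=20):
    # "nccl" backend maps to RCCL on ROCm builds of PyTorch.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    device = torch.device("cuda", int(os.environ["LOCAL_RANK"]))
    torch.cuda.set_device(device)

    for mb in sizes_mb:
        n = mb * 1024 * 1024 // 2                      # fp16 elements for ~mb MiB
        buf = torch.ones(n, dtype=torch.float16, device=device)
        for _ in range(3):                              # warm-up iterations
            dist.all_reduce(buf)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(iters):
            dist.all_reduce(buf)
        torch.cuda.synchronize()
        dt = (time.perf_counter() - t0) / iters
        if rank == 0:
            # Ring all-reduce bus-bandwidth estimate: 2*(N-1)/N * bytes / time
            world = dist.get_world_size()
            gbps = 2 * (world - 1) / world * buf.numel() * 2 / dt / 1e9
            print(f"{mb:4d} MiB  {dt * 1e3:7.2f} ms  ~{gbps:6.1f} GB/s")

    dist.destroy_process_group()

if __name__ == "__main__":
    benchmark_all_reduce()
```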

Takeaway: AMD’s compute, networking, and software stack is mature and optimized enough for competitive large‑scale pretraining, expanding hardware choice for frontier AI.

Paper: https://arxiv.org/abs/2511.17127v1

Register: https://www.AiFeta.com

#AI #AMD #MI300X #Pollara #MoE #LLM #FoundationModels #HPC #Networking #Research
