InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

An end-to-end, effectively lossless FP8 recipe that speeds up LLM training for reasoning

Can we train LLMs with strong reasoning faster and cheaper, without sacrificing accuracy? InfiR2 answers with a practical, open FP8 recipe spanning continual pretraining and supervised fine-tuning. The approach uses a fine-grained, hybrid-granularity quantization strategy to preserve numerical fidelity where it matters while exploiting FP8's efficiency where it is safe to do so.
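To make "fine-grained quantization" concrete, here is a minimal PyTorch sketch of per-block FP8 scaling. The block size, function names, and choice of the e4m3 format are illustrative assumptions, not the paper's verified settings:

    import torch

    # A minimal sketch of fine-grained (per-block) FP8 quantization.
    # Block size, names, and the e4m3 format are assumptions, not the
    # paper's verified settings.
    FP8_E4M3_MAX = 448.0  # largest finite value in torch.float8_e4m3fn

    def quantize_fp8_blockwise(x: torch.Tensor, block_size: int = 128):
        """Quantize a 2-D tensor to FP8 with one scale per contiguous block.

        Per-block scales localize the damage from outliers: one large
        activation only coarsens its own block, not the whole tensor,
        which is the intuition behind fine-grained quantization.
        """
        rows, cols = x.shape
        assert cols % block_size == 0, "cols must be divisible by block_size"
        blocks = x.reshape(rows, cols // block_size, block_size)
        # Pick each block's scale so its max magnitude maps to FP8_E4M3_MAX.
        amax = blocks.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12)
        scale = FP8_E4M3_MAX / amax
        q = (blocks * scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX)
        q = q.to(torch.float8_e4m3fn)  # requires PyTorch >= 2.1
        return q.reshape(rows, cols), scale.squeeze(-1)

    def dequantize_fp8_blockwise(q: torch.Tensor, scale: torch.Tensor,
                                 block_size: int = 128) -> torch.Tensor:
        rows, cols = q.shape
        blocks = q.to(torch.float32).reshape(rows, cols // block_size,
                                             block_size)
        return (blocks / scale.unsqueeze(-1)).reshape(rows, cols)

The design point: a single outlier value only degrades the resolution of its own 128-element block, instead of flattening the dynamic range of the entire tensor, which is why fine-grained scaling preserves fidelity in FP8.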

Across extensive experiments, including continual pretraining on a 160B-token corpus, the recipe remains remarkably stable and essentially lossless versus BF16 on a suite of reasoning evaluations. The efficiency gains are tangible: up to 22% less training time, 14% lower peak memory, and 19% higher throughput, making large-scale training more accessible without a quality trade-off.

  • End-to-end FP8: covers pretraining and SFT coherently.
  • Hybrid quantization: fine-grained control aligns precision with sensitivity (see the sketch after this list).
  • Stable and strong: parity with BF16 on reasoning benchmarks.
  • Efficiency wins: faster training, lower memory, higher throughput.
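
As a rough illustration of the hybrid idea, the sketch below assigns FP8 to compute-heavy linear layers while keeping numerically sensitive modules in BF16. The module list and policy are assumptions drawn from common FP8 practice, not the paper's exact recipe:

    import torch.nn as nn

    # A hedged sketch of hybrid precision assignment. Which modules stay
    # in BF16 is an assumption based on common FP8 practice, not the
    # paper's verified module list.
    SENSITIVE_TYPES = (nn.Embedding, nn.LayerNorm)

    def assign_precision(model: nn.Module) -> dict:
        """Map module names to 'fp8' or 'bf16' under a simple policy:
        compute-heavy Linear layers run their matmuls in FP8, while
        numerically sensitive modules (embeddings, norms, the output
        head) stay in BF16."""
        plan = {}
        for name, module in model.named_modules():
            if isinstance(module, SENSITIVE_TYPES) or name.endswith("lm_head"):
                plan[name] = "bf16"
            elif isinstance(module, nn.Linear):
                plan[name] = "fp8"
        return plan

    # Example: plan = assign_precision(my_transformer); a trainer would
    # then route 'fp8' entries through FP8 GEMM kernels and leave the
    # 'bf16' entries untouched.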

By establishing FP8 as a robust, production-ready alternative to BF16, and by committing to open-source the code, InfiR2 lowers the barrier to innovation for teams building and iterating on reasoning-enhanced models under real-world compute budgets.

Paper: http://arxiv.org/abs/2509.22536v1
Register: https://www.AiFeta.com

#LLM #FP8 #TrainingEfficiency #Quantization #Reasoning #Scaling #DeepLearning
