InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
An end-to-end FP8 recipe that speeds up LLM training for reasoning with effectively no loss in accuracy
Can we train LLMs with strong reasoning faster and more cheaply, without sacrificing accuracy? InfiR2 answers with a practical, open FP8 recipe spanning continual pretraining and supervised fine-tuning (SFT). The approach uses a fine-grained, hybrid-granularity quantization strategy that preserves numerical fidelity where it matters while exploiting FP8's efficiency where it is safe to do so.
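To make the quantization idea concrete, here is a minimal sketch of fine-grained (block-wise) FP8 quantization in PyTorch: each block of a tensor gets its own scale, so an outlier in one block cannot wash out precision everywhere else. The block size of 128, the E4M3 format, and the function names are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch of block-wise FP8 quantization (assumptions: E4M3 format,
# block size 128; not InfiR2's actual code). Requires PyTorch >= 2.1 for
# the float8_e4m3fn dtype.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def quantize_fp8_blockwise(x: torch.Tensor, block: int = 128):
    """Quantize a tensor to FP8 E4M3 with one scale per block of `block` elements."""
    n = x.numel()
    pad = (-n) % block
    xb = torch.nn.functional.pad(x.flatten(), (0, pad)).view(-1, block)
    # Per-block scale maps each block's max magnitude onto the FP8 range.
    amax = xb.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = FP8_E4M3_MAX / amax
    q = (xb * scale).to(torch.float8_e4m3fn)  # quantize block by block
    return q, scale

def dequantize_fp8_blockwise(q: torch.Tensor, scale: torch.Tensor, n: int):
    """Recover an approximate float32 tensor from FP8 blocks and their scales."""
    return (q.to(torch.float32) / scale).flatten()[:n]

x = torch.randn(1000)
q, s = quantize_fp8_blockwise(x)
x_hat = dequantize_fp8_blockwise(q, s, x.numel())
print((x - x_hat).abs().max())  # small per-element quantization error
```

Finer blocks mean more scales to store but tighter error bounds; a hybrid-granularity scheme, as the paper describes, can spend fine-grained scales only on the most sensitive tensors.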
Across extensive experiments, including continual pretraining on a 160B-token corpus, the recipe remains remarkably stable and essentially lossless versus BF16 on a suite of reasoning evaluations. The efficiency gains are tangible: up to 22% less training time, 14% lower peak memory, and 19% higher throughput, making large-scale training more accessible without a quality trade-off.
- End-to-end FP8: one coherent recipe covers continual pretraining and SFT (see the training sketch after this list).
- Hybrid quantization: fine-grained control aligns precision with sensitivity.
- Stable and strong: parity with BF16 on reasoning benchmarks.
- Efficiency wins: faster training, lower memory, higher throughput.
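As a hands-on reference point for what end-to-end FP8 training looks like in practice, here is a hedged illustration using NVIDIA Transformer Engine's public API (te.fp8_autocast, te.Linear, recipe.DelayedScaling). TE's per-tensor delayed scaling stands in for InfiR2's finer-grained hybrid scheme, and all settings shown are assumptions; consult the paper and its forthcoming code for the actual recipe.

```python
# Generic FP8 training pattern with NVIDIA Transformer Engine (requires an
# FP8-capable GPU, e.g. Hopper). Not InfiR2's recipe; settings are assumed.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(
    fp8_format=recipe.Format.HYBRID,  # E4M3 for forward, E5M2 for gradients
    amax_history_len=16,              # scaling history window (assumed value)
    amax_compute_algo="max",
)

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)       # the GEMM executes in FP8 under the hood
y.sum().backward()     # backward pass uses the recipe's gradient format
```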
By establishing FP8 as a robust, production-ready alternative to BF16, and by committing to open-source the code, InfiR2 lowers the barrier to innovation for teams building and iterating on reasoning-enhanced models under real-world compute budgets.
Paper: http://arxiv.org/abs/2509.22536v1
Register: https://www.AiFeta.com
#LLM #FP8 #TrainingEfficiency #Quantization #Reasoning #Scaling #DeepLearning