InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

An end-to-end FP8 recipe that’s stable, lossless vs BF16, and measurably faster for reasoning LLMs.

Training frontier LLMs is costly. FP8 promises efficiency, but real-world adoption has lacked a robust, open recipe. InfiR2 fills that gap with an end-to-end FP8 training methodology spanning continual pretraining and supervised fine-tuning, underpinned by a fine-grained, hybrid-granularity quantization strategy that preserves numerical fidelity.
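The paper's exact hybrid-granularity scheme isn't reproduced here, but the general idea behind fine-grained FP8 quantization can be illustrated with block-wise scaling: each small block of values gets its own scale so that outliers in one block don't destroy precision elsewhere. Below is a minimal NumPy sketch that simulates E4M3 rounding (4 exponent bits, 3 mantissa bits, max value 448) with per-block scales. The function names, the 128-element block size, and the simplifications (no subnormals, no exponent clamping) are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def round_to_e4m3(x):
    """Simulate E4M3 rounding: keep 1 implicit + 3 mantissa bits.

    Simplified sketch: ignores subnormals and the E4M3 exponent floor.
    """
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    m, e = np.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0   # round mantissa to 4 significant bits
    return np.ldexp(m, e)


def fp8_blockwise_roundtrip(w, block=128):
    """Quantize-dequantize with one scale per `block` contiguous values.

    The per-block scale maps each block's max magnitude onto E4M3_MAX,
    which is the "fine-grained" part: a single outlier only degrades
    precision within its own block.
    """
    orig_shape = w.shape
    wb = w.reshape(-1, block)
    scale = np.max(np.abs(wb), axis=1, keepdims=True) / E4M3_MAX
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid divide-by-zero
    q = round_to_e4m3(wb / scale)               # "stored" FP8 values
    return (q * scale).reshape(orig_shape)      # dequantized back to float


# Per-element relative error stays within ~2**-4 (about 6%), since
# rounding to 4 significant bits loses at most half a ulp of the mantissa.
rng = np.random.default_rng(0)
w = rng.standard_normal(1024)
dq = fp8_blockwise_roundtrip(w)
print(np.max(np.abs(dq - w) / np.abs(w)))
```

Production recipes keep the quantized tensor in a real FP8 storage type and feed it to FP8 matmul kernels; this sketch only models the numerics of the round trip.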

Across extensive experiments, including continual pretraining on a 160B-token corpus, InfiR2 matches BF16 baselines on a suite of reasoning benchmarks while delivering notable efficiency gains: up to 22% shorter training time, 14% lower peak memory, and 19% higher throughput. Crucially, the approach demonstrates stability at scale and near-lossless quality, addressing the practical concerns that have hampered FP8 adoption.

Why it matters: reasoning-enhanced LLMs often demand longer contexts, deeper stacks, and more tokens. An FP8 pipeline that behaves predictably enables faster iteration cycles, cheaper ablations, and broader access for labs and startups. The team commits to releasing code, positioning InfiR2 as a foundation others can build on, adapt to new hardware, and extend to multi-modal stacks.

Expect downstream work to explore FP8-aware optimizers, activation scaling strategies for long-context regimes, and push-button migration guides from BF16.

Paper: http://arxiv.org/abs/2509.22536v1

Register: https://www.AiFeta.com

#AI #LLM #FP8 #Training #Efficiency #Quantization #Reasoning
