MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
Sub-billion LLMs that reason well—trained on far less data, with a fully open recipe.
Do small models need massive corpora to reason? MobileLLM-R1 makes a compelling case that they don't. Challenging the assumption that advanced chain-of-thought capabilities require >10T tokens, the authors carefully curate and resample open-source datasets using tailored quality metrics. They show that roughly 2T high-quality tokens are enough to spark strong reasoning, and that a 4.2T-token pretraining run resampled from that pool, followed by established post-training, delivers state-of-the-art results for sub-billion models.
The headline result: MobileLLM-R1-950M scores 15.5 on AIME, versus 0.6 for OLMo-2-1.48B and 0.3 for SmolLM2-1.7B. Perhaps more striking, despite pretraining on only 11.7% of the tokens reportedly used by Qwen3 (4.2T vs. 36T), MobileLLM-R1-950M matches or beats Qwen3-0.6B across multiple reasoning benchmarks.
- Data-first design: Curated and resampled open datasets, guided by custom benefit metrics.
- Efficient pretraining: 4.2T training tokens resampled from a ~2T-token high-quality pool (see the sketch after this list).
- Strong results at small scale: Competitive with or better than larger peers and models trained on proprietary data.
- Reproducibility: Full training recipe, data sources, mixing ratios, and checkpoints released.
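To make the data-first idea concrete, here is a minimal, illustrative sketch of benefit-weighted resampling: each source in a curated pool receives a share of the 4.2T-token budget proportional to its size times a quality/benefit score, so high-benefit sources are repeated for more epochs. The source names, sizes, and scores below are placeholders, not the paper's actual mixture or metrics.

```python
# Minimal sketch (not the released recipe): build a 4.2T-token pretraining mix
# by resampling a ~2T-token curated pool, weighting each source by an assumed
# per-source "benefit" score. All names, sizes, and scores are illustrative.
pool = [
    {"name": "math_web",     "tokens": 0.6e12, "benefit": 1.8},
    {"name": "code",         "tokens": 0.5e12, "benefit": 1.5},
    {"name": "encyclopedic", "tokens": 0.4e12, "benefit": 1.2},
    {"name": "general_web",  "tokens": 0.5e12, "benefit": 0.8},
]

TARGET_TOKENS = 4.2e12  # total tokens seen during pretraining


def mixing_ratios(sources):
    """Benefit-weighted share of the final token budget for each source."""
    total = sum(s["tokens"] * s["benefit"] for s in sources)
    return {s["name"]: s["tokens"] * s["benefit"] / total for s in sources}


ratios = mixing_ratios(pool)
for s in pool:
    budget = ratios[s["name"]] * TARGET_TOKENS
    epochs = budget / s["tokens"]  # >1 means the source is upsampled (repeated)
    print(f'{s["name"]:>12}: {ratios[s["name"]]:6.2%} of mix, ~{epochs:.1f} epochs')
```

The point of the sketch is the design choice, not the numbers: the mixing ratios and repetition counts fall out of the quality scores, rather than being hand-tuned per run.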
Why it matters: If high-quality, well-mixed data can replace brute-force scale, then advanced reasoning becomes accessible to broader communities and practical in on-device settings. MobileLLM-R1 offers a practical blueprint for building capable, interpretable, and efficient reasoners without proprietary data advantages.
Paper: MobileLLM-R1 (arXiv)
Register: https://www.AiFeta.com
#LLM #Reasoning #DataCuration #SubBillion #OpenSource #Efficiency #AIME #NLP