MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

Sub-billion LLMs that reason well—trained on far less data, with a fully open recipe.

Do small models need massive corpora to reason? MobileLLM-R1 makes a compelling case that they don't. Challenging the assumption that advanced chain-of-thought capabilities require >10T tokens, the authors carefully curate and resample open-source datasets using tailored quality metrics. They show that a pool of roughly 2T high-quality tokens is sufficient to spark strong reasoning, and that a 4.2T-token pretraining pass resampled from that pool, followed by established post-training, delivers state-of-the-art results for sub-billion models.
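
To make the resampling idea concrete, here is a minimal Python sketch of quality-guided mixing: a curated pool of roughly 2T tokens is upsampled to a 4.2T-token pretraining budget in proportion to per-dataset benefit scores. The dataset names, scores, and temperature below are illustrative assumptions, not the paper's actual metrics or mixing ratios, which are specified in the released recipe.

```python
# Minimal sketch (not the authors' pipeline): quality-guided resampling of an
# open-source token pool into a larger pretraining mix. Dataset names, benefit
# scores, and the temperature are illustrative assumptions.

TARGET_TOKENS = 4.2e12  # total pretraining budget (4.2T tokens)

# Hypothetical curated pool: (dataset, tokens available, benefit score in [0, 1])
pool = [
    ("open-web-edu", 0.9e12, 0.82),
    ("open-math",    0.4e12, 0.95),
    ("open-code",    0.5e12, 0.90),
    ("open-papers",  0.2e12, 0.75),
]

def mixing_weights(pool, temperature=0.7):
    """Turn benefit scores into sampling weights; lower temperature
    concentrates the mix on higher-scoring datasets."""
    scaled = [score ** (1.0 / temperature) for _, _, score in pool]
    total = sum(scaled)
    return [s / total for s in scaled]

weights = mixing_weights(pool)
for (name, available, score), w in zip(pool, weights):
    budget = w * TARGET_TOKENS
    epochs = budget / available  # >1 means the dataset is repeated (resampled)
    print(f"{name:13s} score={score:.2f} share={w:5.1%} "
          f"tokens={budget/1e12:.2f}T (~{epochs:.1f} epochs)")
```

In this toy setup the pool sums to ~2T tokens, so hitting a 4.2T budget means most datasets are seen more than once, which is the essence of resampling a small high-quality pool rather than collecting more raw data.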

The headline result: MobileLLM-R1-950M posts an AIME score of 15.5, versus 0.6 for OLMo-2-1.48B and 0.3 for SmolLM2-1.7B. Perhaps more striking: despite pretraining on only 11.7% of the tokens reportedly used by Qwen3 (4.2T vs. 36T), MobileLLM-R1-950M matches or surpasses Qwen3-0.6B across multiple reasoning benchmarks.

  • Data-first design: Curated and resampled open datasets, guided by custom benefit metrics.
  • Efficient pretraining: 4.2T tokens sampled from a ~2T-token high-quality pool.
  • Strong results at small scale: Competitive or better performance than larger peers and models trained on proprietary data.
  • Reproducibility: Full training recipe, data sources, mixing ratios, and checkpoints released.

Why it matters: If high-quality, well-mixed data can replace brute-force scale, then advanced reasoning becomes accessible to broader communities and on-device settings. MobileLLM-R1 offers a practical blueprint to build capable, interpretable, and efficient reasoners—without proprietary data advantages.

Paper: MobileLLM-R1 (arXiv)
Register: https://www.AiFeta.com

#LLM #Reasoning #DataCuration #SubBillion #OpenSource #Efficiency #AIME #NLP
