MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes

High-quality data over sheer quantity: sub‑billion models that reason competitively with only ~2T core tokens

Do we really need tens of trillions of tokens to unlock reasoning in small models? MobileLLM‑R1 argues no. By carefully selecting and resampling roughly 2T tokens of high-quality open data into a ~4.2T-token pretraining run, then applying a transparent post‑training stack, the authors develop sub‑billion‑parameter reasoning models that punch far above their weight.

What’s notable isn’t just the efficiency; it’s the results. MobileLLM‑R1‑950M posts an AIME score of 15.5, compared to 0.6 for OLMo‑2‑1.48B and 0.3 for SmolLM‑2‑1.7B. Even more striking: despite pretraining on only 11.7% of the tokens reportedly used by Qwen3 (4.2T vs. 36T), the 950M variant matches or surpasses Qwen3‑0.6B on multiple reasoning benchmarks.
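A quick back‑of‑the‑envelope check of that token‑budget claim, using only the figures quoted above (4.2T pretraining tokens vs. Qwen3's reported 36T):

```python
# Sanity check of the quoted token-budget ratio: 4.2T pretraining tokens
# for MobileLLM-R1 vs. the 36T reportedly used to pretrain Qwen3.
mobilellm_r1_tokens = 4.2e12
qwen3_tokens = 36e12

fraction = mobilellm_r1_tokens / qwen3_tokens
print(f"{fraction:.1%}")  # -> 11.7%
```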

How they did it: the team defines metrics to identify beneficial datasets, then curates, mixes, and resamples them to amplify reasoning signal without ballooning the total token count. The training recipe emphasizes data quality, reasoning‑oriented distributions, and a standard post‑training sequence that stays fully open. The result is a repeatable, open recipe for small, reasoning‑first LLMs.
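For intuition, here is a minimal sketch of weighted dataset resampling, the kind of mixing step the recipe describes. The dataset names and weights below are invented for illustration; the actual sources and ratios are listed in the paper's release.

```python
import random

# Hypothetical corpora and mixture weights, chosen only to illustrate
# up-weighting reasoning-heavy sources. The real datasets and mixing
# ratios come from the paper's released recipe, not from this sketch.
mixture = {
    "filtered_web": 0.35,
    "math": 0.25,
    "code": 0.25,
    "scientific_text": 0.15,
}

def sample_source(mixture: dict[str, float]) -> str:
    """Pick the source dataset for the next training document,
    proportionally to its mixture weight."""
    names, weights = zip(*mixture.items())
    return random.choices(names, weights=weights, k=1)[0]

# Draw source assignments for a small batch of documents.
print([sample_source(mixture) for _ in range(8)])
```

Resampling high-value sources more often is roughly how ~2T unique tokens can back a ~4.2T-token pretraining run.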

Why it matters: if reasoning can emerge reliably in sub‑billion models with tractable data budgets, high‑quality on‑device and edge reasoning becomes far more practical. Beyond performance, the release includes data sources, mixing ratios, training details, and checkpoints—lowering the barrier for the community to iterate and test new ideas.

Who should care: practitioners building fast, deployable reasoners; researchers studying data scaling laws for reasoning; and teams seeking transparent, reproducible training pipelines.

Paper: arXiv: MobileLLM‑R1

#LLM #Reasoning #SmallModels #DataCuration #OpenSource #EdgeAI #AIME
