ROI-Reasoning: Teaching AI to Budget Its Thinking Tokens

Smarter AI means knowing when to think harder

LLMs can improve by thinking longer, but they rarely know how much thinking a task really needs. ROI-Reasoning trains models to plan their effort under a hard token budget, like students managing time on an exam.

  • Meta-Cognitive Fine-Tuning: Before answering, the model estimates difficulty, predicts tokens needed, and chooses to solve or skip.
  • Rationality-Aware Reinforcement Learning: The model learns long-horizon strategies to allocate its limited thinking tokens across many questions.
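The solve-or-skip step above can be pictured with a small sketch. This is not the authors' implementation: the `Estimate` fields, the `plan` function, and the `min_roi` threshold are hypothetical names chosen for illustration, and the simple greedy rule stands in for the learned policy described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical self-estimate a model might emit before answering."""
    p_correct: float    # predicted chance of solving the question
    tokens_needed: int  # predicted thinking tokens required to solve it


def plan(estimates, budget, min_roi=0.0):
    """Walk questions in order; solve when affordable and worthwhile, else skip.

    ROI here is an assumed proxy: expected correctness per predicted token.
    """
    decisions = []
    remaining = budget
    for est in estimates:
        roi = est.p_correct / est.tokens_needed if est.tokens_needed > 0 else 0.0
        if est.tokens_needed <= remaining and roi >= min_roi:
            decisions.append("solve")
            remaining -= est.tokens_needed
        else:
            decisions.append("skip")
    return decisions, remaining


if __name__ == "__main__":
    exam = [Estimate(0.9, 200), Estimate(0.2, 900), Estimate(0.7, 400)]
    print(plan(exam, budget=700, min_roi=0.001))
```

In this toy exam the second question is skipped because its predicted cost exceeds what is left of the budget, which leaves enough tokens for the third. A trained policy would presumably make this trade-off with learned estimates rather than a fixed threshold.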

The authors cast this as an Ordered Stochastic Multiple-Choice Knapsack Problem: spend tokens where they yield the highest return.
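For orientation, the textbook multiple-choice knapsack problem can be written as below. The notation is an assumption made here, not taken from the paper: question i offers a set of options J_i (for example, skip or solve at some effort level), option j costs c_{ij} tokens and earns reward r_{ij}, and B is the total token budget.

```latex
\max_{x} \; \sum_{i=1}^{n} \sum_{j \in J_i} \mathbb{E}[r_{ij}] \, x_{ij}
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \sum_{j \in J_i} c_{ij} \, x_{ij} \le B, \qquad
\sum_{j \in J_i} x_{ij} = 1 \;\; \forall i, \qquad
x_{ij} \in \{0, 1\}
```

Read against this generic form, "ordered" suggests that questions are handled in sequence, so each choice is made before later questions are seen, and "stochastic" suggests that the cost and reward of an option are uncertain when it is chosen. The paper's exact formulation may differ in its details.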

On budgeted math benchmarks, ROI-Reasoning increases overall scores and reduces regret when computation is tight.

Bottom line: better meta-cognition means better answers per token.

Paper: https://arxiv.org/abs/2601.03822v1. Authors: Muyang Zhao, Qi Qi, Hao Sun.

AI LLMs Reasoning MetaCognition ReinforcementLearning Optimization Tokens