ROI-Reasoning: Teaching AI to Budget Its Thinking Tokens
Smarter AI means knowing when to think harder
LLMs can improve by thinking longer, but they rarely know how much thinking a task really needs. ROI-Reasoning trains models to plan their effort under a hard token budget, like students managing time on an exam. It combines two components:
- Meta-Cognitive Fine-Tuning: Before answering, the model estimates difficulty, predicts tokens needed, and chooses to solve or skip.
- Rationality-Aware Reinforcement Learning: The model learns long-horizon strategies to allocate its limited thinking tokens across many questions.
The authors cast this as an Ordered Stochastic Multiple-Choice Knapsack Problem: spend tokens where they yield the highest return.
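To make the solve-or-skip idea concrete, here is a minimal sketch in Python. It is not the authors' method: ROI-Reasoning learns its policy through fine-tuning and reinforcement learning, while this sketch swaps in a simple greedy threshold rule. The names `Question`, `plan_in_order`, `p_correct`, `predicted_tokens`, and `min_roi` are hypothetical, and the numbers are made up for illustration. It also treats the predicted token cost as exact, whereas the real problem is stochastic.

```python
from dataclasses import dataclass


@dataclass
class Question:
    name: str
    reward: float           # points for a correct answer
    p_correct: float        # hypothetical self-estimated chance of success
    predicted_tokens: int   # hypothetical self-predicted reasoning cost


def plan_in_order(questions, budget, min_roi=0.0):
    """Walk the questions in their given order and decide solve vs. skip.

    Greedy illustration only: solve a question if its predicted cost fits
    in the remaining budget and its expected return per token is at least
    min_roi. Otherwise skip it and keep the tokens for later questions.
    """
    remaining = budget
    plan = []
    expected_score = 0.0
    for q in questions:
        # Expected points per predicted token; guard against a zero cost.
        roi = q.reward * q.p_correct / max(q.predicted_tokens, 1)
        if q.predicted_tokens <= remaining and roi >= min_roi:
            plan.append((q.name, "solve"))
            remaining -= q.predicted_tokens
            expected_score += q.reward * q.p_correct
        else:
            plan.append((q.name, "skip"))
    return plan, expected_score, remaining


# Made-up exam: one easy question, one expensive long shot, one in between.
exam = [
    Question("q1", reward=1.0, p_correct=0.9, predicted_tokens=300),
    Question("q2", reward=1.0, p_correct=0.2, predicted_tokens=2000),
    Question("q3", reward=1.0, p_correct=0.7, predicted_tokens=800),
]
plan, expected_score, remaining = plan_in_order(exam, budget=1500, min_roi=0.0005)
print(plan, remaining)
```

With a 1,500-token budget, the sketch solves q1 and q3 and skips q2, whose predicted cost is high and whose chance of success is low. A learned policy could do better than this fixed threshold, for example by skipping a mid-value question early to save tokens for a higher-value one later.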
On budgeted math benchmarks, ROI-Reasoning increases overall scores and reduces regret when computation is tight.
Bottom line: better meta-cognition means better answers per token.
Paper: https://arxiv.org/abs/2601.03822v1. Authors: Muyang Zhao, Qi Qi, Hao Sun.
AI LLMs Reasoning MetaCognition ReinforcementLearning Optimization Tokens