DR Tulu: Evolving Rubrics Teach AI to Do Deep Research

Long, well-sourced answers are hard for AI models to learn to write, because most training rewards short, easily graded Q&A. This work introduces Reinforcement Learning with Evolving Rubrics (RLER), in which the grading rubrics co-evolve with the model during training, so feedback stays aligned with what the model actually explores during multi-step research.
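
To make the idea of a co-evolving rubric concrete, here is a toy sketch in Python. It is not the paper's implementation: the `RubricItem` class, the keyword-based check, and the `score` and `evolve` functions are all hypothetical names and simplifications chosen for illustration. The only assumption taken from the description above is that the rubric is updated against the model's current outputs; the specific rule used here (keep or add only items that separate the current rollouts) is one plausible reading, not a confirmed detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """Hypothetical rubric item; a real grader would likely use an LLM judge."""
    description: str
    keyword: str        # toy check: satisfied if this keyword appears in the answer
    weight: float = 1.0

    def satisfied(self, answer: str) -> bool:
        return self.keyword.lower() in answer.lower()


def score(answer: str, rubric: list[RubricItem]) -> float:
    """Weighted fraction of rubric items the answer satisfies (0.0 to 1.0)."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric if item.satisfied(answer))
    return earned / total


def evolve(rubric: list[RubricItem],
           candidates: list[RubricItem],
           rollouts: list[str]) -> list[RubricItem]:
    """Assumed update rule: keep or add only items that some, but not all,
    of the current rollouts satisfy, since those still carry a learning signal."""
    def discriminates(item: RubricItem) -> bool:
        hits = sum(item.satisfied(answer) for answer in rollouts)
        return 0 < hits < len(rollouts)

    kept = [item for item in rubric if discriminates(item)]
    added = [c for c in candidates if c not in kept and discriminates(c)]
    return kept + added


# Two made-up answers sampled from the current model for one question.
rollouts = [
    "Aspirin reduces risk [1]. Source: randomized trial.",
    "Aspirin reduces risk.",
]
rubric = [
    RubricItem("mentions aspirin", "aspirin"),
    RubricItem("cites a source", "[1]"),
]
candidates = [RubricItem("names the study design", "randomized")]

new_rubric = evolve(rubric, candidates, rollouts)
old_scores = [score(a, rubric) for a in rollouts]
new_scores = [score(a, new_rubric) for a in rollouts]
```

In this toy run, "mentions aspirin" is dropped because both answers already satisfy it, while the candidate about study design is added because only one answer satisfies it. The evolved rubric therefore separates the two answers more sharply than the original one did.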

Built with RLER, Deep Research Tulu (DR Tulu-8B) is the first open model directly trained for open-ended, long-form deep research.

  • Produces multi-step, cited, long-form answers with on-policy feedback.
  • Outperforms open deep research models on four benchmarks (science, healthcare, general domains).
  • Matches or exceeds proprietary systems while being smaller and cheaper per query.
  • Everything is open: data, models, code, plus a new MCP-based agent infrastructure.

Read the paper: https://arxiv.org/abs/2511.19399v1
