DR Tulu: Evolving Rubrics Teach AI to Do Deep Research
Long, well-sourced answers are hard for AI to learn to write, because most training rewards short, easily graded Q&A. This work introduces Reinforcement Learning with Evolving Rubrics (RLER): grading guides that co-evolve with the model, so feedback stays aligned with what the model actually explores during multi-step research.
Built with RLER, Deep Research Tulu (DR Tulu-8B) is the first open model directly trained for open-ended, long-form deep research.
- Produces multi-step, cited, long-form answers, trained with on-policy feedback.
- Outperforms open deep research models on four benchmarks (science, healthcare, general domains).
- Matches or exceeds proprietary systems while being smaller and cheaper per query.
- Everything is open: data, models, code, plus a new MCP-based agent infrastructure.
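To make the evolving-rubric idea concrete, here is a toy Python sketch. It is not the paper's implementation: the `Rubric` class, the keyword-matching `score` function, the `propose` callback, and the rule of dropping rubrics that every rollout passes or fails are all assumptions made for illustration. A real system would presumably use a language model both to propose rubrics and to judge answers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rubric:
    criterion: str        # hypothetical: a keyword the toy grader looks for
    weight: float = 1.0


def score(answer: str, rubrics: List[Rubric]) -> float:
    """Weighted fraction of rubric criteria that the answer satisfies."""
    total = sum(r.weight for r in rubrics)
    if total == 0:
        return 0.0
    hit = sum(r.weight for r in rubrics if r.criterion in answer.lower())
    return hit / total


def evolve(
    rubrics: List[Rubric],
    rollouts: List[str],
    propose: Callable[[List[str]], List[str]],
) -> List[Rubric]:
    """Refresh the rubric pool using the model's current (on-policy) rollouts.

    Assumed rule: add newly proposed criteria, then keep only rubrics that
    separate the rollouts (some pass, some fail).
    """
    seen = {r.criterion for r in rubrics}
    merged = list(rubrics)
    for criterion in propose(rollouts):
        if criterion not in seen:
            merged.append(Rubric(criterion))
            seen.add(criterion)

    kept = []
    for r in merged:
        hits = [r.criterion in a.lower() for a in rollouts]
        if any(hits) and not all(hits):
            kept.append(r)
    return kept


# Toy usage: three rollouts for one question, one starting rubric.
rollouts = [
    "Answer cites [1] and compares trials.",
    "Short answer.",
    "Answer cites [2].",
]
rubrics = evolve(
    [Rubric("answer")],
    rollouts,
    propose=lambda rs: ["cites", "compares"],  # stand-in for a rubric generator
)
rewards = [score(a, rubrics) for a in rollouts]
```

In this toy run the starting rubric is dropped because every rollout satisfies it, while the two newly proposed rubrics remain because they distinguish stronger answers from weaker ones. The rewards would then feed a reinforcement learning update, which is omitted here.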
Read the paper: https://arxiv.org/abs/2511.19399v1
Register: https://www.AiFeta.com
#AIResearch #ReinforcementLearning #LLMs #OpenSource #DeepResearch #NLP #Science #Healthcare