DR Tulu: Evolving Rubrics Teach AI to Do Deep Research
Long, well-sourced answers are hard for AI to learn to produce, because most training rewards short, easily graded Q&A. This work introduces Reinforcement Learning with Evolving Rubrics (RLER), in which the grading rubrics co-evolve with the model so that feedback stays aligned with what the model actually explores during multi-step research. Built with