Teaching AI to Finish Long, Complex Tasks with Timely Human Help
AI agents are getting good at coding and research, but they still struggle with long, domain-specific projects. Training them is hard: full-time human annotation is too costly, while pure trial-and-error rarely finds successful paths.
Apollo is a new sampling framework that blends lightweight, asynchronous human guidance with strict action-level filtering.
- Humans jump in only when the agent drifts, offering tips or domain knowledge, making 30+ hour sessions feasible.
- Action-level filtering drops weak steps so errors don't snowball and trajectories stay useful for training (see the sketch below).
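To make the loop concrete, here is a minimal Python sketch of an Apollo-style rollout under stated assumptions: the post doesn't specify the paper's actual interfaces, so `propose_action`, `score_action`, the score threshold, and the drift heuristic (a streak of rejected steps) are all hypothetical stand-ins; only the two ideas it describes, non-blocking asynchronous human hints plus per-action filtering, are taken from the source.

```python
# Illustrative sketch only: propose_action, score_action, the threshold,
# and the drift heuristic are hypothetical, not the paper's actual API.
import queue

hint_queue: "queue.Queue[str]" = queue.Queue()  # humans push tips asynchronously


def propose_action(context: list[str]) -> str:
    """Stand-in for the agent's policy; returns the next action."""
    return f"action_{len(context)}"


def score_action(action: str) -> float:
    """Stand-in for action-level filtering; higher means a stronger step."""
    return 0.9  # a real scorer would judge the step's quality


def sample_trajectory(max_steps: int = 100, threshold: float = 0.5,
                      drift_patience: int = 3) -> list[str]:
    trajectory: list[str] = []
    weak_streak = 0  # consecutive rejected steps, used as a crude drift signal
    for _ in range(max_steps):
        # Asynchronous guidance: consume a pending human hint if one exists,
        # without ever blocking the rollout on a human.
        try:
            trajectory.append(f"hint: {hint_queue.get_nowait()}")
        except queue.Empty:
            pass

        action = propose_action(trajectory)
        if score_action(action) >= threshold:
            trajectory.append(action)  # keep strong steps
            weak_streak = 0
        else:
            weak_streak += 1  # drop weak steps so errors don't snowball
            if weak_streak >= drift_patience:
                # Agent appears to be drifting: this is where a human
                # would be pinged for a tip or domain knowledge.
                weak_streak = 0
    return trajectory


if __name__ == "__main__":
    hint_queue.put("check the dataset schema first")
    print(sample_trajectory(max_steps=5))
```

Because the hint queue is polled rather than awaited, annotators can drop in guidance on their own schedule, which is what makes multi-day sessions affordable.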
On InnovatorBench, training GLM-4.5 with Apollo delivered 50%+ gains over the untrained baseline and 28% over a no-human variant.
Bottom line: smarter human-in-the-loop interaction can reliably train agents for long-horizon tasks like coding, deep research, and UI automation.
Paper: http://arxiv.org/abs/2510.27630v2
Register: https://www.AiFeta.com
#AI #LLM #HumanInTheLoop #MachineLearning #AIResearch #AutonomousAgents