Scaling Synthetic Task Generation for Agents via Exploration

AutoPlay explores environments to synthesize diverse, verifiable tasks, fueling stronger UI agents

Agent post-training hinges on rich, grounded tasks, but human-curated datasets are costly and often shallow. AutoPlay introduces a scalable alternative: let an MLLM-based explorer systematically probe interactive environments, discover states and affordances, and then synthesize executable, verifiable tasks from what it learns.

The pipeline runs in two stages. Exploration: an explorer agent uncovers novel app states and functionalities, building a map of what the environment affords. Task generation: a task generator conditions on the exploration trajectories and guideline prompts to produce diverse, feasible tasks with built-in verifiability; a minimal sketch of the loop follows below. Grounding tasks in actual UI states raises coverage and realism without human annotation.
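One way to picture the two-stage loop, in Python. Everything here is an illustrative assumption, not the paper's actual interface: `Step`, `Trajectory`, `explore`, and `generate_tasks` are hypothetical names, and the injected `env_reset`/`env_step`/`propose_action`/`draft_tasks` callables stand in for the environment and the MLLM calls.

```python
# Hypothetical sketch of an exploration-then-generation pipeline.
# The environment and the MLLM are injected as plain callables so the
# skeleton stays independent of any particular app or model backend.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Step:
    screen: str   # e.g., a serialized UI tree or a screenshot caption
    action: str   # e.g., "tap(settings_button)"

@dataclass
class Trajectory:
    steps: list[Step] = field(default_factory=list)

def explore(env_reset: Callable[[], str],
            env_step: Callable[[str], str],
            propose_action: Callable[[str, set[str]], str],
            max_steps: int = 10) -> Trajectory:
    """Stage 1: probe the environment, steering toward unseen states."""
    seen: set[str] = set()
    traj = Trajectory()
    screen = env_reset()
    for _ in range(max_steps):
        seen.add(screen)
        # The explorer MLLM picks an action, given the current screen and
        # the set of states already visited (novelty-seeking behavior).
        action = propose_action(screen, seen)
        traj.steps.append(Step(screen, action))
        screen = env_step(action)
    return traj

def generate_tasks(traj: Trajectory,
                   draft_tasks: Callable[[str, str], list[str]],
                   guideline: str) -> list[str]:
    """Stage 2: condition a generator MLLM on the trajectory plus a
    guideline prompt to draft diverse, feasible task descriptions."""
    summary = " -> ".join(f"{s.screen}:{s.action}" for s in traj.steps)
    return draft_tasks(summary, guideline)
```

Injecting the environment and model as callables keeps the novelty-seeking exploration policy separate from the task-drafting step, which mirrors the two-stage split described above.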

At scale, AutoPlay yields 20k tasks across 20 Android apps and 10k tasks across 13 Ubuntu apps. Coupled with an MLLM task executor and verifier, the framework produces demonstrations in volume. Training UI agents on this data improves success rates by up to 20.0% for mobile use and 10.9% for computer use; adding RL with verifier-based rewards (sketched below) brings a further 5.7% gain.
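A hedged sketch of how a verifier-based signal can serve double duty: filtering rollouts into a demonstration corpus and supplying a reward for RL. The `verify` callable, `verified_reward`, and `filter_demonstrations` are hypothetical names standing in for an MLLM verifier, not the paper's API.

```python
# Illustrative verifier-based reward (assumed interface): an MLLM verifier
# inspects the final state against the task description and emits a binary
# judgment, usable both for rejection sampling and as an RL reward.

from typing import Callable

def verified_reward(task: str,
                    final_screen: str,
                    verify: Callable[[str, str], bool]) -> float:
    """Return 1.0 if the verifier judges the task completed, else 0.0."""
    return 1.0 if verify(task, final_screen) else 0.0

def filter_demonstrations(rollouts, verify):
    """Keep only rollouts whose final state passes verification,
    i.e., rejection sampling to build the training corpus.
    Each rollout is an assumed (task, steps, final_screen) triple."""
    return [(task, steps) for task, steps, final in rollouts
            if verified_reward(task, final, verify) == 1.0]
```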

Why it matters: exploration-driven synthesis is a general recipe for building agent training corpora in domains like mobile automation, web navigation, and desktop workflows, where state spaces and interfaces change constantly.

Paper: arXiv: AutoPlay

#AIAgents #TaskGeneration #UIAutomation #MLLM #ReinforcementLearning #DataSynthesis #Exploration
