MAESTRO: An AI coach that shapes practice and rewards for smarter traffic control
Meet MAESTRO: an AI coach for teamwork
Training teams of AI agents is hard: you must design the right "score" (rewards) and the right practice plan (curriculum). Get either wrong and learning stalls. MAESTRO uses a large language model (LLM) as an offline training architect—so it guides learning without slowing real-time decisions.
- Curriculum generator: creates diverse, performance-driven traffic scenarios that steadily raise the challenge.
- Reward synthesizer: auto-writes executable Python reward functions tailored to each stage.
Because the LLM is used only during training, deployment stays fast and cheap. MAESTRO plugs into a standard multi-agent RL backbone without changing inference cost.
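To make the reward-synthesizer idea concrete, here is a rough sketch of what an LLM-generated, stage-specific reward function could look like. This is illustrative only, not the paper's actual output: the observation fields (`queue_lengths`, `vehicles_passed`) and the weights are hypothetical.

```python
# Hypothetical sketch of an LLM-synthesized reward for one curriculum stage.
# Field names and coefficients are illustrative, not taken from the paper.

def stage_reward(obs: dict) -> float:
    """Penalize queued vehicles and reward throughput at one intersection."""
    queue_penalty = -0.1 * sum(obs["queue_lengths"])  # vehicles waiting, per lane
    throughput_bonus = 0.5 * obs["vehicles_passed"]   # vehicles cleared this step
    return queue_penalty + throughput_bonus

# Example: 3 lanes with queues [4, 2, 0] and 5 vehicles passed this step.
print(stage_reward({"queue_lengths": [4, 2, 0], "vehicles_passed": 5}))  # 1.9
```

Because such functions are plain executable Python, the trainer can swap them in per stage without touching the deployed policy.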
In a 16-intersection Hangzhou traffic simulation, MAESTRO improved average return by +4.0% (163.26 vs. 156.93) and roughly doubled stability, lifting the Sharpe ratio about 2.2× (1.53 vs. 0.70) over a strong curriculum baseline. Ablations show the combination of LLM-generated curricula and LLM-generated rewards works best.
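For context, a Sharpe-style stability score for RL runs is commonly computed as the mean episode return divided by its standard deviation (the paper's exact definition may differ). A minimal sketch:

```python
import statistics

def sharpe(returns: list[float]) -> float:
    """Mean return divided by its standard deviation (higher = more stable)."""
    return statistics.mean(returns) / statistics.stdev(returns)

# Two hypothetical runs with the same mean return but different spread:
steady = [160.0, 162.0, 164.0, 166.0]
noisy = [120.0, 200.0, 140.0, 192.0]
print(sharpe(steady))  # much higher: same mean, far less variance
print(sharpe(noisy))
```

By this measure, two methods with similar average return can differ sharply in how reliably they achieve it, which is what the Sharpe comparison above captures.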
Big picture: LLMs can be great designers for cooperative AI—setting the practice and the scoring—while the agents handle the real-time control.
Paper: https://arxiv.org/abs/2511.19253
#AI #ReinforcementLearning #MARL #MultiAgent #LLM #MachineLearning #SmartCities #Traffic