MAESTRO: An AI coach that shapes practice and rewards for smarter traffic control

Meet MAESTRO: an AI coach for teamwork

Training teams of AI agents is hard: you must design the right "score" (rewards) and the right practice plan (curriculum). Get either wrong and learning stalls. MAESTRO uses a large language model (LLM) as an offline training architect—so it guides learning without slowing real-time decisions.

  • Curriculum generator: creates diverse, performance-driven traffic scenarios that steadily raise the challenge.
  • Reward synthesizer: auto-writes executable Python reward functions tailored to each stage.
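To make the two components concrete, here is a minimal sketch of what a stage-specific reward function and a performance-driven stage progression could look like. Everything in it is an illustrative assumption rather than MAESTRO's actual output: the `Stage` fields, the observation keys `queue_length` and `avg_wait_s`, the 0.8 success threshold, and the scaling factors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Stage:
    """Hypothetical curriculum stage; higher levels mean heavier traffic."""
    level: int
    arrival_rate: float  # vehicles per second entering the network (assumed unit)
    queue_weight: float  # penalty per queued vehicle
    wait_weight: float   # penalty per second of average waiting time


def make_reward(stage: Stage) -> Callable[[Dict[str, float]], float]:
    """Build a per-intersection reward function tailored to one stage."""
    def reward(obs: Dict[str, float]) -> float:
        # Negative cost: shorter queues and shorter waits score higher.
        return -(stage.queue_weight * obs["queue_length"]
                 + stage.wait_weight * obs["avg_wait_s"])
    return reward


def next_stage(stage: Stage, success_rate: float, threshold: float = 0.8) -> Stage:
    """Raise the difficulty only once agents do well on the current stage."""
    if success_rate < threshold:
        return stage  # keep practicing at the same level
    return Stage(level=stage.level + 1,
                 arrival_rate=stage.arrival_rate * 1.25,  # heavier demand
                 queue_weight=stage.queue_weight,
                 wait_weight=stage.wait_weight * 1.5)     # stricter on delays


stage = Stage(level=0, arrival_rate=0.2, queue_weight=1.0, wait_weight=0.1)
reward_fn = make_reward(stage)
example_reward = reward_fn({"queue_length": 4.0, "avg_wait_s": 10.0})
```

In the real system the LLM writes the body of the reward function itself; this sketch fixes the functional form by hand and only varies the weights, which is a much simpler stand-in.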

Because the LLM is used only during training, deployment stays fast and cheap. MAESTRO plugs into a standard multi-agent RL backbone without changing inference cost.
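The training/deployment split can be pictured with a toy loop. This is a sketch under stated assumptions, not the paper's code: `StubDesigner` stands in for the LLM, `Policy` stands in for the trained multi-agent policy, and the single-threshold "update" is a placeholder for a real RL update. The point is only that the designer is consulted between training stages and never during action selection.

```python
class StubDesigner:
    """Stand-in for the LLM designer; counts how often it is consulted."""

    def __init__(self) -> None:
        self.calls = 0

    def propose(self, stage: int) -> dict:
        self.calls += 1
        # Hypothetical scenario description: demand grows with the stage.
        return {"stage": stage, "demand": 0.2 + 0.1 * stage}


class Policy:
    """Toy policy with one threshold, standing in for a trained network."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def update(self, scenario: dict) -> None:
        # Placeholder for the multi-agent RL update on this scenario.
        self.threshold = scenario["demand"]

    def act(self, queue_ratio: float) -> int:
        # 1 = switch the signal phase, 0 = keep it (illustrative actions).
        return 1 if queue_ratio > self.threshold else 0


def train(designer: StubDesigner, policy: Policy, num_stages: int = 3) -> Policy:
    for stage in range(num_stages):
        scenario = designer.propose(stage)  # designer consulted once per stage
        policy.update(scenario)
    return policy


designer = StubDesigner()
policy = train(designer, Policy())
calls_after_training = designer.calls

# Deployment: only the policy runs, so the designer's call count stays put.
actions = [policy.act(q) for q in (0.1, 0.9)]
calls_after_deployment = designer.calls
```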

In a 16-intersection Hangzhou traffic simulation, MAESTRO improved average return by 4.0% (163.26 vs. 156.93) over a strong curriculum baseline and raised the Sharpe ratio, a measure of risk-adjusted stability, from 0.70 to 1.53 (about 2.2 times the baseline). Ablations show that combining LLM-generated curricula with LLM-generated rewards works best.
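The headline numbers can be checked with a few lines of arithmetic. The sketch below recomputes the relative gain in average return and the ratio between the two reported Sharpe values. The `sharpe` helper uses one common convention (mean return divided by sample standard deviation); that definition is an assumption here, since the post does not spell out how the paper computes it.

```python
import statistics
from typing import Sequence


def relative_gain(new: float, base: float) -> float:
    """Fractional improvement of `new` over `base`."""
    return (new - base) / base


def sharpe(returns: Sequence[float]) -> float:
    """Mean return over sample standard deviation (assumed convention)."""
    return statistics.mean(returns) / statistics.stdev(returns)


# Figures quoted in the post.
return_gain_pct = relative_gain(163.26, 156.93) * 100  # about 4.0 percent
sharpe_ratio_of_ratios = 1.53 / 0.70                   # about 2.2 times

# Tiny illustration of the helper on made-up episode returns.
toy_sharpe = sharpe([1.0, 2.0, 3.0])
```

Read this way, the return figure is a relative gain of about 4%, while the two Sharpe values differ by a factor of about 2.2.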

Big picture: LLMs can be great designers for cooperative AI—setting the practice and the scoring—while the agents handle the real-time control.

Paper: https://arxiv.org/abs/2511.19253

AI ReinforcementLearning MARL MultiAgent LLM MachineLearning SmartCities Traffic
