TAMAS: Stress-testing Multi‑Agent AI for Safety

AI agents are starting to work in teams. That unlocks power—and new ways things can go wrong.

TAMAS is a benchmark that stress‑tests multi‑agent LLM systems against adversarial tricks and coordination failures.

  • 5 realistic scenarios, 300 attack instances across 6 attack types
  • 211 tools, plus 100 harmless tasks to check false alarms
  • Evaluated across 10 backbone LLMs and 3 interaction setups, implemented with AutoGen and CrewAI

The authors also propose an Effective Robustness Score (ERS): a simple metric that weighs both task success and safety, so teams can see trade‑offs at a glance.
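The paper defines ERS precisely; as a purely illustrative sketch (the harmonic-mean weighting below is an assumption, not the authors' formula), a metric that rewards only systems that are both capable and safe might look like:

```python
def effective_robustness_score(task_success: float, safety: float) -> float:
    """Hypothetical ERS sketch: harmonic mean of task-success rate and
    safety rate (both in [0, 1]). A system scores high only if it is
    both capable AND robust; either rate near zero drags ERS toward zero.
    NOTE: this is an illustration, not the TAMAS paper's actual definition.
    """
    if task_success + safety == 0:
        return 0.0
    return 2 * task_success * safety / (task_success + safety)

# A capable-but-unsafe system still scores poorly:
print(effective_robustness_score(0.90, 0.20))  # ~0.33
```

The appeal of a combined score like this is that a leaderboard can't be topped by an agent team that completes every task while failing every safety check.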

Result: today’s multi‑agent systems are highly vulnerable. Attacks that exploit role handoffs, tool use, and peer persuasion often slip through.

If you build with agents, TAMAS offers a clear yardstick for red‑teaming and defense design.

Paper: http://arxiv.org/abs/2511.05269v1

Register: https://www.AiFeta.com

#AI #Safety #LLM #MultiAgent #Security #Benchmark #AIAgents #RedTeam #TAMAS