TAMAS: Stress-testing Multi‑Agent AI for Safety
AI agents are starting to work in teams. That unlocks power—and new ways things can go wrong.
TAMAS is a benchmark that stress‑tests multi‑agent LLM systems against adversarial tricks and coordination failures.
- 5 realistic scenarios, 300 attack instances across 6 attack types
- 211 tools, plus 100 harmless tasks to measure false alarms (over-refusals)
- Evaluated with 10 backbone LLMs across 3 agent interaction configurations, built on AutoGen and CrewAI
The authors also propose an Effective Robustness Score (ERS): a simple metric that weighs both task success and safety, so teams can see trade‑offs at a glance.
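To make the idea concrete: here is a minimal, illustrative Python sketch of an ERS-style score. The harmonic-mean form, the function name, and the 0-1 input rates are our assumptions for illustration only; the paper defines the actual formula.

```python
# Illustrative only: a hypothetical ERS-style score combining task
# success and safety. The TAMAS paper defines the exact formula.

def effective_robustness_score(task_success_rate: float,
                               attack_resistance_rate: float) -> float:
    """Harmonic mean of utility and safety: high only when both are high.

    task_success_rate: fraction of benign tasks completed correctly (0-1).
    attack_resistance_rate: fraction of attack instances defended (0-1).
    """
    if task_success_rate + attack_resistance_rate == 0:
        return 0.0
    return (2 * task_success_rate * attack_resistance_rate
            / (task_success_rate + attack_resistance_rate))

# Example: strong on tasks but weak under attack yields a low score.
print(effective_robustness_score(0.90, 0.30))  # ~0.45
```

A harmonic mean is a natural choice for this kind of trade-off metric because a system can't score well by maxing out one axis while failing the other.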
Result: today’s multi‑agent systems are highly vulnerable. Attacks that exploit role handoffs, tool use, and peer persuasion often slip through.
If you build with agents, TAMAS offers a clear yardstick for red-teaming and defense design.
Paper: http://arxiv.org/abs/2511.05269v1
Register: https://www.AiFeta.com
#AI #Safety #LLM #MultiAgent #Security #Benchmark #AIAgents #RedTeam #TAMAS