TAMAS: Stress-testing Multi‑Agent AI for Safety
AI agents are starting to work in teams. That unlocks power—and new ways things can go wrong.
TAMAS is a benchmark that stress‑tests multi‑agent LLM systems against adversarial tricks and coordination failures.
- 5 realistic scenarios, 300 attack instances across 6 attack types
- 211 tools, plus 100 harmless tasks to measure false alarms (over-refusals)
- Evaluated with 10 backbone LLMs across 3 agent interaction configurations, built on AutoGen and CrewAI
The authors also propose an Effective Robustness Score (ERS): a simple metric that weighs both task success and safety, so teams can see trade‑offs at a glance.
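To make the idea concrete: here is a minimal, illustrative Python sketch of an ERS-style score. The harmonic-mean form, the function name, and the 0-1 input rates are our assumptions for illustration only; the paper defines the actual formula.

```python
# Illustrative only: a hypothetical ERS-style score combining task
# success and safety. The TAMAS paper defines the exact formula.

def effective_robustness_score(task_success_rate: float,
                               attack_resistance_rate: float) -> float:
    """Harmonic mean of utility and safety: high only when both are high.

    task_success_rate: fraction of benign tasks completed correctly (0-1).
    attack_resistance_rate: fraction of attack instances defended (0-1).
    """
    if task_success_rate + attack_resistance_rate == 0:
        return 0.0
    return (2 * task_success_rate * attack_resistance_rate
            / (task_success_rate + attack_resistance_rate))

# Example: strong on tasks but weak under attack yields a low score.
print(effective_robustness_score(0.90, 0.30))  # ~0.45
```

A harmonic mean is a natural choice for this kind of trade-off metric because a system can't score well by maxing out one axis while failing the other.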
Result: today’s multi‑agent systems are highly vulnerable. Attacks that exploit role handoffs, tool use, and peer persuasion often slip through.
If you build with agents, TAMAS offers a clear yardstick for red-teaming and defense design.
Paper: http://arxiv.org/abs/2511.05269v1
Register: https://www.AiFeta.com
#AI #Safety #LLM #MultiAgent #Security #Benchmark #AIAgents #RedTeam #TAMAS