Echo Chamber: Multi-Turn Jailbreaks That Fool Chatbot Guardrails

Kari Jaaskelainen

12 Jan 2026 — 1 min read

Can a polite chat turn dangerous for chatbots?

New research uncovers a stealthy way to "jailbreak" AI assistants without obvious toxic prompts. The authors present Echo Chamber, a multi-turn attack that slowly escalates a conversation so guardrails slip—think nudging, not smashing.

Unlike one-shot exploits, Echo Chamber works through a chain of friendly messages that build context and trust. The study compares it to other multi-turn methods and evaluates it across several state-of-the-art models, showing strong effectiveness in extensive tests.

Highlights: gradual escalation beats blunt prompts
Risk: financial and reputational damage for companies deploying chatbots
Takeaway: defenses must be conversation-aware, not just single-message filters

Why this matters: as more businesses adopt LLMs, attackers adapt too. Security teams need better red-teaming, multi-turn detectors, and training that resists context manipulation.

Paper by Ahmad Alobaid, Martí Jordà Roca, Carlos Castillo, and Joan Vendrell. Read more: https://arxiv.org/abs/2601.05742v1

Paper: https://arxiv.org/abs/2601.05742v1

Register: https://www.AiFeta.com

#AI #Cybersecurity #LLM #Safety #Chatbots #AISecurity #RedTeaming #InfoSec #ResponsibleAI

Echo Chamber: Multi-Turn Jailbreaks That Fool Chatbot Guardrails

Kari Jaaskelainen

Can a polite chat turn dangerous for chatbots?

Read more

Tekoäly myötäilee toteamuksia enemmän kuin kysymyksiä

Tekoälyn pitäisi uskaltaa sanoa “en tiedä” — ja sillä on väliä, miten tämä mitataan

Pienet kielimallit nopeutuvat, kun niille opetetaan valmiita fraaseja

Kone näkee saman kohtauksen eri tavoin – uusi tapa opettaa sen kokoamaan aistinsa yhteen