Untargeted jailbreaks: a broader, faster way to stress-test LLM safety
Not aiming for a script, just unsafe output. 🚨🧨🔍🛡️ This paper introduces a gradient-based untargeted jailbreak attack (UJA) that seeks any unsafe response rather than a specific target. UJA maximizes a judged “unsafety” signal by decomposing a non-differentiable objective into differentiable parts for optimizing both harmful responses and adversarial