When Prompts Make AIs "See" Things: Inside Vision-Language Hallucinations
Sometimes, vision-language AIs see what the prompt says, not what is in the picture. New research maps how that happens—and how to dial it down.
Rudman and colleagues tested object counting with prompts that intentionally overstate what an image contains (for example, "describe four waterlilies" when only three exist). At low counts, models usually correct the prompt. As the stated number of objects grows, they increasingly parrot the prompt instead of the image.
Key findings:
- Across three models, a small set of attention heads mediates this prompt copying.
- Turning off those heads cuts prompt-induced hallucinations by 40%+ without extra training.
- The exact heads and pathways differ by model, but ablating them consistently nudges answers toward the visual evidence (see the sketch after this list for what head ablation looks like in practice).
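For readers wondering what "turning off" an attention head means, here is a minimal, hypothetical sketch: a toy multi-head self-attention layer whose per-head outputs can be zeroed before they are mixed back into the model. The class name, dimensions, and head indices are illustrative assumptions, not the paper's actual models or implementation.

```python
# Minimal sketch of attention-head ablation (illustrative only; not the paper's code).
import torch
import torch.nn as nn

class AblatableMultiHeadAttention(nn.Module):
    """Toy multi-head self-attention whose individual heads can be zeroed out."""

    def __init__(self, d_model: int, n_heads: int, ablated_heads=()):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.ablated_heads = set(ablated_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into (batch, heads, tokens, d_head)
        q, k, v = (z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = attn @ v  # per-head outputs: (batch, heads, tokens, d_head)
        # Ablation: zero the selected heads before they are projected back in.
        for h in self.ablated_heads:
            heads[:, h] = 0.0
        out = heads.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)

# Hypothetical example: silence heads 3 and 7 of a 12-head layer.
layer = AblatableMultiHeadAttention(d_model=768, n_heads=12, ablated_heads={3, 7})
tokens = torch.randn(1, 16, 768)
print(layer(tokens).shape)  # torch.Size([1, 16, 768])
```

In a real, pretrained VLM the same effect is typically achieved with forward hooks on the existing attention modules rather than a custom layer; the principle, zeroing a head's contribution and observing how the output changes, is the same.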
Takeaway: targeted tweaks inside today’s VLMs can make them rely more on what they see—and less on what we say.
Paper: https://arxiv.org/abs/2601.05201v1
Register: https://www.AiFeta.com
#AI #VisionLanguageModels #Hallucinations #ResponsibleAI #MachineLearning #ExplainableAI #Research