Illusions of Confidence? Testing LLM Truthfulness with Neighborhood Consistency
TL;DR
Large language models can sound confident, but nudge the context and their beliefs can fall apart. This paper proposes a better way to check whether an AI keeps its story straight.
- Problem: Popular checks like self-consistency can hide brittle knowledge. Answers that look reliable can break down under mild rewording or distracting details.
- Solution — Neighbor-Consistency Belief (NCB): Measure whether answers stay coherent across a neighborhood of related prompts and contexts (see the sketch after this list).
- Stress test: A cognitive interference protocol that perturbs context to probe stability.
- Results: High-NCB data resists interference, and a new Structure-Aware Training method cuts long-tail brittleness by ~30%.
- Why it matters: More truthful, robust AIs for real-world use—not just confident-sounding ones.
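To make the neighbor-consistency idea concrete, here is a minimal sketch, not the paper's actual NCB metric or interference protocol: `ask_model`, the paraphrase/interference templates, and the exact-match aggregation are all illustrative assumptions. The core idea is to build a neighborhood of reworded and context-perturbed variants of a question, query the model on each, and score how often the answers agree with the original.

```python
# Illustrative neighbor-consistency check (a sketch, not the paper's exact NCB metric).
# `ask_model` is any callable mapping a prompt string to an answer string, e.g. a
# wrapper around an LLM API. The templates below are hand-written stand-ins.

from typing import Callable


def build_neighborhood(question: str) -> list[str]:
    """A small neighborhood of the question: rewordings plus interference-style context."""
    paraphrases = [
        f"In your own words: {question}",
        f"Answer concisely: {question}",
    ]
    interference = [
        f"Some sources disagree on this, but answer anyway: {question}",
        f"Unrelated note: the weather is nice today. Now, {question}",
    ]
    return paraphrases + interference


def neighbor_consistency(question: str, ask_model: Callable[[str], str]) -> float:
    """Fraction of neighborhood prompts whose (normalized) answer matches the base answer."""
    norm = lambda s: s.strip().lower()
    base = norm(ask_model(question))
    neighbors = [norm(ask_model(p)) for p in build_neighborhood(question)]
    return sum(a == base for a in neighbors) / len(neighbors)


# Example with a trivial stand-in model that always answers "Paris":
if __name__ == "__main__":
    score = neighbor_consistency("What is the capital of France?", lambda _: "Paris")
    print(f"neighbor consistency: {score:.2f}")  # 1.00 for this constant model
```

A high score means the model keeps the same answer even when the question is reworded or wrapped in distracting context; a low score flags the brittle, interference-sensitive knowledge the paper targets.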
Paper: https://arxiv.org/abs/2601.05905
Code: https://github.com/zjunlp/belief
#AI #LLM #NLP #Robustness #TrustworthyAI #MachineLearning #Research #Evaluation