When Bias Pretends to Be Truth in AI
Large language models can sound confident even when they are wrong. This study pinpoints a sneaky culprit: spurious correlations — statistical shortcuts like linking certain surnames to a nationality. Models absorb these patterns from data and then answer with high certainty, even when the link is false.
What the researchers found
- Models produce these bias-driven errors with high confidence.
- Making models bigger does not fix them.
- Popular detectors — confidence filters and inner-state probing — often miss them.
- They persist even after refusal training or safety fine-tuning.
Why? A model’s confidence often tracks how common a pattern is in training data, not whether it is true. When the pattern is spurious, confidence misleads both the model and our detectors.
What’s needed: methods that audit correlations, run counterfactual checks, and stress-test models with controlled synthetic data — instead of trusting confidence alone.
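To make the counterfactual idea concrete, here is a minimal sketch (not from the paper) of a surname-swap check using Hugging Face Transformers: it compares the log-probability a causal LM assigns to the same nationality claim under two different surnames. The model name, prompts, and surnames are illustrative placeholders.

```python
# Minimal counterfactual-check sketch: does the model's confidence in a
# nationality claim shift when only the surname changes? A large gap hints
# at a spurious surname -> nationality shortcut. Names/model are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def claim_logprob(prompt: str, answer: str) -> float:
    """Sum of log-probabilities the model assigns to `answer` given `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Positions 0..L-2 predict tokens 1..L-1; keep only the answer positions.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    answer_ids = full_ids[:, prompt_ids.shape[1]:]
    answer_log_probs = log_probs[0, prompt_ids.shape[1] - 1:, :].gather(
        -1, answer_ids[0].unsqueeze(-1)
    )
    return answer_log_probs.sum().item()

# Same claim, counterfactual surname: compare the model's confidence.
original = claim_logprob("Dr. Tanaka was born in", " Japan")
counterfactual = claim_logprob("Dr. Novak was born in", " Japan")
print(f"log p(Japan | Tanaka): {original:.2f}")
print(f"log p(Japan | Novak):  {counterfactual:.2f}")
```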
Bias can masquerade as truth — and our detectors can be fooled, too.
Paper by Shaowen Wang, Yiqi Dong, Ruinian Chang, Tansheng Zhu, Yuebo Sun, Kaifeng Lyu, and Jian Li. Read: http://arxiv.org/abs/2511.07318
Register: https://www.AiFeta.com
#AI #LLMs #Hallucinations #Bias #SpuriousCorrelations #TrustworthyAI #Evaluation #MachineLearning #Research