When Bias Pretends to Be Truth in AI

Large language models can sound confident even when they are wrong. This study pinpoints a sneaky culprit: spurious correlations, statistical shortcuts such as linking certain surnames to a nationality. Models absorb these patterns from training data and then answer with high certainty, even when the link is false.

What the researchers found

  • These bias-driven errors are produced confidently.
  • Making models bigger does not fix them.
  • Popular detectors, such as confidence filters and internal-state probing, often miss them.
  • They persist even after refusal or safety fine-tuning.

Why? A model’s confidence often tracks how common a pattern is in training data, not whether it is true. When the pattern is spurious, confidence misleads both the model and our detectors.
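
As a toy illustration (not from the paper), a model calibrated to a small synthetic corpus will be confident in a spurious surname-to-nationality link simply because the pattern is frequent:

```python
from collections import Counter

# Hypothetical training corpus: the surname "Eriksen" is mostly labeled Danish
# here, so a frequency-calibrated model "believes" the shortcut strongly,
# regardless of whether it holds for any particular person.
corpus = [("Eriksen", "Danish")] * 90 + [("Eriksen", "Norwegian")] * 10

counts = Counter(nat for name, nat in corpus if name == "Eriksen")
answer, freq = counts.most_common(1)[0]
confidence = freq / sum(counts.values())

print(f"Shortcut answer: {answer} with confidence {confidence:.2f}")
# -> confidence 0.90, driven purely by how common the pattern is, not by truth.
```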

What’s needed: methods that audit training-data correlations, run counterfactual checks, and stress-test models with controlled synthetic data, rather than trusting confidence alone.
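
Here is a minimal counterfactual-check sketch in Python, using a stand-in shortcut model and made-up names purely to show the idea; the paper's actual evaluation protocol may differ:

```python
def toy_model(prompt: str) -> str:
    # Stand-in for an LLM that has learned a surname -> nationality shortcut.
    return "Danish" if "Eriksen" in prompt else "Japanese"

def flips_under_counterfactual(template: str, cue: str, swapped: str) -> bool:
    """True if the answer changes when only the spurious cue is swapped,
    i.e. the answer tracks the cue rather than the stated facts."""
    return toy_model(template.format(cue=cue)) != toy_model(template.format(cue=swapped))

# Controlled synthetic prompts: everything is held fixed except the cue.
template = ("A person named {cue}, born and raised in Canada, "
            "applies for a passport. What is their nationality?")
pairs = [("Eriksen", "Tanaka")]  # illustrative names only
flip_rate = sum(flips_under_counterfactual(template, a, b) for a, b in pairs) / len(pairs)
print(f"Cue-driven flip rate: {flip_rate:.0%}")  # 100% here: the toy model follows the surname
```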

Bias can masquerade as truth, and our detectors can be fooled, too.

Paper by Shaowen Wang, Yiqi Dong, Ruinian Chang, Tansheng Zhu, Yuebo Sun, Kaifeng Lyu, and Jian Li. Read: http://arxiv.org/abs/2511.07318

Register: https://www.AiFeta.com

ai llms hallucinations bias spuriouscorrelations trustworthyai evaluation machinelearning research
