Bias scores shift with context: Contextual StereoSet
Models that seem fair in the lab can slip in the wild. Contextual StereoSet shows how measured bias swings when you change the framing—no adversarial prompting required.
- Same stereotypes, new frames: Hold content constant; vary time, place, or audience (see the first sketch after this list).
- Striking shifts: Anchoring to 1990 (vs. 2030) raised stereotype choices in all tested models (p<0.05). Gossip framing raised them in 5/6 models. Out-group observer framing shifted rates by up to 13 percentage points.
- Across domains: Effects replicate in hiring, lending, and help-seeking vignettes.
- Quick or deep: A 360-context diagnostic grid for deep dives, and a budgeted protocol covering 4,229 items for production screening.
- CSF profiles: Context Sensitivity Fingerprints summarize how a model’s bias score disperses across contexts, with bootstrap CIs and FDR-corrected contrasts (see the second sketch below).
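Here is a minimal sketch of the hold-content-constant, vary-the-frame idea. The helper name `contextualize` and the specific place and audience strings are illustrative assumptions, not the paper’s actual template set; only the 1990/2030 anchors and the gossip framing appear in the results above.

```python
# Sketch of frame variation (hypothetical helper, not the paper's API):
# the stereotype item stays fixed; only the contextual framing changes.
from itertools import product

TIMES = ["In 1990,", "In 2030,"]                    # temporal anchors from the results
PLACES = ["in a courtroom,", "at a dinner party,"]  # illustrative, not from the paper
AUDIENCES = ["speaking to colleagues,", "gossiping with friends,"]  # gossip frame from the results

def contextualize(item: str) -> list[str]:
    """Cross one StereoSet-style item with every (time, place, audience) frame."""
    return [f"{t} {p} {a} {item}" for t, p, a in product(TIMES, PLACES, AUDIENCES)]

# 2 x 2 x 2 = 8 framings of the same underlying item (the real grid is 360 contexts).
prompts = contextualize("a sentence-completion item about the new engineer")
```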
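And a rough sketch of what a Context Sensitivity Fingerprint computation could look like, assuming dispersion is summarized as a standard deviation over per-context stereotype-choice rates; the paper’s exact statistic may differ, and the rates below are toy values.

```python
# Toy CSF sketch: dispersion of one model's stereotype-choice rate across
# contexts, with a 95% bootstrap CI on the spread. Not the paper's exact statistic.
import numpy as np

rng = np.random.default_rng(0)
# rates[c] = model's stereotype-choice rate in context c (toy values, not paper data)
rates = np.array([0.41, 0.48, 0.39, 0.52, 0.44, 0.50])

def csf_spread(rates: np.ndarray, n_boot: int = 10_000):
    """Point estimate and 95% bootstrap CI for cross-context dispersion (std dev)."""
    boots = [rng.choice(rates, size=len(rates), replace=True).std()
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return rates.std(), (lo, hi)

spread, (lo, hi) = csf_spread(rates)
print(f"CSF dispersion = {spread:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
# Pairwise context contrasts would then be corrected for multiple comparisons,
# e.g. with statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh").
```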
The takeaway: stop asking “Is this model biased?” and start asking “Under what conditions does bias appear?” It’s a robustness stress test, not a claim about ground-truth bias rates.
Paper, code, benchmark, and results: https://arxiv.org/abs/2601.10460v1
Register: https://www.AiFeta.com
#AI #MachineLearning #LLM #NLP #AIEthics #ResponsibleAI #Bias #Evaluation