Bias scores shift with context: Contextual StereoSet

Models that seem fair in the lab can slip in the wild. Contextual StereoSet shows how measured bias swings when you change the framing, with no adversarial prompting required.

Same stereotypes, new frames: Hold content constant, vary time, place, or audience.
Striking shifts: Anchoring to 1990 (vs. 2030) raised stereotype choices in all tested models (p<0.05). Gossip framing raised them in 5/6 models. Out-group observer framing shifted rates by up to 13 percentage points.
Across domains: Effects replicate in hiring, lending, and help-seeking vignettes.
Quick or deep: A 360-context diagnostic grid for deep dives, and a budgeted protocol covering 4,229 items for production screening.
CSF profiles: Context Sensitivity Fingerprints summarize how a model’s bias score disperses across contexts, with bootstrap CIs and FDR-corrected contrasts.
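
The statistics named in the last item can be sketched generically. The snippet below is an illustration, not the paper's implementation: this post does not give the exact CSF definition, so the function names (`dispersion`, `bootstrap_ci`, `benjamini_hochberg`), the use of range and standard deviation as dispersion measures, and the choice of a percentile bootstrap with the Benjamini-Hochberg procedure are all assumptions.

```python
import random
import statistics


def dispersion(rates):
    """Range and population std. dev. of per-context stereotype rates.

    Assumption: a simple stand-in for "how a bias score disperses across
    contexts"; the paper's own fingerprint may be defined differently.
    """
    return max(rates) - min(rates), statistics.pstdev(rates)


def bootstrap_ci(values, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`.

    `values` could be per-item 0/1 indicators of a stereotype choice.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, sort the results.
    boots = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k/m * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected
```

In use, one would compute a stereotype rate per context, pass the rates to `dispersion`, attach a `bootstrap_ci` to each rate, and run `benjamini_hochberg` over the p-values of the pairwise context contrasts (for example, 1990 vs. 2030 anchoring) before calling any shift significant.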

The takeaway: stop asking "Is this model biased?" and start asking "Under what conditions does bias appear?" Contextual StereoSet is a robustness stress test, not a claim about ground-truth bias rates. Code, benchmark, and results: https://arxiv.org/abs/2601.10460v1

Paper: https://arxiv.org/abs/2601.10460v1


#AI #MachineLearning #LLM #NLP #AIEthics #ResponsibleAI #Bias #Evaluation
