Do LLMs Really Memorize Your Personal Data? A Cue-Controlled Look at PII Leakage

Kari Jaaskelainen

08 Jan 2026 — 1 min read

Do big AI models really "remember" your phone number? A new study says: often, no.

Past claims of PII leakage may be inflated by "lexical cues": obvious hints in the prompt (such as part of a name or address) that let a model complete a surface pattern rather than recall memorized training data.

The authors introduce a cue-controlled test, Cue-Resistant Memorization (CRM), and re-check PII leakage across 32 languages and multiple tasks. Once cues are removed, reconstruction success drops sharply, and both cue-free generation and membership inference show extremely low true positive rates.
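To make the idea of a "lexical cue" concrete, here is a minimal sketch of how one might flag cue-contaminated prompts and report reconstruction success separately for each group. This is not the paper's CRM procedure, whose details this post does not cover. The function names, the token-overlap rule, and the three-character cutoff are all illustrative assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens of length >= 3 (the cutoff is an arbitrary assumption)."""
    return {t for t in re.findall(r"\w+", text.lower()) if len(t) >= 3}


def has_lexical_cue(prompt: str, target: str) -> bool:
    """Hypothetical cue check: does the prompt share any token with the target PII?"""
    return bool(tokens(prompt) & tokens(target))


def reconstruction_rates(records):
    """records: iterable of (prompt, target, model_output) triples.

    Returns (rate_with_cues, rate_without_cues). A rate is None when its
    group is empty. A reconstruction counts as a hit when the target string
    appears verbatim (case-insensitively) in the model output.
    """
    hits = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for prompt, target, output in records:
        cued = has_lexical_cue(prompt, target)
        totals[cued] += 1
        hits[cued] += target.lower() in output.lower()

    def rate(group: bool):
        return hits[group] / totals[group] if totals[group] else None

    return rate(True), rate(False)
```

If the cued rate is much higher than the cue-free rate, that gap is the part of the apparent "leakage" that the prompt itself could explain. A real evaluation would likely need a stricter cue definition than shared tokens, for example one that also handles partial strings and multiple languages.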

Takeaways:

Apparent "leaks" often come from the prompt, not the model's memory.
Privacy risk isn't zero, but measuring it requires cue-controlled evaluations.
Better benchmarks can guide safer model training and release decisions.
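On the membership inference side, a common way to report true positives is the true positive rate at a fixed, low false positive rate. The sketch below illustrates that metric in plain Python. Whether the paper reports exactly this quantity is an assumption, and the function name, the `max_fpr` parameter, and the "higher score means more likely a member" convention are all illustrative.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """True positive rate at the loosest threshold whose FPR stays <= max_fpr.

    Assumes a higher score means "more likely a training member".
    """
    # How many non-members we may flag without exceeding max_fpr.
    allowed = int(max_fpr * len(nonmember_scores))
    ranked = sorted(nonmember_scores, reverse=True)
    # Flag only scores strictly above the threshold, so ties never push
    # the false positive count past `allowed`.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    flagged = sum(score > threshold for score in member_scores)
    return flagged / len(member_scores)
```

A result near zero at a low `max_fpr` means the attack can rarely identify training members without also raising false alarms, which is the kind of outcome the study describes for the cue-free setting.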

Bottom line: Evaluate LLM privacy with cue awareness—otherwise we may mistake pattern completion for memorization.

Paper: https://arxiv.org/abs/2601.03791v1

Register: https://www.AiFeta.com

#AI #Privacy #LLMs #DataSecurity #NLP
