Do LLMs Really Memorize Your Personal Data? A Cue-Controlled Look at PII Leakage
Do big AI models really "remember" your phone number? A new study says: often, no.
Past claims of PII leakage may be inflated by "lexical cues": obvious hints in the prompt (such as part of a name or address) that let models complete familiar patterns rather than recall memorized training data.
The authors introduce a cue-controlled test, Cue-Resistant Memorization (CRM), and re-evaluate PII leakage across 32 languages and multiple tasks. Once cues are removed, reconstruction success drops sharply, and cue-free generation and membership inference yield extremely low true-positive rates.
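To make the distinction concrete, here's a minimal sketch of a cue-controlled probe (not the paper's actual protocol; `cued_prompt`, `cue_free_prompt`, and `toy_model` are illustrative names I'm assuming, and the toy model stands in for a real LLM call). A cued probe hands the model a prefix of the target value, so plain pattern completion registers as a "leak"; a cue-free probe asks for the same fact cold.

```python
# Minimal sketch of a cue-controlled PII probe. A memorization claim
# should survive the cue-free condition, not just the cued one.

from typing import Callable


def cued_prompt(name: str, partial_phone: str) -> str:
    # Cued: leaks a prefix of the target number into the prompt,
    # so pattern completion alone can "succeed".
    return f"Contact info: {name}, phone {partial_phone}"


def cue_free_prompt(name: str) -> str:
    # Cue-free: asks for the fact with no lexical hints about its form.
    return f"What is the phone number on file for {name}?"


def leaked(generate: Callable[[str], str], prompt: str, target: str) -> bool:
    # Count a leak only if the full target string appears verbatim.
    return target in generate(prompt)


if __name__ == "__main__":
    def toy_model(prompt: str) -> str:
        # Toy stand-in: completes a dangling digit prefix with common
        # digits but "knows" nothing when asked cold. Replace with a
        # real LLM API call to run an actual probe.
        if prompt.rstrip().endswith("555-01"):
            return prompt + "23"  # pattern completion, not recall
        return prompt + " I don't have that information."

    name, phone = "Jane Doe", "555-0123"
    print("cued leak:    ", leaked(toy_model, cued_prompt(name, phone[:6]), phone))
    print("cue-free leak:", leaked(toy_model, cue_free_prompt(name), phone))
```

The toy model never stored the number, yet the cued probe still scores a "leak" (it prints `True`), while the cue-free probe does not. That gap is exactly what a cue-controlled evaluation is designed to expose.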
Takeaways:
- Apparent "leaks" often come from the prompt, not the model's memory.
- Privacy risk isn't zero, but measuring it requires cue-controlled evaluations.
- Better benchmarks can guide safer model training and release decisions.
Bottom line: Evaluate LLM privacy with cue-controlled methods; otherwise we risk mistaking pattern completion for memorization.
Paper: https://arxiv.org/abs/2601.03791v1
Register: https://www.AiFeta.com
#AI #Privacy #LLMs #DataSecurity #NLP