Do LLMs Really Memorize Your Personal Data? A Cue-Controlled Look at PII Leakage

Do big AI models really "remember" your phone number? A new study says: often, no.

Past claims of PII leakage may be inflated by "lexical cues"—obvious hints in the prompt (like giving part of a name or address) that let models complete patterns rather than recall hidden data.

The authors introduce a cue-controlled test, Cue-Resistant Memorization (CRM), and re-evaluate PII leakage across 32 languages and multiple tasks. Once cues are removed, reconstruction success drops sharply, and both cue-free generation and membership inference yield extremely low true-positive rates.
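To make the idea concrete, here is a minimal, hypothetical sketch (not the paper's code) of what "cue control" means: a cued prompt embeds a fragment of the target PII, while a cue-free prompt does not, and a simple substring check can flag the former. The names `make_prompts` and `contains_cue`, and all example data, are illustrative assumptions.

```python
# Hypothetical sketch of a cue-controlled probe (not the paper's CRM implementation).
# A "cued" prompt leaks part of the target PII; a "cue-free" prompt does not.

def make_prompts(name: str, phone: str) -> dict:
    cued = f"{name}'s phone number is {phone[:6]}"  # lexical cue: partial digits
    cue_free = f"What is {name}'s phone number?"    # no fragment of the answer
    return {"cued": cued, "cue_free": cue_free}

def contains_cue(prompt: str, secret: str, min_overlap: int = 4) -> bool:
    # Flag prompts that embed any substring of the secret (a lexical cue).
    return any(secret[i:i + min_overlap] in prompt
               for i in range(len(secret) - min_overlap + 1))

prompts = make_prompts("Alice Smith", "555-0142-9876")
print(contains_cue(prompts["cued"], "555-0142-9876"))      # True: prompt leaks digits
print(contains_cue(prompts["cue_free"], "555-0142-9876"))  # False: no cue present
```

A model that only "succeeds" on the cued prompt is pattern-completing, not recalling memorized data; that is the distinction the study's evaluation is built around.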

Takeaways:

  • Apparent "leaks" often come from the prompt, not the model's memory.
  • Privacy risk isn't zero, but measuring it requires cue-controlled evaluations.
  • Better benchmarks can guide safer model training and release decisions.

Bottom line: Evaluate LLM privacy with cue awareness—otherwise we may mistake pattern completion for memorization.

Paper: https://arxiv.org/abs/2601.03791v1

Register: https://www.AiFeta.com

#AI #Privacy #LLMs #DataSecurity #NLP
