Smarter Email Dataset to Tackle Phishing and Spam
Smarter email defenses, grounded in real messages
Phishing and spam are evolving fast, often with help from AI. This study releases a large, carefully labeled email dataset spanning phishing, spam, and legitimate messages, with a key twist: each message is marked as written either by a human or by an LLM.
Every email also carries rich annotations: the emotional tactics it uses (such as urgency, fear, or authority) and the attacker's goal (getting a link clicked, stealing credentials, or committing financial fraud). The authors benchmark multiple LLMs on identifying these cues, pick the most reliable model to scale up the labeling, and then stress-test robustness by having several LLMs rephrase the emails while preserving their intent.
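To make the annotation scheme concrete, here is a minimal Python sketch of what one labeled record and a simple rephrasing-robustness check might look like. The enum values, field names, and the `label_stability` helper are hypothetical illustrations, not the released dataset's actual schema or the authors' evaluation code; `rephrase` and `classify` stand in for whatever LLM calls you plug in.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Sequence, Set


class Category(Enum):
    PHISHING = "phishing"
    SPAM = "spam"
    LEGITIMATE = "legitimate"


class Origin(Enum):
    HUMAN = "human"
    LLM = "llm"


@dataclass
class EmailRecord:
    # Hypothetical field names; the released dataset may use a different schema.
    text: str
    category: Category
    origin: Origin
    # Emotional tactics, e.g. {"urgency", "fear", "authority"}.
    emotional_cues: Set[str] = field(default_factory=set)
    # Attacker goal, e.g. "link_click", "credential_theft", "financial_fraud".
    attacker_goal: Optional[str] = None


def label_stability(
    records: Sequence[EmailRecord],
    rephrase: Callable[[str], str],
    classify: Callable[[str], Category],
) -> float:
    """Fraction of emails whose predicted category is unchanged after rephrasing.

    `rephrase` and `classify` are placeholders for LLM calls (assumption).
    Returns 0.0 for an empty input rather than dividing by zero.
    """
    if not records:
        return 0.0
    unchanged = sum(
        classify(r.text) == classify(rephrase(r.text)) for r in records
    )
    return unchanged / len(records)
```

A value near 1.0 means the classifier's verdict mostly survives rephrasing. This is just one simple way to quantify robustness; the paper may report different metrics.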
Results: today's top models are strong at catching phishing but still struggle to tell spam from legitimate email, an important gap for safer inboxes.
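A per-class breakdown is one way to surface a spam-versus-legitimate gap that a single overall accuracy number can hide. The sketch below uses hypothetical helper names and toy labels (not results from the paper) to compute recall per class and count which true/predicted pairs get confused.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Recall for each true class: correct predictions / occurrences of that class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / count for label, count in totals.items()}


def confusions(y_true: Sequence[str], y_pred: Sequence[str]) -> Counter:
    """Count (true, predicted) pairs where the prediction is wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# Toy labels for illustration only; not results from the paper.
y_true = ["phishing", "phishing", "spam", "spam", "legitimate", "legitimate"]
y_pred = ["phishing", "phishing", "legitimate", "spam", "spam", "legitimate"]

print(per_class_recall(y_true, y_pred))
# → {'phishing': 1.0, 'spam': 0.5, 'legitimate': 0.5}
print(confusions(y_true, y_pred))
```

In this toy case phishing recall is perfect while every error is a spam/legitimate swap, which is the kind of pattern a per-class view makes visible.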
The dataset, code, and templates are openly available to accelerate research and help deploy better defenses.
Paper: https://arxiv.org/abs/2511.21448v1