Smarter Email Dataset to Tackle Phishing and Spam

Smarter email defenses, grounded in real messages

Phishing and spam are evolving fast, often with help from AI. This study releases a large, carefully labeled email dataset spanning phishing, spam, and legitimate messages, with a key twist: each message is also marked as written by a human or by an LLM.

Every email also carries richer annotations: the emotional tactics it uses (such as urgency, fear, or authority) and the attacker’s goal (link clicks, credential theft, or financial fraud). The authors benchmark multiple LLMs on identifying these cues, select the most reliable model to scale up labeling, and then stress-test robustness by rephrasing emails with several LLMs while preserving intent.
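To make the annotation scheme concrete, here is a minimal Python sketch of what one labeled record could look like. The class names, field names, and the sample email text are illustrative assumptions, not the dataset's actual schema; only the label categories themselves come from the description above.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Category(Enum):
    PHISHING = "phishing"
    SPAM = "spam"
    LEGITIMATE = "legitimate"


class Origin(Enum):
    HUMAN = "human"
    LLM = "llm"


@dataclass
class AnnotatedEmail:
    """Hypothetical record layout; the released dataset may differ."""
    text: str
    category: Category
    origin: Origin
    # Emotional tactics such as "urgency", "fear", or "authority".
    emotional_tactics: List[str] = field(default_factory=list)
    # Attacker goal such as "link_click", "credential_theft", or
    # "financial_fraud"; None for messages with no attacker.
    attacker_goal: Optional[str] = None


def count_by_category_and_origin(emails: List[AnnotatedEmail]) -> Counter:
    """Count records per (category, origin) pair."""
    return Counter((e.category.value, e.origin.value) for e in emails)


# Made-up sample records, for illustration only.
sample = [
    AnnotatedEmail(
        text="Your account will be locked today. Verify your password now.",
        category=Category.PHISHING,
        origin=Origin.LLM,
        emotional_tactics=["urgency", "fear"],
        attacker_goal="credential_theft",
    ),
    AnnotatedEmail(
        text="Agenda for Thursday's planning meeting is attached.",
        category=Category.LEGITIMATE,
        origin=Origin.HUMAN,
    ),
]

counts = count_by_category_and_origin(sample)
```

Keeping the human-or-LLM origin as a field separate from the category lets the same records be sliced either way, for example to compare detection quality on human-written and LLM-written phishing.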

Results: today’s top models are strong at catching phishing but still struggle to tell spam from genuine emails, an important gap for safer inboxes.

The dataset, code, and templates are openly available to accelerate research and support the deployment of better defenses. Learn more: https://arxiv.org/abs/2511.21448v1

cybersecurity AI LLM EmailSecurity Phishing SpamDetection Dataset OpenScience
