DigiData: Training and Evaluating General-Purpose Mobile Control Agents

Kari Jaaskelainen

11 Nov 2025 — 1 min read

Meet DigiData: AI that can use your phone

Imagine an assistant that taps, swipes, and navigates apps to finish tasks for you. This paper introduces DigiData—a large, diverse, multi-modal dataset built to train mobile control agents to do exactly that.

Richer goals: Instead of scraping random user logs, DigiData maps app features through systematic exploration, yielding harder, more human-relevant tasks.
Real-world testing: DigiData-Bench evaluates agents on complex mobile workflows, not toy demos.
Better metrics: The popular “step accuracy” score can mislead. The authors propose dynamic protocols and AI-powered reviews that judge whether an agent actually completes the task.
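
To see why step accuracy can mislead, here is a minimal sketch in Python. It is a toy illustration, not the paper's evaluation code: the `Episode` class, the action names, and the `task_completed` labels are all hypothetical, and it assumes each predicted action is compared against one reference action at the same step.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    reference: list[str]   # reference actions, one per step (hypothetical data)
    predicted: list[str]   # agent's action at each step (hypothetical data)
    task_completed: bool   # assumed outcome label from running the full task


def step_accuracy(episodes: list[Episode]) -> float:
    """Fraction of steps where the agent's action equals the reference action."""
    matches = sum(
        p == r
        for ep in episodes
        for p, r in zip(ep.predicted, ep.reference)
    )
    total = sum(len(ep.reference) for ep in episodes)
    return matches / total


def task_success_rate(episodes: list[Episode]) -> float:
    """Fraction of episodes in which the task was actually completed."""
    return sum(ep.task_completed for ep in episodes) / len(episodes)


reference = ["open_app", "tap_search", "type_query", "tap_result", "tap_buy"]

episodes = [
    # Matches the reference on 4 of 5 steps, but the wrong final tap fails the task.
    Episode(reference,
            ["open_app", "tap_search", "type_query", "tap_result", "tap_back"],
            task_completed=False),
    # Takes a different but valid route: only 3 of 5 steps match, yet the task succeeds.
    Episode(reference,
            ["open_app", "tap_menu", "tap_category", "tap_result", "tap_buy"],
            task_completed=True),
]

for i, ep in enumerate(episodes):
    print(f"episode {i}: step accuracy {step_accuracy([ep]):.2f}, "
          f"completed: {ep.task_completed}")
print(f"overall step accuracy: {step_accuracy(episodes):.2f}")
print(f"task success rate: {task_success_rate(episodes):.2f}")
```

In this toy data, the episode with the higher step accuracy is the one that fails, while the episode that deviates from the reference path still finishes the task. That is the kind of gap that completion-based evaluation is meant to close.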

Why it matters: Stronger data and fairer evaluations speed up progress toward trustworthy, helpful phone agents—and safer automation of everyday digital chores.

Paper: http://arxiv.org/abs/2511.07413v1

#AI #Mobile #Agents #Dataset #Benchmark #MachineLearning #HCI #UX #Evaluation #MobileAI
