VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing
10,497 examples, 13 tasks: a holistic yardstick for voice-first multimodal assistants.

Voice assistants are rapidly evolving into multimodal agents that must hear, speak, and see. Yet evaluation has lagged behind capability. VoiceAssistant-Eval fills this gap with a comprehensive benchmark of 10,497 curated examples across 13 task categories, spanning