Lost in Translation and Noise: Why VLMs Stumble on Real-World Tables
Does your vision-language model crush table QA benchmarks—until the tables look like scans or switch languages? MirageTVQA puts that to the test.
It’s a new benchmark of ~60,000 question–answer pairs over tables in 24 languages, with realistic visual degradations (think noisy scanned documents), reflecting how table data actually appears in the wild.
- Noise hurts, badly: leading VLMs lose over 35% of their performance when tables include realistic visual artifacts (see the sketch after this list).
- English-first bias: reasoning that works in English often fails to transfer to other languages.
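
To make the headline number concrete, here’s a minimal sketch of how a relative performance drop like that is computed. The loader, model callable, and record fields are hypothetical placeholders (not the repo’s actual API), and the accuracies are illustrative, not the paper’s reported values:

```python
# Minimal sketch: quantifying the clean-vs-noisy accuracy gap.
# NOTE: `model`, the example fields, and the numbers below are hypothetical --
# see https://github.com/anshulsc/MirageTVQA for the real data format.

def relative_drop(acc_clean: float, acc_noisy: float) -> float:
    """Relative performance loss when moving from clean to noisy tables."""
    return (acc_clean - acc_noisy) / acc_clean

def accuracy(model, examples) -> float:
    """Exact-match accuracy of a VLM over (table image, question) pairs."""
    correct = sum(
        model(ex["table_image"], ex["question"]).strip().lower()
        == ex["answer"].strip().lower()
        for ex in examples
    )
    return correct / len(examples)

# Illustrative: 0.62 accuracy on clean tables vs. 0.40 on noisy scans
# is a ~35% relative drop.
print(f"{relative_drop(0.62, 0.40):.0%}")  # 35%
```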
Why this matters: real invoices, reports, and forms are messy and multilingual. MirageTVQA is a yardstick for building and measuring more robust table reasoning.
Paper: https://arxiv.org/abs/2511.17238v1 • Data & code: https://github.com/anshulsc/MirageTVQA
Register: https://www.AiFeta.com
#AI #VLM #VisionLanguage #Multilingual #OCR #TableQA #Benchmark #MLResearch #RobustAI #DocumentAI