Lost in Translation and Noise: Why VLMs Stumble on Real-World Tables

Does your vision-language model crush table QA benchmarks, only to stumble when the tables look like scans or switch languages? MirageTVQA puts that to the test.

It’s a new benchmark of ~60,000 question–answer pairs across 24 languages, with tables that are multilingual and visually imperfect (think scanned documents with noise), reflecting how data appears in the wild.

Noise hurts badly: leading VLMs lose more than 35% of their performance when tables include realistic visual artifacts.
English-first bias: reasoning that works in English often fails to transfer to other languages.
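
To make these two findings concrete, here is a minimal Python sketch of how one might quantify them from per-question results. It assumes a hypothetical record layout (language code, "clean" or "noisy" condition, correct or not) and measures the noise penalty as a relative drop from clean-table accuracy. It is not MirageTVQA's official evaluation code or metric, and the toy data is illustrative, not a benchmark result.

```python
from collections import defaultdict
from typing import Callable, Hashable

# Hypothetical record layout (not MirageTVQA's actual format):
# (language_code, condition, is_correct), where condition is "clean" or "noisy".
Record = tuple[str, str, bool]


def accuracy_by(records: list[Record], key: Callable[[Record], Hashable]) -> dict:
    """Group records by key(record) and return the accuracy of each group."""
    totals: dict = defaultdict(int)
    hits: dict = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        hits[group] += int(rec[2])  # True counts as 1, False as 0
    return {group: hits[group] / totals[group] for group in totals}


def relative_drop(clean_acc: float, noisy_acc: float) -> float:
    """Share of clean-table accuracy lost when tables are noisy."""
    return (clean_acc - noisy_acc) / clean_acc


def gap_vs_english(records: list[Record]) -> dict:
    """English accuracy minus the accuracy of each other language."""
    per_lang = accuracy_by(records, key=lambda r: r[0])
    english = per_lang["en"]
    return {lang: english - acc for lang, acc in per_lang.items() if lang != "en"}


# Toy data for illustration only; these are not benchmark results.
records: list[Record] = [
    ("en", "clean", True), ("en", "clean", True),
    ("en", "noisy", True), ("en", "noisy", False),
    ("fi", "clean", True), ("fi", "clean", False),
    ("fi", "noisy", False), ("fi", "noisy", False),
]

by_condition = accuracy_by(records, key=lambda r: r[1])
print(round(relative_drop(by_condition["clean"], by_condition["noisy"]), 3))  # 0.667
print(gap_vs_english(records))  # {'fi': 0.5}
```

Expressing the noise penalty relative to clean accuracy keeps models with different baselines comparable; whether a given headline figure is a relative or an absolute drop is worth confirming in the paper itself.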

Why this matters: real invoices, reports, and forms are messy and multilingual. MirageTVQA offers a yardstick for building and measuring more robust table reasoning.

Paper: https://arxiv.org/abs/2511.17238v1 • Data & code: https://github.com/anshulsc/MirageTVQA

#AI #VLM #VisionLanguage #Multilingual #OCR #TableQA #Benchmark #MLResearch #RobustAI #DocumentAI
