FocusUI: Faster, smarter AI that finds the right spot on your screen

Kari Jaaskelainen

08 Jan 2026 — 1 min read

Ever wished screen-reading AI would stop scanning every pixel and just tap what you asked for? FocusUI is a new method that helps vision-language models focus on what matters in app and web UIs—speeding them up without losing spatial precision.

How it works:

Selects only instruction-relevant patches, down-weighting big blank or uniform regions.
Preserves layout with PosPad, a marker that keeps positions when tokens are dropped.
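
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the relevance scores are made up (in practice they would come from a model that scores each patch against the instruction), the names `select_patches`, `prune_with_pospad` and `<pospad>` are hypothetical, and the rule that each run of dropped patches collapses into one marker at the run's last index is an assumption chosen to show how positions can survive pruning.

```python
from typing import List, Tuple

POSPAD = "<pospad>"  # hypothetical marker name, for illustration only


def select_patches(scores: List[float], keep_ratio: float) -> List[bool]:
    """Return a keep-mask that is True for the top `keep_ratio` of patches by score."""
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(scores))]


def prune_with_pospad(tokens: List[str], keep: List[bool]) -> List[Tuple[int, str]]:
    """Drop unkept tokens, leaving one marker per run of dropped tokens.

    Assumption: the marker sits at the last index of each dropped run, so the
    surviving tokens keep their original positions.
    """
    out: List[Tuple[int, str]] = []
    for i, (tok, kept) in enumerate(zip(tokens, keep)):
        if kept:
            out.append((i, tok))
        elif i + 1 == len(tokens) or keep[i + 1]:
            # Last token of a dropped run: emit a single position-keeping marker.
            out.append((i, POSPAD))
    return out


# Eight patches in reading order; low scores stand in for blank or uniform regions.
scores = [0.1, 0.1, 0.9, 0.8, 0.05, 0.05, 0.05, 0.7]
tokens = [f"p{i}" for i in range(len(scores))]

mask = select_patches(scores, keep_ratio=0.375)  # keep 3 of 8 patches
pruned = prune_with_pospad(tokens, mask)
print(pruned)
# [(1, '<pospad>'), (2, 'p2'), (3, 'p3'), (6, '<pospad>'), (7, 'p7')]
```

The sequence shrinks from eight entries to five, and every kept patch still carries its original index, so the model can still tell where on the screen it came from.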

Why it’s cool:

Beats GUI-specific baselines on 4 benchmarks.
On ScreenSpot-Pro, FocusUI-7B improves on GUI-Actor-7B by 3.7%.
Prunes up to 70% of visual tokens with only a 3.2% accuracy drop.
Up to 1.44× faster inference and 17% lower peak GPU memory.
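
To get a feel for what these figures mean in practice, here is a small back-of-the-envelope calculation. The baseline numbers (1,000 visual tokens, 1,000 ms per request, 20 GB peak memory) are hypothetical and chosen only to make the arithmetic easy; the "up to" values from the post are best cases, not guarantees.

```python
def retained_tokens(total: int, pruned_fraction: float) -> int:
    """Number of visual tokens left after pruning the given fraction."""
    return round(total * (1 - pruned_fraction))


def latency_after_speedup(baseline_ms: float, speedup: float) -> float:
    """Latency once inference runs `speedup` times faster."""
    return baseline_ms / speedup


def memory_after_reduction(baseline_gb: float, reduction: float) -> float:
    """Peak memory after a fractional reduction (0.17 means 17% lower)."""
    return baseline_gb * (1 - reduction)


# Hypothetical baselines, best-case figures from the post.
print(retained_tokens(1000, 0.70))                     # 300 tokens kept
print(round(latency_after_speedup(1000.0, 1.44), 1))   # 694.4 ms
print(round(memory_after_reduction(20.0, 0.17), 1))    # 16.6 GB
```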

A step closer to responsive, accessible AI agents that can navigate real apps. Paper: https://arxiv.org/abs/2601.03928v1

AI UI UX HCI ComputerVision Multimodal LLM Accessibility Automation Efficiency Research
