Omni-R1: AI that draws its thoughts
What if AI could think with pictures?
Omni-R1 is a new multimodal AI that doesn't just "talk through" problems: it draws its intermediate steps. Instead of relying on one fixed reasoning style, it unifies many skills (zooming into regions, pointing to objects, marking paths) by generating small helper images mid-reasoning.
- Unified generative reasoning: one paradigm for many vision-language tasks.
- Two-stage training: supervised fine-tuning + reinforcement learning with a perception alignment loss and perception reward to make the generated visuals actually useful.
- Omni-R1-Zero: learns the same trick without multimodal labels by bootstrapping visual steps from text-only reasoning, and often matches or beats Omni-R1.
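To make the two-stage recipe concrete, here is a minimal sketch of how an RL objective might combine task success with a "perception reward" that scores whether the generated helper visual actually grounds the answer. All function names, the IoU-based scoring, and the weighting are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical reward shaping for RL fine-tuning with a perception reward.
# Assumption: the helper visual is scored by overlap (IoU) with a reference
# region; the paper's actual perception reward may differ.

def perception_reward(iou: float, threshold: float = 0.5) -> float:
    """Score the generated helper visual (e.g., a drawn box) by its
    overlap with a reference region; zero below the threshold."""
    return iou if iou >= threshold else 0.0

def total_reward(task_correct: bool, iou: float, alpha: float = 0.5) -> float:
    """Combine task success with perception quality, so the policy is
    rewarded only for visuals that are both used and well-aligned."""
    r_task = 1.0 if task_correct else 0.0
    return r_task + alpha * perception_reward(iou)

# A correct answer backed by a well-aligned helper image earns the
# task reward plus a scaled perception bonus; a misaligned helper
# image (IoU below threshold) earns only the task reward.
print(total_reward(True, iou=0.8))
print(total_reward(True, iou=0.2))
```

The point of the threshold is that a sloppy helper image contributes nothing, so the model cannot game the bonus with low-quality visuals.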
Why it matters: more general, transparent multimodal reasoning that can show its work across diverse tasks.
Paper: https://arxiv.org/abs/2601.09536v1 (cs.AI). Authors: Dongjie Cheng et al.
Register: https://www.AiFeta.com
#AI #Multimodal #MLLM #ComputerVision #GenerativeAI #ReinforcementLearning #Research