Build Multimodal Stories, One Node at a Time
This research introduces a node-based editor that lets you build a multimodal story as a graph—each “node” can hold text, images, audio, and video. You can expand, reorder, and refine nodes using plain-language prompts or direct edits.
- Targeted edits: tweak a single scene without breaking the rest.
- Branching: spin up parallel storylines automatically and compare.
- Smart routing: an agent chooses the right generator for structure, content, formatting, and context.
- Iterative control: refine nodes step by step across media.
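To make the graph idea concrete, here is a minimal sketch of what such a node-based story structure could look like. All names (`StoryNode`, `Asset`, `add_branch`) are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical sketch of a node-based multimodal story graph.
# Each node holds mixed-media assets and can branch into parallel storylines.

@dataclass
class Asset:
    kind: Literal["text", "image", "audio", "video"]
    content: str  # text body, or a URI pointing at the media file

@dataclass
class StoryNode:
    node_id: str
    assets: list[Asset] = field(default_factory=list)
    children: list["StoryNode"] = field(default_factory=list)

    def add_branch(self, branch: "StoryNode") -> None:
        """Attach a parallel storyline under this node."""
        self.children.append(branch)

    def edit(self, index: int, new_content: str) -> None:
        """Targeted edit: change one asset without touching other nodes."""
        self.assets[index].content = new_content

# Build a tiny story with two alternative continuations.
root = StoryNode("opening", [Asset("text", "A storm rolls in over the harbor.")])
root.add_branch(StoryNode("calm", [Asset("text", "The storm passes; the town wakes.")]))
root.add_branch(StoryNode("chaos", [Asset("text", "Lightning strikes the lighthouse.")]))

# A targeted edit refines one scene without breaking the branches.
root.edit(0, "A storm rolls in over the quiet harbor.")
```

In this sketch, branching is just adding children, and a targeted edit only rewrites one asset in one node, which is what lets the rest of the graph stay intact.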
In tests, the system generated coherent outlines and supported hands-on editing workflows, giving creators more control over pacing and structure. Limitations remain: long narratives are hard to scale, and keeping characters and details consistent across nodes is an open challenge. The authors point to human-in-the-loop tools as the next step.
Paper: http://arxiv.org/abs/2511.03227v2
Authors: Alexander Htet Kyaw, Lenin Ravindranath Sivalingam
Register: https://www.AiFeta.com
#AI #GenerativeAI #Storytelling #Multimodal #HCI #Research #Creativity #Video #Audio #Images