Meet d3LLM: Ultra-Fast Diffusion LLMs Without the Accuracy Trade-Off
What if AI could write many words at once, not just one after another? Diffusion LLMs can, but they often trade accuracy for speed. Meet d3LLM, a new approach that keeps both.
- Smarter training: Pseudo-trajectory distillation teaches the model which tokens are safe to finalize early, enabling confident parallel decoding (see the training sketch after this list).
- Faster inference: Entropy-based multi-block decoding groups “easy” tokens together and uses a KV-cache refresh to keep context sharp while scaling parallelism (decoding sketch below).
- Fairer evaluation: AUP (Accuracy Under Parallelism) scores quality and speed in a single number (toy example below).
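A minimal sketch of what confidence-weighted distillation could look like in PyTorch. The function name, loss shape, and weighting scheme are our assumptions for illustration, not the paper's exact training recipe:

```python
import torch
import torch.nn.functional as F

def pseudo_trajectory_loss(student_logits, teacher_logits, targets, pad_id=-100):
    """Illustrative (not the paper's) distillation loss: tokens the teacher
    predicts with high confidence -- i.e., 'safe to finalize early' -- get
    stronger supervision. Logits: (batch, seq, vocab); targets: (batch, seq)."""
    # Teacher confidence = probability the teacher assigns to the true token.
    teacher_probs = teacher_logits.softmax(dim=-1)
    conf = teacher_probs.gather(-1, targets.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    # Per-token cross-entropy for the student (no reduction yet).
    ce = F.cross_entropy(
        student_logits.transpose(1, 2), targets,
        ignore_index=pad_id, reduction="none",
    )
    # Weight each token's loss by teacher confidence; ignore padding.
    mask = (targets != pad_id).float()
    return (conf.detach() * ce * mask).sum() / mask.sum().clamp(min=1)
```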
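On the inference side, a rough sketch of an entropy gate for parallel decoding, assuming a per-position threshold over the model's predictive distribution; `finalize_low_entropy` and the 0.5 threshold are illustrative, not the released API:

```python
import torch

def finalize_low_entropy(logits, still_masked, threshold=0.5):
    """Illustrative entropy gate: within a block, commit every still-masked
    position whose predictive entropy falls below a threshold, so many 'easy'
    tokens are finalized in one step. logits: (seq, vocab); still_masked: (seq,) bool."""
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp(min=1e-9).log()).sum(dim=-1)  # (seq,)
    commit = still_masked & (entropy < threshold)  # the 'easy' positions
    tokens = probs.argmax(dim=-1)
    return tokens, commit  # caller writes tokens[commit] into the sequence
```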
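The post doesn't spell out the AUP formula, so this toy assumes the simplest reading, accuracy averaged across parallelism levels; see the paper for the authors' actual definition:

```python
def aup(points):
    """Hypothetical AUP: mean accuracy across decoding parallelism levels.
    `points` maps parallelism (tokens finalized per step) -> task accuracy."""
    return sum(points.values()) / len(points)

# e.g. aup({1: 0.82, 4: 0.81, 16: 0.78}) ~= 0.80
```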
The authors report up to 10x speedup over prior diffusion LLMs (LLaDA, Dream) and up to 5x over standard autoregressive models, with minimal accuracy loss.
Code: https://github.com/hao-ai-lab/d3LLM
Paper: https://arxiv.org/abs/2601.07568v1
Register: https://www.AiFeta.com
#AI #LLM #DiffusionModels #NLP #MachineLearning #Inference #ParallelComputing #OpenSource #Research