Meet d3LLM: Ultra-Fast Diffusion LLMs Without the Accuracy Trade-Off

What if AI could write many words at once, not just one after another? Diffusion LLMs can, but they often give up accuracy to gain speed. Meet d3LLM, a new approach that keeps both.

Smarter training: Pseudo-trajectory distillation teaches the model which tokens are safe to finalize early, enabling confident parallel decoding.
Faster inference: Entropy-based multi-block decoding finalizes low-entropy (“easy”) tokens in parallel across multiple blocks, and a KV-cache refresh keeps the cached context accurate as parallelism scales.
Fairer evaluation: AUP (Accuracy Under Parallelism) measures quality and speed in one metric.
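The post only outlines entropy-based decoding. As a rough illustration, the Python sketch below assumes a simplified version of the core step. It computes the entropy of each still-masked position's predicted distribution and finalizes every position below a fixed threshold at once. The function names, the fixed threshold, and the fallback of committing the single most confident position are assumptions made for illustration. The sketch omits d3LLM's multi-block scheduling and KV-cache refresh, and it is not the d3LLM implementation.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one position's predicted token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_tokens_to_commit(
    dists: List[Sequence[float]],
    masked: List[int],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Pick (position, token_id) pairs to finalize in one parallel decoding step.

    Hypothetical helper for illustration, not the d3LLM API. dists[i] is the
    model's distribution over the vocabulary at position i; masked lists the
    positions that are still undecided.
    """
    scored = [(entropy(dists[i]), i) for i in masked]
    # Assumed rule: low-entropy ("easy") positions are committed together in parallel.
    chosen = [i for h, i in scored if h < threshold]
    # If nothing is confident enough, commit the single lowest-entropy position
    # so decoding always makes progress (an assumed fallback).
    if not chosen and scored:
        chosen = [min(scored)[1]]
    # Greedy token choice: the most likely token at each committed position.
    return [
        (i, max(range(len(dists[i])), key=lambda t: dists[i][t]))
        for i in sorted(chosen)
    ]


if __name__ == "__main__":
    toy_dists = [
        [0.97, 0.01, 0.01, 0.01],  # confident about token 0
        [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
        [0.02, 0.94, 0.02, 0.02],  # confident about token 1
    ]
    print(select_tokens_to_commit(toy_dists, masked=[0, 1, 2], threshold=0.5))
    # -> [(0, 0), (2, 1)]
```

In this toy run, positions 0 and 2 are finalized in the same step while the uncertain position 1 waits for a later step. Raising the threshold commits more tokens per step (more parallelism, fewer model calls) at the risk of finalizing tokens the model is less sure about, which is the speed/accuracy tension a combined metric like AUP is meant to capture.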

The authors report up to a 10x speedup over prior diffusion LLMs (LLaDA/Dream) and 5x over standard autoregressive models, with minimal accuracy loss.

Code: https://github.com/hao-ai-lab/d3LLM

Paper: https://arxiv.org/abs/2601.07568v1