Meet d3LLM: Ultra-Fast Diffusion LLMs Without the Accuracy Trade-Off
What if AI could write many words at once, not just one after another? Diffusion LLMs can, but they often trade accuracy for speed. Meet d3LLM, a new approach that keeps both.
- Smarter training: Pseudo-trajectory distillation teaches the model which tokens are safe to finalize early, enabling confident parallel decoding (see the training sketch after this list).
- Faster inference: Entropy-based multi-block decoding groups “easy” tokens together and uses a KV-cache refresh to keep context sharp while scaling parallelism (decoding sketch below).
- Fairer evaluation: AUP (Accuracy Under Parallelism) scores quality and speed in a single number (toy example below).
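A minimal sketch of what confidence-weighted distillation could look like in PyTorch. The function name, loss shape, and weighting scheme are our assumptions for illustration, not the paper's exact training recipe:

```python
import torch
import torch.nn.functional as F

def pseudo_trajectory_loss(student_logits, teacher_logits, targets, pad_id=-100):
    """Illustrative (not the paper's) distillation loss: tokens the teacher
    predicts with high confidence -- i.e., 'safe to finalize early' -- get
    stronger supervision. Logits: (batch, seq, vocab); targets: (batch, seq)."""
    # Teacher confidence = probability the teacher assigns to the true token.
    teacher_probs = teacher_logits.softmax(dim=-1)
    conf = teacher_probs.gather(-1, targets.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    # Per-token cross-entropy for the student (no reduction yet).
    ce = F.cross_entropy(
        student_logits.transpose(1, 2), targets,
        ignore_index=pad_id, reduction="none",
    )
    # Weight each token's loss by teacher confidence; ignore padding.
    mask = (targets != pad_id).float()
    return (conf.detach() * ce * mask).sum() / mask.sum().clamp(min=1)
```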
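On the inference side, a rough sketch of an entropy gate for parallel decoding, assuming a per-position threshold over the model's predictive distribution; `finalize_low_entropy` and the 0.5 threshold are illustrative, not the released API:

```python
import torch

def finalize_low_entropy(logits, still_masked, threshold=0.5):
    """Illustrative entropy gate: within a block, commit every still-masked
    position whose predictive entropy falls below a threshold, so many 'easy'
    tokens are finalized in one step. logits: (seq, vocab); still_masked: (seq,) bool."""
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp(min=1e-9).log()).sum(dim=-1)  # (seq,)
    commit = still_masked & (entropy < threshold)  # the 'easy' positions
    tokens = probs.argmax(dim=-1)
    return tokens, commit  # caller writes tokens[commit] into the sequence
```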
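The post doesn't spell out the AUP formula, so this toy assumes the simplest reading, accuracy averaged across parallelism levels; see the paper for the authors' actual definition:

```python
def aup(points):
    """Hypothetical AUP: mean accuracy across decoding parallelism levels.
    `points` maps parallelism (tokens finalized per step) -> task accuracy."""
    return sum(points.values()) / len(points)

# e.g. aup({1: 0.82, 4: 0.81, 16: 0.78}) ~= 0.80
```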
The authors report up to 10x speedup over prior diffusion LLMs (LLaDA, Dream) and up to 5x over standard autoregressive models, with minimal accuracy loss.
Code: https://github.com/hao-ai-lab/d3LLM
Paper: https://arxiv.org/abs/2601.07568v1
Register: https://www.AiFeta.com
#AI #LLM #DiffusionModels #NLP #MachineLearning #Inference #ParallelComputing #OpenSource #Research