VideoAR: Faster, scalable AI video generation
What’s new
VideoAR is a new autoregressive approach to AI video generation: it predicts the next frame, over and over, and generates each frame at multiple scales. By separating what happens within a frame (spatial) from what happens across frames (temporal), it aims to make video generation faster and more stable.
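One way to see why predicting a whole token map per step can be faster than predicting one token at a time is a toy step count. The Python sketch below assumes a hypothetical coarse-to-fine schedule `SCALES` of square token maps; the schedule and the helper names are illustrative assumptions, not VideoAR's actual configuration or code.

```python
# Toy step count: one forward pass per token vs. one pass per (frame, scale) map.
# SCALES is a hypothetical coarse-to-fine schedule of square token-map side
# lengths; it is an illustrative assumption, not VideoAR's configuration.
SCALES = (1, 2, 4, 8, 16)


def steps_next_token(num_frames, scales):
    """Passes for a raster next-token model that emits the finest map token by token."""
    finest = scales[-1]
    return num_frames * finest * finest


def steps_next_frame_next_scale(num_frames, scales):
    """Passes when every token map (one per scale, per frame) is predicted in parallel."""
    return num_frames * len(scales)


def generation_order(num_frames, scales):
    """Yield (frame, scale) pairs in prediction order: frames in sequence
    (temporal), and coarse-to-fine scales inside each frame (spatial)."""
    for t in range(num_frames):
        for side in scales:
            yield t, side


if __name__ == "__main__":
    frames = 16
    print(steps_next_token(frames, SCALES))             # 4096
    print(steps_next_frame_next_scale(frames, SCALES))  # 80
    print(list(generation_order(2, (1, 2))))            # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

The gap here comes entirely from the assumed schedule; it illustrates the mechanism and does not reproduce the paper's reported step reduction.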
- A 3D multi-scale tokenizer efficiently compresses both motion and visual detail.
- Three additions help keep scenes consistent over time: Multi-scale Temporal RoPE, Cross-Frame Error Correction, and Random Frame Mask.
- Progressive pretraining teaches the model to handle longer, higher-resolution videos.
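The post names Multi-scale Temporal RoPE but does not define it. As a rough illustration only, here is a pure-Python sketch of a generic axial 3D rotary position embedding (RoPE), which rotates pairs of query/key channels by angles proportional to position so that attention scores depend on relative offsets. The channel split across (t, h, w), the rescaling of coarse-scale coordinates onto the finest grid, and the names `rope_3d`, `rope_angles`, `scale_side`, and `finest_side` are all hypothetical assumptions, not the paper's design.

```python
import math


def rope_angles(pos, num_pairs, base=10000.0):
    """Standard RoPE frequencies: pair i rotates by pos * base**(-i / num_pairs)."""
    return [pos * base ** (-i / num_pairs) for i in range(num_pairs)]


def rope_3d(vec, t, h, w, scale_side, finest_side):
    """Rotate `vec` (a query or key) according to its position (t, h, w).

    Hypothetical design, not VideoAR's published formula:
    - channels are split into three equal groups for the t, h and w axes;
    - (h, w) on a coarse scale_side x scale_side map are rescaled onto the
      finest_side grid, so tokens at different scales get comparable positions.
    """
    if len(vec) % 6 != 0:
        raise ValueError("vector length must be a multiple of 6 (3 axes x 2 channels)")
    pairs_per_axis = len(vec) // 6
    ratio = finest_side / scale_side
    out = list(vec)
    for axis, pos in enumerate((t, h * ratio, w * ratio)):
        offset = axis * 2 * pairs_per_axis
        for i, theta in enumerate(rope_angles(pos, pairs_per_axis)):
            a, b = vec[offset + 2 * i], vec[offset + 2 * i + 1]
            # 2D rotation of the channel pair (a, b) by angle theta.
            out[offset + 2 * i] = a * math.cos(theta) - b * math.sin(theta)
            out[offset + 2 * i + 1] = a * math.sin(theta) + b * math.cos(theta)
    return out


def dot(u, v):
    """Unnormalized attention score between a rotated query and key."""
    return sum(x * y for x, y in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1] * 2
    k = [1.0, 0.4, -0.8, 0.2, 0.9, -0.3] * 2
    # Same spatial slot, frame offset 2 in both cases -> same score.
    s1 = dot(rope_3d(q, 3, 1, 1, 4, 16), rope_3d(k, 5, 1, 1, 4, 16))
    s2 = dot(rope_3d(q, 10, 1, 1, 4, 16), rope_3d(k, 12, 1, 1, 4, 16))
    print(abs(s1 - s2) < 1e-9)  # True
```

Because the score depends only on the frame offset, frame 3 attending to frame 5 looks the same to the model as frame 10 attending to frame 12; relative encoding of this kind is a commonly cited motivation for rotary schemes.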
Why it matters
The authors report state-of-the-art results among autoregressive models: better realism, with FVD on UCF-101 improving from 99.5 to 88.6 (lower is better), while using over 10x fewer inference steps. VideoAR also reaches a VBench score of 81.74, competitive with much larger diffusion models. In short: comparable quality with far less compute.
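FVD (Fréchet Video Distance) is lower-is-better, so the UCF-101 change is easiest to read as a relative reduction, using the two figures quoted in the post:

```latex
\frac{99.5 - 88.6}{99.5} = \frac{10.9}{99.5} \approx 0.110 \quad (\text{about an } 11\% \text{ reduction in FVD})
```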
Paper: https://arxiv.org/abs/2601.05966v1
#AI #VideoGeneration #GenerativeAI #MachineLearning #ComputerVision #Autoregressive #Research