VideoAR: Faster, scalable AI video generation

What’s new

VideoAR is an autoregressive approach to AI video generation: it predicts the next frame repeatedly, at multiple scales. By separating what happens within a frame (spatial) from what happens across frames (temporal), it aims to make video generation faster and more stable.

  • 3D multi-scale tokenizer compresses motion and detail efficiently.
  • Three consistency techniques (Multi-scale Temporal RoPE, Cross-Frame Error Correction, and Random Frame Mask) help keep scenes consistent over time.
  • Progressive pretraining teaches the model to handle longer, higher-resolution videos.
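
To make the "next frame, at multiple scales" idea concrete, here is a minimal toy sketch of the loop structure. It is not the authors' code. The names generate_video and random_predictor, the scale list, and the assumption that each scale of a frame is produced in a single model call are all hypothetical choices for illustration, and a random stand-in replaces the learned model.

```python
import random

def generate_video(num_frames, scales, predict_scale, seed=0):
    """Toy loop: frames are generated one after another (temporal axis);
    within each frame, token grids are produced scale by scale (spatial axis)."""
    rng = random.Random(seed)
    frames = []   # previously generated frames serve as temporal context
    steps = 0     # number of model calls, counted for illustration
    for _ in range(num_frames):
        frame_scales = []  # scales already generated for the current frame
        for side in scales:
            # Assumption: one call yields the whole side x side token grid.
            tokens = predict_scale(frames, frame_scales, side, rng)
            frame_scales.append(tokens)
            steps += 1
        frames.append(frame_scales)
    return frames, steps

def random_predictor(past_frames, current_scales, side, rng, vocab_size=16):
    # Hypothetical stand-in for a learned model: random token ids.
    return [[rng.randrange(vocab_size) for _ in range(side)] for _ in range(side)]

frames, steps = generate_video(num_frames=4, scales=[1, 2, 4],
                               predict_scale=random_predictor)
print(steps)  # 12 calls: 4 frames x 3 scales
```

In this sketch the number of model calls grows with frames times scales rather than with the total number of tokens, which is one way to picture why a multi-scale autoregressive design can need fewer inference steps.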

Why it matters

The authors report state-of-the-art results among autoregressive models: FVD on UCF-101 improves from 99.5 to 88.6 (lower is better) with over 10x fewer inference steps, and the model reaches a VBench score of 81.74, which they describe as competitive with much larger diffusion models. In short: comparable reported quality with far fewer inference steps.

Paper: https://arxiv.org/abs/2601.05966v1
