Driving on Registers: DrivoR for lean, adaptive end-to-end driving
Meet DrivoR, a simple, efficient transformer for end-to-end autonomous driving.
It builds on pretrained Vision Transformers and adds camera-aware "register tokens" that compress multi-camera views into a compact scene summary, cutting downstream compute without sacrificing accuracy.
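To make the compression idea concrete, here is a minimal sketch of the token bookkeeping only. It assumes each camera's encoder output is a list of patch tokens followed by a fixed number of register tokens, and that only the registers are passed downstream. The function name `compress_views`, the token layout, and the counts (4 cameras, 256 patches, 16 registers) are illustrative assumptions, not details taken from the paper.

```python
def compress_views(per_camera_tokens, num_registers):
    """Keep only the trailing register tokens of each camera view.

    Assumption (not from the paper): each view is a list laid out as
    [patch tokens..., register tokens...]. Each kept token is tagged with
    its camera index so the summary stays camera-aware.
    """
    if num_registers <= 0:
        raise ValueError("num_registers must be positive")
    summary = []
    for cam_id, tokens in enumerate(per_camera_tokens):
        if num_registers > len(tokens):
            raise ValueError("view has fewer tokens than num_registers")
        start = len(tokens) - num_registers
        summary.extend((cam_id, tok) for tok in tokens[start:])
    return summary


# Hypothetical sizes, chosen only to show the reduction in token count.
NUM_CAMERAS, NUM_PATCHES, NUM_REGISTERS = 4, 256, 16
views = [
    [f"patch{i}" for i in range(NUM_PATCHES)]
    + [f"reg{i}" for i in range(NUM_REGISTERS)]
    for _ in range(NUM_CAMERAS)
]
summary = compress_views(views, NUM_REGISTERS)
full = sum(len(v) for v in views)
print(f"{full} tokens in, {len(summary)} tokens out")
```

In a real model the tokens would be feature vectors and the registers would be learned, but the downstream saving comes from the same step: later layers attend over the short summary instead of every patch.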
DrivoR then runs two lightweight decoders: one proposes candidate driving paths, and the other scores them by imitating an oracle. The score breaks down into interpretable sub-scores that you can tune at inference:
- Safety
- Comfort
- Efficiency
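One way to picture inference-time tuning is a weighted sum over the three sub-scores, sketched below. This is an assumption for illustration: the post does not say how DrivoR combines its sub-scores, and the `ScoredPath` class, the `pick_path` function, the weights, and the score values are all made up.

```python
from dataclasses import dataclass


@dataclass
class ScoredPath:
    name: str
    # Hypothetical sub-scores, assumed to lie in [0, 1] with higher = better.
    safety: float
    comfort: float
    efficiency: float


def pick_path(candidates, w_safety=1.0, w_comfort=1.0, w_efficiency=1.0):
    """Return the candidate with the highest weighted sum of sub-scores."""
    def total(c):
        return (w_safety * c.safety
                + w_comfort * c.comfort
                + w_efficiency * c.efficiency)
    return max(candidates, key=total)


# Made-up candidates, not model outputs.
candidates = [
    ScoredPath("cautious", safety=0.9, comfort=0.8, efficiency=0.4),
    ScoredPath("assertive", safety=0.7, comfort=0.6, efficiency=0.9),
]

print(pick_path(candidates).name)                # equal weights
print(pick_path(candidates, w_safety=3.0).name)  # safety weighted 3x
```

With equal weights the sketch picks the "assertive" path; tripling the safety weight flips the choice to "cautious" without retraining anything, which is the kind of behavior adaptation the sub-scores make possible.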
Despite its minimal design, DrivoR matches or beats strong baselines on NAVSIM-v1, NAVSIM-v2, and the photorealistic, closed-loop HUGSIM benchmark.
Takeaway: a pure-transformer stack plus targeted token compression can deliver accurate, efficient, and behavior-adaptive end-to-end driving. Paper: https://arxiv.org/abs/2601.05083v1. Code and checkpoints will be released on the project page.
#AutonomousDriving #AI #ComputerVision #Transformers #Robotics #Safety #Efficiency