RoboVIP: Multi-View Video Generation with Visual Identity Prompting
Robots learn from videos of their own actions, but collecting lots of diverse, multi-camera footage is slow and costly. RoboVIP is a generative approach that “films” new training scenes without touching the hardware.
* Visual identity prompting: Instead of vague text, the model is guided by example images of the robot,