Teach Robots to Learn Like People
Building versatile embodied AI (agents that see, read, and act) is hard because real-world data is scarce and training is expensive.
We introduce Deliberate Practice Policy Optimization (DPPO), a metacognitive "metaloop" that alternates between:
- Supervised fine-tuning to expand general competence
- Reinforcement learning to refine skills through practice
This loop identifies weaknesses automatically and focuses compute where it matters, extracting more learning from limited data. In theory, DPPO can be formalized as a unified preference-learning framework; in practice, it powers Pelican-VL 1.0, a vision-language embodied model.
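To make the alternation concrete, here is a toy sketch of such a loop in Python. It is not the DPPO implementation: the "model" is just a dictionary of per-task success probabilities, and every name in it (`rl_practice`, `find_weaknesses`, `sft_update`, `metaloop`, the threshold and step values) is a hypothetical stand-in chosen for illustration. It assumes that the practice phase is what exposes weak tasks and that the supervised phase is then aimed at them.

```python
import random


def rl_practice(skill, rollouts, rng):
    """Stand-in for an RL phase: attempt every task and record success rates."""
    return {
        task: sum(rng.random() < p for _ in range(rollouts)) / rollouts
        for task, p in skill.items()
    }


def find_weaknesses(success_rates, threshold):
    """Tasks whose observed success rate falls below the threshold."""
    return [task for task, rate in success_rates.items() if rate < threshold]


def sft_update(skill, weak_tasks, step):
    """Stand-in for targeted SFT: raise competence only on the weak tasks."""
    for task in weak_tasks:
        skill[task] = min(1.0, skill[task] + step)


def metaloop(skill, rounds=5, rollouts=50, threshold=0.7, step=0.15, seed=0):
    """Alternate practice and targeted supervision for a fixed number of rounds."""
    skill = dict(skill)  # work on a copy so the caller's dict is untouched
    rng = random.Random(seed)
    history = []
    for _ in range(rounds):
        rates = rl_practice(skill, rollouts, rng)  # practice and measure
        weak = find_weaknesses(rates, threshold)   # spot weaknesses
        sft_update(skill, weak, step)              # spend effort only there
        history.append(weak)
    return skill, history
```

Calling `metaloop({"grasp": 0.9, "navigate": 0.3})` returns the updated skill table and, for each round, the list of tasks that were flagged as weak. In this toy setup, effort goes to the low-probability task until its measured success rate clears the threshold.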
Results: +20.3% over the base model and +10.6% over open-source models at around the 100B-parameter scale, achieved with a systematic, data- and resource-efficient training recipe.
We’re open-sourcing the models and code so the community can build stronger embodied agents with fewer resources.
Paper: https://arxiv.org/abs/2511.16602v1
Register: https://www.AiFeta.com
AI Robotics EmbodiedAI ReinforcementLearning VisionLanguageModels MachineLearning OpenSource DataEfficiency