Vision-Language Models

When Prompts Make AIs "See" Things: Inside Vision-Language Hallucinations

Sometimes, vision-language AIs see what the prompt says, not what is in the picture. New research maps how that happens—and how to dial it down. Rudman and colleagues tested object counting with prompts that intentionally overstate what is in an image (for example, "describe four waterlilies" when

Teach Robots to Learn Like People

Building versatile embodied AI (agents that see, read, and act) is hard because real-world data is scarce and training is expensive. We introduce Deliberate Practice Policy Optimization (DPPO), a metacognitive "metaloop" that alternates between:
* Supervised fine-tuning to expand general competence
* Reinforcement learning