VisionLanguage - AI Feta, the news about scientific AI research

AI

Molmo2: Open Video-Language AI with Pixel-Level Grounding

Most top video AIs are locked up. Molmo2 opens the door: open weights and open datasets, built to understand videos and ground that understanding by pointing to and tracking objects in the pixels.

* Data you can build on: 7 new video datasets and 2 multi-image sets, including rich video captions,

AI

Teaching Vision-Language Agents with Deliberate Practice (DPPO)

Building capable embodied AI is hard: real-world data is scarce and training is expensive. This paper introduces Deliberate Practice Policy Optimization (DPPO), a coach-like training loop that helps vision-language agents learn more from less. How it works: the system alternates between learning from examples (to expand skills) and trial-and-error reinforcement

AutonomousDriving

Teaching self-driving cars to spot the unknown

Real roads are messy. To stay safe, autonomous vehicles must notice unexpected hazards, from a toppled ladder to a runaway stroller. That’s the challenge of out-of-distribution (OOD) segmentation. Seungheon Song and Jaekoo Lee propose a simple idea with big impact: use language