See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation

Kari Jaaskelainen

29 Sep 2025

Turning natural-language intent into flight paths with waypoint grounding—no training required.

See, Point, Fly (SPF) reframes vision-and-language navigation for drones by treating action selection as spatial grounding rather than text generation. Instead of “talking” a UAV through step-by-step actions, SPF asks a vision-language model (VLM) to iteratively mark 2D waypoints on the live camera feed. Each waypoint is paired with an adaptively chosen travel distance and lifted into a 3D displacement vector that the drone can execute immediately.
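As a rough illustration of the lifting step, the sketch below unprojects a 2D pixel waypoint through a pinhole camera model and scales the resulting ray by the chosen travel distance. The pinhole model, the field-of-view default, the axis convention, and the name `lift_waypoint` are all assumptions made for illustration; this post does not specify SPF's exact geometry.

```python
import math


def lift_waypoint(u, v, distance, width, height, hfov_deg=90.0):
    """Turn a pixel waypoint (u, v) plus a travel distance into a 3D displacement.

    Assumed camera-frame convention: x right, y down, z forward.
    Assumes square pixels and a principal point at the image center.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    focal = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

    # Direction of the ray passing through the chosen pixel.
    x = (u - width / 2.0) / focal
    y = (v - height / 2.0) / focal
    z = 1.0

    # Normalize the ray, then scale it to the requested travel distance.
    norm = math.sqrt(x * x + y * y + z * z)
    scale = distance / norm
    return (x * scale, y * scale, z * scale)
```

Under these assumptions, a waypoint at the image center yields pure forward motion, while a waypoint toward the image edge trades forward motion for lateral or vertical motion at the same total travel distance.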

The result is a training-free, closed-loop controller that follows free-form natural-language instructions across varied goals and environments, including moving targets. SPF’s adaptive distance adjustment speeds up progress in open spaces and tightens control when precision matters. Because the VLM only needs to ground language in image space, SPF works with different VLM backbones without custom fine-tuning.
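The closed-loop structure can be sketched as a simple observe, ground, act cycle. Everything in this sketch is hypothetical: the callables `get_frame`, `query_vlm`, and `execute`, the rule that `query_vlm` returns `None` when the task is done, and the clamped proportional rule in `adaptive_step` are stand-ins chosen for illustration, not the paper's actual interface or distance-scaling formula.

```python
def adaptive_step(depth_estimate, min_step=0.2, max_step=3.0, gain=0.5):
    """Pick a travel distance from an estimated distance to the target.

    Assumed heuristic: move a fixed fraction of the estimate, clamped so that
    steps are long in open space and short near the target.
    """
    return max(min_step, min(max_step, gain * depth_estimate))


def run_closed_loop(get_frame, query_vlm, execute, instruction, max_iters=50):
    """Repeat observe -> ground -> act until the VLM reports completion.

    query_vlm(frame, instruction) is assumed to return None when done,
    otherwise a tuple (u, v, depth_estimate) for the next waypoint.
    Returns the number of motion commands issued.
    """
    for step in range(max_iters):
        frame = get_frame()                      # fresh observation each cycle
        result = query_vlm(frame, instruction)   # ground the instruction in the image
        if result is None:
            return step
        u, v, depth_estimate = result
        execute(u, v, adaptive_step(depth_estimate))
    return max_iters
```

Because a new frame is grabbed before every query, errors from one step can be corrected in the next, which is also what would let such a loop track a moving target.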

On the DRL simulation benchmark, SPF sets a new state of the art, outperforming the previous best method by an absolute margin of 63%. Extensive real-world trials show consistent gains over strong baselines, and ablations clarify how waypoint grounding, distance adaptation, and closed-loop control each contribute.

Why it matters: in applications from search-and-rescue and infrastructure inspection to agriculture, filming, and security, operators can steer drones with natural language that translates into grounded spatial actions. SPF reduces the brittleness of text-only action generation and removes the cost of training, while also supporting pursuit of moving targets in dynamic scenes.

Practical notes: SPF assumes a forward-facing camera and reasonable visual observability; extreme lighting, occlusions, or severe domain shifts may require additional sensing or safety bounds.

Paper: http://arxiv.org/abs/2509.22653v1

Register: https://www.AiFeta.com

#AI #Robotics #UAV #VLM #Navigation #ComputerVision #Autonomy

From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Evidence that RL teaches genuinely new abilities: compositional skills emerge and transfer across tasks

Does RL merely reweight what an LLM already knows, or can it teach genuinely new skills? This paper offers concrete evidence for the latter. Using a controlled, synthetic framework, the authors define “skills” as string transformation

OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing

A structured 80k instruction–image corpus spanning 11 domains and 51 subtasks to train unified visual editors

Unified models for image generation and editing hit a data ceiling: existing corpora emphasize basic manipulations but miss real‑world complexity. OpenGPT‑4o‑Image tackles this with a hierarchical task taxonomy and automated

Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards

ROVER replaces PPO loops with uniform‑policy Q‑values, boosting quality and diversity in math reasoning

Popular RLVR methods for LLM reasoning lean on generalized policy iteration (e.g., PPO/GRPO), but suffer instability and diversity collapse. This paper reframes math RLVR as a specialized finite‑horizon MDP with deterministic

CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning

A dynamic, self‑paced curriculum that restructures problems to match model ability in RLVR

Online RL with Verifiable Rewards (RLVR) has boosted LLM reasoning, but most methods treat all problems equally, wasting effort on solved items and flailing on those beyond current capability. CLPO fixes that with a dynamic pedagogy:
