Robots That Plan Long, Multi‑Step Tasks Using General AI
Teaching robots to handle real-world chores—no special training required
This research presents a new way for robots to complete long, multi-step tasks: combining off-the-shelf foundation models (the same kind powering today's AI) with a continuously updated "scene graph," a structured map of objects and their relationships.
Here’s the idea: foundation models handle what the robot sees and understands (vision and language), while a general reasoning model decides the sequence of actions. The scene graph ties it together, tracking where things are and how they change so the robot can plan reliably over many steps without forgetting context.
- Multimodal perception from existing AI models
- General-purpose reasoning for robust task sequencing
- Dynamic scene graphs for spatial awareness and consistency
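To make the scene-graph idea concrete, here is a minimal sketch of how such a structure might track objects and relations and stay consistent as the robot acts. This is an illustrative assumption, not the paper's implementation: the class name, relation triples, and `apply_action` logic are all hypothetical.

```python
# Hypothetical sketch of a dynamic scene graph for tabletop planning.
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # object name -> (x, y, z) position on the table
    objects: dict = field(default_factory=dict)
    # (subject, relation, object) triples, e.g. ("cup", "on", "tray")
    relations: set = field(default_factory=set)

    def add_object(self, name, position):
        self.objects[name] = position

    def relate(self, subj, rel, obj):
        self.relations.add((subj, rel, obj))

    def apply_action(self, action, item, target):
        """Update the graph after an action so state stays
        consistent across many planning steps."""
        if action == "place":
            # drop stale spatial relations involving the moved item
            self.relations = {r for r in self.relations if r[0] != item}
            self.relations.add((item, "on", target))
            self.objects[item] = self.objects.get(target, (0.0, 0.0, 0.0))

    def describe(self):
        # serialize current state so a language model can reason over it
        return "; ".join(f"{s} {r} {o}" for s, r, o in sorted(self.relations))


g = SceneGraph()
g.add_object("table", (0.0, 0.0, 0.0))
g.add_object("cup", (0.2, 0.1, 0.0))
g.relate("cup", "on", "table")
g.apply_action("place", "cup", "tray")  # after the robot moves the cup
print(g.describe())  # → cup on tray
```

The key design point is the `describe()` step: by serializing the graph into text, the robot can hand its current world state to a general reasoning model at every step, which is what lets planning remain reliable over long horizons.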
Tested on tabletop manipulation tasks, the framework demonstrates a path toward building capable robot systems directly on top of today's off-the-shelf AI, with no domain-specific training required.
Paper by Sushil Samuel Dinesh and Shinkyu Park.
Paper: http://arxiv.org/abs/2510.27558v1
Register: https://www.AiFeta.com
#AI #Robotics #RobotLearning #FoundationModels #SceneGraphs #Manipulation #Research