Scaling Generalist Data-Analytic Agents

Kari Jaaskelainen

30 Sep 2025 — 1 min read

DataMind: a scalable data synthesis and agent training recipe powering open generalist analytics agents

Generalist data‑analytic agents promise automated, end‑to‑end insights across messy, large, and diverse files. But open models often falter on long‑horizon, code‑based reasoning and multi‑turn tool use. DataMind introduces a full stack to fix that: a data synthesis pipeline, stability‑oriented rollouts, and a training objective that blends SFT and RL for robust skill acquisition.

Four key ingredients drive the recipe: (1) a fine‑grained task taxonomy paired with recursive easy‑to‑hard composition to scale difficulty and diversity; (2) knowledge‑augmented trajectory sampling plus model‑ and rule‑based filtering to ensure quality; (3) a dynamically adjustable objective that mixes SFT and RL losses; and (4) a memory‑frugal, stable, code‑centric multi‑turn rollout framework for agentic training.
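To make ingredient (3) concrete, here is a minimal sketch of a loss that shifts weight from SFT to RL as training progresses. The post does not say how DataMind adjusts the mix, so the cosine schedule, the function names `sft_weight` and `mixed_loss`, and the start/end weights are all illustrative assumptions rather than the authors' method.

```python
import math


def sft_weight(step: int, total_steps: int,
               start: float = 1.0, end: float = 0.0) -> float:
    """Hypothetical schedule: cosine-anneal the SFT weight from `start` to `end`."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps outside the range stay well-defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


def mixed_loss(sft_loss: float, rl_loss: float,
               step: int, total_steps: int) -> float:
    """Blend the two losses: gamma * SFT + (1 - gamma) * RL."""
    gamma = sft_weight(step, total_steps)
    return gamma * sft_loss + (1.0 - gamma) * rl_loss


if __name__ == "__main__":
    # Early steps lean on imitation (SFT); later steps lean on RL.
    for step in (0, 50, 100):
        print(step, round(mixed_loss(2.0, 4.0, step, 100), 4))
```

Under these assumptions the agent starts by imitating curated trajectories and gradually hands control to the RL signal. Any monotone schedule, or one driven by a training statistic, would slot into `sft_weight` the same way.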

The team releases DataMind‑12K, a curated trajectory set spanning domains, task types, and data formats. According to the authors, DataMind‑14B, trained on this set, averages 71.16% across multiple data analysis benchmarks and outperforms strong proprietary baselines such as DeepSeek‑V3.1 and GPT‑5. DataMind‑7B tops open‑source peers at 68.10%.

Why it matters: building open, reproducible, and strong data‑analytic agents requires both high‑quality synthetic data and stable agentic training. DataMind offers an actionable blueprint and artifacts (dataset and models) to accelerate community progress.

Who should care: data science teams, analytics platform builders, and researchers pursuing agentic code execution over real‑world, heterogeneous datasets.

Paper: arXiv: Scaling Generalist Data‑Analytic Agents

#AIAgents #DataAnalysis #ToolUse #ReinforcementLearning #SFT #OpenSource #CodeAgents
