Scaling Generalist Data-Analytic Agents
DataMind: a scalable recipe and dataset to train open, code-capable data-analytic agents.
Open-source data-analytic agents lag behind proprietary systems due to data scarcity, brittle training, and unstable multi-turn code execution. DataMind addresses these gaps with a full-stack recipe (data synthesis, curriculum, filtering, training objectives, and rollout) for training generalist agents that parse diverse data formats and reason over long horizons.
Key ingredients:
- Task synthesis at scale: A fine-grained taxonomy with recursive easy-to-hard composition expands diversity and difficulty of analytical queries.
- Quality trajectories: Knowledge-augmented sampling followed by model- and rule-based filtering yields clean, instructive demonstrations.
- Balanced objectives: A dynamically adjustable blend of SFT and RL stabilizes learning while pushing capability.
- Stable multi-turn code rollout: Memory-frugal execution improves reliability in code-based tool use.
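The dynamically adjustable SFT/RL blend above can be sketched as a step-scheduled weighted sum of the two losses. This is a minimal illustration under assumed choices: the linear schedule and the names `anneal_weight` and `blended_loss` are hypothetical, not the paper's exact objective.

```python
# Illustrative sketch of a dynamically adjusted SFT + RL objective.
# The linear schedule and function names are assumptions for exposition,
# not DataMind's actual training code.

def anneal_weight(step: int, total_steps: int,
                  start: float = 0.9, end: float = 0.1) -> float:
    """Linearly decay the SFT weight from `start` to `end` over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac

def blended_loss(sft_loss: float, rl_loss: float,
                 step: int, total_steps: int) -> float:
    """Combine supervised and RL losses with a step-dependent weight,
    leaning on demonstrations early and on the RL signal later."""
    alpha = anneal_weight(step, total_steps)
    return alpha * sft_loss + (1.0 - alpha) * rl_loss
```

Early in training, `alpha` near 0.9 keeps updates anchored to clean demonstrations; as it decays, the RL term dominates and pushes capability beyond the demonstration distribution, which is one plausible way to realize the "stabilize while pushing" trade-off the bullet describes.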
Deliverables include DataMind-12K, a high-quality trajectory set spanning domains, task types, and file formats. Models trained on it achieve standout results: DataMind-14B reaches 71.16% average across data-analysis benchmarks, outperforming strong proprietary baselines (e.g., DeepSeek‑V3.1 and GPT‑5), while DataMind-7B leads among open models at 68.10%. The authors will release DataMind-12K and model checkpoints to accelerate community progress.
Why it matters: Real-world analytics demands long-horizon reasoning, tool use, and robustness across messy data, which is precisely what DataMind targets. With an open, reproducible pipeline, it offers a practical path to scalable, capable, and transparent analytic agents.
Paper: arXiv: DataMind
Register: https://www.AiFeta.com
#Agents #DataAnalytics #CodeAgents #SFT #ReinforcementLearning #OpenSource #ToolUse #LLM