REMA: A Unified Reasoning Manifold Framework for Interpreting Large Language Model

Kari Jaaskelainen

29 Sep 2025

Diagnosing LLM reasoning failures via geometric deviation from a manifold of success

Why do large language models succeed on some reasoning chains and fail on others? REMA introduces a compelling geometric perspective: the Reasoning Manifold—an emergent, low-dimensional structure formed by internal representations associated with correct multi-step reasoning. Errors, in this view, are measurable deviations from that manifold.

REMA operationalizes this insight in two steps. First, it quantifies how far an erroneous sample's internal states deviate from the manifold of correct samples. The manifold is approximated from the internal representations of correct samples, and the deviation is measured as a k-nearest-neighbors distance to that approximation. This serves as a unified failure signal across tasks and models. Second, it localizes where the failure begins by tracking the deviation layer by layer and comparing it against a baseline of internal fluctuations observed in correct reasoning, which reveals the divergence point where the chain starts to go off track.
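To make the first step concrete, here is a minimal sketch of a k-nearest-neighbors deviation score. It is an illustration under assumptions, not the paper's implementation: the function name `knn_deviation`, the choice of Euclidean distance, averaging over the k nearest neighbors, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def knn_deviation(queries, correct_reps, k=5):
    """Mean Euclidean distance from each query vector to its k nearest
    neighbors among the correct-sample representations.

    Assumption: the manifold is approximated by the point cloud of
    correct representations; the averaging rule is illustrative.
    """
    queries = np.atleast_2d(queries)
    # Pairwise distances, shape (n_queries, n_correct).
    diffs = queries[:, None, :] - correct_reps[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    k = min(k, correct_reps.shape[0])
    # The first k columns after partitioning hold the k smallest distances.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Synthetic stand-in for hidden states: "correct" points lie on a
# 2-D subspace of a 16-D space; "erroneous" points are pushed off it.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))
correct = rng.normal(size=(200, 2)) @ basis
on_manifold = rng.normal(size=(20, 2)) @ basis
off_manifold = on_manifold + 3.0 * rng.normal(size=(20, 16))

on_score = knn_deviation(on_manifold, correct).mean()
off_score = knn_deviation(off_manifold, correct).mean()
print(f"on-manifold: {on_score:.2f}, off-manifold: {off_score:.2f}")
```

In this toy setup the off-manifold points score noticeably higher, which is the kind of separation the deviation signal is meant to expose. With real models, the vectors would be hidden states extracted at a chosen layer.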

Unified metric: a geometry-driven signal that separates correct from erroneous trajectories.
Layer-wise localization: identifies the earliest divergence layers for targeted diagnosis.
General across modalities: validated on diverse language and multimodal tasks and models.
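The layer-wise localization can be sketched in the same spirit. The example below assumes a simple rule that the source does not specify: a layer counts as divergent when the erroneous sample's deviation exceeds the mean plus `n_sigma` standard deviations of a leave-one-out baseline computed on correct samples. The names `knn_dist`, `divergence_layer`, and `n_sigma` are hypothetical, and the data is synthetic.

```python
import numpy as np

def knn_dist(x, refs, k):
    """Mean distance from vector x to its k nearest reference vectors."""
    d = np.linalg.norm(refs - x, axis=1)
    return np.sort(d)[:min(k, len(d))].mean()

def divergence_layer(error_layers, correct_layers, k=5, n_sigma=3.0):
    """Return the first layer whose deviation exceeds the baseline.

    error_layers:   array (L, D), one erroneous sample's state per layer.
    correct_layers: array (L, N, D), N correct samples per layer.
    The thresholding rule is an assumption made for illustration.
    """
    for layer, (x, refs) in enumerate(zip(error_layers, correct_layers)):
        # Baseline fluctuation: each correct sample against the others.
        baseline = np.array([
            knn_dist(refs[i], np.delete(refs, i, axis=0), k)
            for i in range(len(refs))
        ])
        threshold = baseline.mean() + n_sigma * baseline.std()
        if knn_dist(x, refs, k) > threshold:
            return layer
    return None  # never left the baseline band

# Synthetic trajectories: 6 layers, 50 correct samples, 8 dimensions.
rng = np.random.default_rng(1)
correct_layers = rng.normal(size=(6, 50, 8))
centroids = correct_layers.mean(axis=1)   # stays close to the cloud
error_layers = centroids.copy()
error_layers[3:] += 20.0                  # inject a large drift from layer 3 on

print(divergence_layer(error_layers, correct_layers))
print(divergence_layer(centroids, correct_layers))
```

The first call reports the layer where the injected drift begins, while the second trajectory never crosses the threshold. A stricter or looser `n_sigma` trades false alarms against late detection.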

Experiments show the manifold is low-dimensional and that correct versus incorrect representations are highly separable, enabling actionable diagnosis. For practitioners, REMA offers new tools to monitor, predict, and potentially intervene in reasoning behavior—moving beyond output-only evaluation to introspect the model’s internal computational pathways.
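One rough way to probe the low-dimensionality claim on your own activations is to count how many principal components are needed to explain most of the variance. This is a linear proxy chosen for this sketch and is not necessarily the estimator used in the paper; `effective_dim` and its `var_threshold` parameter are hypothetical names.

```python
import numpy as np

def effective_dim(reps, var_threshold=0.95):
    """Number of principal components needed to reach var_threshold
    of the total variance (a linear proxy for intrinsic dimension)."""
    centered = reps - reps.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    # Index of the first cumulative ratio that reaches the threshold.
    return int(np.searchsorted(explained, var_threshold) + 1)

# Synthetic check: points on a 2-D plane embedded in 16-D, plus small noise.
rng = np.random.default_rng(2)
plane, _ = np.linalg.qr(rng.normal(size=(16, 2)))   # orthonormal columns
reps = rng.normal(size=(500, 2)) @ plane.T + 0.01 * rng.normal(size=(500, 16))

print(effective_dim(reps))
```

A curved manifold can need more linear components than its true dimension, so treat the result as an upper-bound style estimate rather than a measurement.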

By tying abstract reasoning failures to concrete geometric deviations, REMA provides a principled bridge between interpretability research and practical reliability engineering for next-generation LLMs.

Paper: http://arxiv.org/abs/2509.22518v1

#LLM #Interpretability #Reasoning #EmbeddingManifold #AIAlignment #Diagnostics #AIResearch

