From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Evidence that RL teaches genuinely new abilities: compositional skills emerge and transfer across tasks

Does RL merely reweight what an LLM already knows, or can it teach genuinely new skills? This paper offers concrete evidence for the latter. Using a controlled, synthetic framework, the authors define “skills” as string transformation functions (e.g., f and g). If a model already knows f and g, can RL teach it to perform the unseen composition h(x) = g(f(x))? The answer is yes, and without any supervised next‑token training on h.
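To make the setup concrete, here is a minimal Python sketch of what such a task could look like. The specific transformations (reverse, swap case) and the binary reward are illustrative assumptions, not the paper's actual functions:

```python
# Illustrative sketch of the synthetic framework; the paper's actual
# string transformations and reward format may differ.

def f(s: str) -> str:
    """Atomic skill 1 (hypothetical): reverse the string."""
    return s[::-1]

def g(s: str) -> str:
    """Atomic skill 2 (hypothetical): swap letter case."""
    return s.swapcase()

def h(s: str) -> str:
    """Target composition; the model never sees supervised data for h."""
    return g(f(s))

def reward(model_output: str, x: str) -> float:
    """Binary RL reward: 1.0 iff the model's answer equals g(f(x))."""
    return float(model_output == h(x))

assert h("Hello") == "OLLEh"   # f: "olleH", then g: "OLLEh"
```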

Moreover, the compositional capability generalizes: models trained to compose two functions can extrapolate to compositions of more than two, even when such deeper compositions never appear during RL (see the sketch below). The authors also show transfer: compositional skill acquired on a source task carries over to a target task that shares only the atomic skills, not the compositions. Qualitative analyses indicate that RL changes the model’s reasoning behavior, whereas next‑token training on the same data does not induce these effects.
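The depth extrapolation can be pictured the same way. In this hypothetical sketch, RL only ever rewards depth-2 targets, while evaluation probes deeper, never-seen chains; `compose` and the atomic skills are again illustrative, not the paper's evaluation code:

```python
from functools import reduce
from typing import Callable

StrFn = Callable[[str], str]

# Hypothetical atomic skills, as in the sketch above.
f: StrFn = lambda s: s[::-1]        # reverse
g: StrFn = lambda s: s.swapcase()   # swap case

def compose(fns: list[StrFn]) -> StrFn:
    """Chain atomic skills left to right: compose([f, g])(x) == g(f(x))."""
    return lambda s: reduce(lambda acc, fn: fn(acc), fns, s)

# RL trains only on depth-2 targets; evaluation probes unseen depth-3 chains.
depth2 = compose([f, g])
depth3 = compose([f, g, f])
assert depth2("Hello") == "OLLEh"
assert depth3("Hello") == "hELLO"   # reverse, swap case, reverse again
```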

Why it matters: composition is a cornerstone of human cognitive skill acquisition. Demonstrating that RL can induce compositional abilities in LLMs suggests a practical recipe: build base models that cover the atomic skills, then use RL to incentivize the advanced, generalizable skills needed for complex problem solving.

Who should care: researchers probing the mechanisms of LLM learning, and practitioners designing post‑training pipelines that prioritize transfer and generalization over rote pattern matching.

Paper: arXiv: Compositional Skills via RL
Register: AiFeta

#ReinforcementLearning #LLM #Compositionality #Generalization #SkillAcquisition #Reasoning #AIResearch
