From $f(x)$ and $g(x)$ to $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones

Evidence that RL teaches genuinely new, compositional skills—beyond mere reweighting.

Does RL truly endow LLMs with new capabilities, or does it merely reweight what is already there? This study provides concrete evidence for genuine skill acquisition via composition. In a controlled synthetic setup, a “skill” is defined as computing a string transformation f(x). When an LLM already knows f and g before RL, the authors show that RL enables it to learn the unseen composition h(x) = g(f(x)), and that this ability generalizes to compositions of more than two functions never observed during RL.
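For intuition, here is a minimal Python sketch of what such a compositional task looks like. The specific transformations (string reversal, a letter shift) are illustrative assumptions, not the paper's actual function set; the point is only that each atomic skill is simple on its own, while the evaluation asks for an unseen composition.

```python
# Two hypothetical atomic skills the model is assumed to know individually.
def f(x: str) -> str:
    """Atomic skill 1: reverse the string."""
    return x[::-1]

def g(x: str) -> str:
    """Atomic skill 2: shift each lowercase letter forward by one (a -> b, ..., z -> a)."""
    return "".join(
        chr((ord(c) - ord("a") + 1) % 26 + ord("a")) if c.islower() else c
        for c in x
    )

def compose(*fns):
    """Right-to-left composition: compose(g, f)(x) == g(f(x))."""
    def h(x):
        for fn in reversed(fns):
            x = fn(x)
        return x
    return h

h = compose(g, f)        # the unseen two-step skill h(x) = g(f(x))
deep = compose(g, f, g)  # deeper compositions probe generalization beyond two functions

print(h("abc"))     # f: "cba" -> g: "dcb"
print(deep("abc"))  # g: "bcd" -> f: "dcb" -> g: "edc"
```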

To avoid confounds such as data contamination, the synthetic framework gives precise control over task complexity and exposure. Surprisingly, the compositional skill learned on a source task transfers to a different target task: as long as the model already knows the target's atomic skills, no compositional training on the target is required. Qualitative analyses show that RL shifts the model's reasoning behavior, whereas next-token prediction training on the same data does not produce these effects.

Implication: Build base models with fundamental atomic skills, then use RL to incentivize higher-order compositions that solve complex problems. The results clarify RL’s role in post-training: not just alignment or policy shaping, but a mechanism for acquiring advanced, generalizable skills through the structured reuse of simpler ones.

Paper: arXiv: Skill Composition in RL
Register: https://www.AiFeta.com

#ReinforcementLearning #SkillComposition #Generalization #Reasoning #PostTraining #LLM #NLP #Cognition
