Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
From tabula rasa to transparent mastery: an LLM agent that induces rules and writes its own playbook
Most deep RL agents master games by amassing opaque experience. CEL—Cogito, ergo ludo—takes a different path: it learns to play by reasoning and planning. Powered by a Large Language Model, CEL explicitly infers environment rules and synthesizes a strategic playbook from raw episodes, starting with no prior knowledge beyond the action set.
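To make the idea concrete, here is a minimal sketch of what "reasoning and planning over explicit knowledge" could look like in code. It assumes a generic text-completion callable `llm`, and the names (`AgentKnowledge`, `choose_action`) are illustrative, not taken from the paper: the agent's entire world model is plain text, and each move is planned by prompting over that text plus the current observation and legal actions.

```python
# Illustrative sketch only; names and prompt wording are hypothetical, not the paper's.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AgentKnowledge:
    rules: str = "No rules known yet."        # language-based model of environment dynamics
    playbook: str = "No strategies yet."      # distilled strategic guidance

def choose_action(llm: Callable[[str], str],
                  knowledge: AgentKnowledge,
                  observation: str,
                  actions: List[str]) -> str:
    """Plan the next move by reasoning in language over the agent's current beliefs."""
    prompt = (
        f"Known rules of the environment:\n{knowledge.rules}\n\n"
        f"Strategic playbook:\n{knowledge.playbook}\n\n"
        f"Current observation:\n{observation}\n\n"
        f"Legal actions: {', '.join(actions)}\n"
        "Reason step by step, then answer with exactly one legal action."
    )
    reply = llm(prompt)
    # Fall back to the first legal action if the reply does not name one.
    return next((a for a in actions if a in reply), actions[0])
```

Because the rules and playbook are just strings, the agent starts from a genuinely blank slate: nothing beyond the action set needs to be hard-coded.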
After each episode, CEL reflects on the entire trajectory via two concurrent processes. Rule Induction refines a language‑based model of environment dynamics. Strategy and Playbook Summarization distills experiences into actionable guidance for future decisions. This cycle of interaction and reflection yields an agent that not only improves, but can also explain what it believes about the world and why its policy should work.
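Continuing the sketch above, the post-episode reflection could look roughly like this. The `reflect` function, the trajectory record fields (`step`, `obs`, `action`, `reward`), and the prompt wording are assumptions for illustration; the paper describes Rule Induction and Strategy/Playbook Summarization as concurrent processes, shown here sequentially for simplicity.

```python
def reflect(llm: Callable[[str], str],
            knowledge: AgentKnowledge,
            trajectory: List[dict]) -> AgentKnowledge:
    """Post-episode reflection: refine rules and distill strategy from the full trajectory."""
    transcript = "\n".join(
        f"step {t['step']}: obs={t['obs']} action={t['action']} reward={t['reward']}"
        for t in trajectory
    )
    # Rule Induction: revise the language-based model of environment dynamics.
    rules = llm(
        f"Current rule hypotheses:\n{knowledge.rules}\n\n"
        f"Episode transcript:\n{transcript}\n\n"
        "Revise the rules so they are consistent with everything observed."
    )
    # Strategy and Playbook Summarization: turn experience into actionable guidance.
    playbook = llm(
        f"Current playbook:\n{knowledge.playbook}\n\n"
        f"Episode transcript:\n{transcript}\n\n"
        "Update the playbook with concrete, reusable advice for future episodes."
    )
    return AgentKnowledge(rules=rules, playbook=playbook)
```

Alternating `choose_action` during episodes with `reflect` afterward is the interaction-reflection cycle described above, and the updated rules and playbook remain human-readable at every step.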
Evaluated on Minesweeper, Frozen Lake, and Sokoban—diverse grid‑world tasks with sparse rewards—the agent autonomously discovers fundamental mechanics and develops effective policies. Ablations confirm the necessity of iterative reflection: the loop is what sustains learning, bridging the gap between raw experience and explicit, reusable knowledge.
Why it matters: CEL points toward interpretable agents that learn generalizable abstractions, not just policies. By externalizing knowledge in language, teams can inspect, debug, and adapt strategies—crucial for safety and transfer.
Who should care: RL researchers exploring symbolic‑neural hybrids, builders of transparent decision‑making systems, and practitioners who need verifiable reasoning in safety‑critical environments.
Paper: "Cogito, Ergo Ludo" (arXiv)
#RL #LLMAgents #Planning #Explainability #InterpretableAI #GameAI #Reasoning