StepORLM: A Self-Evolving Framework With Generative Process Supervision For Operations Research Language Models
Co-evolving a policy LLM and a generative process reward model for OR

Solving Operations Research problems with LLMs demands more than final-answer rewards. StepORLM introduces generative process supervision that evaluates the entire modeling and reasoning pipeline. At its core is a co-evolution loop: a policy model learns to solve OR