Teaching LLMs to Decide Better, One Regret at a Time
LLMs are starting to act as "agents," making choices in dynamic settings, but they often struggle to balance exploration and exploitation and rack up high regret (reward missed relative to the best strategy in hindsight). A new approach, Iterative Regret-Minimization Fine-Tuning (Iterative RMFT), trains models to decide better by fine-tuning them on their own lowest-regret decision trajectories.
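To make the regret notion concrete, here is a minimal sketch of how regret is measured in a classic multi-armed bandit, the standard testbed for exploration vs. exploitation. The epsilon-greedy agent, the arm means, and all parameter values are illustrative assumptions, not part of the RMFT method itself; regret is accumulated as the gap between the best arm's expected reward and the expected reward of the arm actually pulled.

```python
import random

def run_bandit(arm_means, steps, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit; returns cumulative regret.

    Regret = expected reward of always pulling the best arm
             minus expected reward of the arms actually pulled.
    """
    rng = random.Random(seed)
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n            # running average observed reward per arm
    best_mean = max(arm_means)    # "best strategy in hindsight"
    regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                     # explore: random arm
            arm = rng.randrange(n)
        else:                                          # exploit: best so far
            arm = max(range(n), key=lambda a: values[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        regret += best_mean - arm_means[arm]           # expected per-step gap
    return regret
```

An agent that explores too little can lock onto a bad arm and accrue regret linearly; one that explores forever pays a constant per-step cost. Low-regret behavior, the target of RMFT-style training, keeps total regret growing sublinearly.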