LLM-as-a-Judge: Can AI pick the best slate for you?

Kari Jaaskelainen

07 Nov 2025

Can an LLM judge the best playlist, not just the next song?

Recommender systems often serve slates: ordered lists of items, such as your home feed or a playlist. Modeling what a person prefers, especially across different domains, is hard.

This study tests large language models (LLMs) as a 'world model' of user preferences: given two slates, the LLM reasons about which one a user would like more. The authors benchmark several LLMs on three tasks and datasets, then relate performance to properties of the underlying preference function.
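To make the pairwise setup concrete, here is a minimal Python sketch of using a judge to pick the best of several candidate slates. The prompt wording, the `judge` callable, and the round-robin win-count aggregation are assumptions for illustration, not the authors' method; `toy_judge` is a hypothetical stand-in for a real LLM call.

```python
from itertools import combinations
from typing import Callable, Sequence

Slate = tuple[str, ...]


def build_prompt(history: Sequence[str], slate_a: Slate, slate_b: Slate) -> str:
    """Format one pairwise question for a judge (hypothetical prompt wording)."""
    return (
        f"User history: {', '.join(history)}\n"
        f"Slate A: {', '.join(slate_a)}\n"
        f"Slate B: {', '.join(slate_b)}\n"
        "Which slate would this user prefer? Answer A or B."
    )


def pick_best_slate(
    slates: Sequence[Slate], judge: Callable[[str], str], history: Sequence[str]
) -> Slate:
    """Round-robin: ask the judge about every pair, return the slate with most wins."""
    wins = [0] * len(slates)
    for i, j in combinations(range(len(slates)), 2):
        answer = judge(build_prompt(history, slates[i], slates[j])).strip().upper()
        # Credit the win to slate i if the judge answered A, otherwise to slate j.
        wins[i if answer.startswith("A") else j] += 1
    return slates[max(range(len(slates)), key=wins.__getitem__)]


def toy_judge(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: prefers the slate with more jazz items."""
    fields = dict(
        line.split(": ", 1) for line in prompt.splitlines() if line.startswith("Slate")
    )
    return "A" if fields["Slate A"].count("jazz") >= fields["Slate B"].count("jazz") else "B"


slates = [("pop1", "pop2"), ("jazz1", "pop1"), ("jazz1", "jazz2")]
print(pick_best_slate(slates, toy_judge, ["jazz1"]))  # ('jazz1', 'jazz2')
```

Asking about every pair costs n(n-1)/2 judge calls for n slates. Because LLM judges can favor whichever option is shown first, a real setup might also query each pair in both orders.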

LLMs capture useful structure in preferences via pairwise reasoning.
Performance rises and falls with how consistent and expressive the preference signals are.
The results point to clear paths for improving prompts, training, and evaluation.
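One simple way to make "consistent" preference signals concrete, assuming preferences are stored as pairwise outcomes, is to count cyclic triples (A over B, B over C, yet C over A). This Python sketch is an illustrative measure chosen here, not necessarily the metric the authors use; `count_cycles` and the `prefers` layout are hypothetical.

```python
from itertools import combinations


def count_cycles(prefers: dict[tuple[int, int], bool], n: int) -> int:
    """Count intransitive (cyclic) triples among items 0..n-1.

    prefers[(i, j)] with i < j is True when item i is preferred over item j.
    """

    def beats(a: int, b: int) -> bool:
        # Look up the stored pair in canonical (smaller, larger) order.
        return prefers[(a, b)] if a < b else not prefers[(b, a)]

    cycles = 0
    for a, b, c in combinations(range(n), 3):
        # A triple is cyclic in either orientation: a>b>c>a or a>c>b>a.
        forward = beats(a, b) and beats(b, c) and beats(c, a)
        backward = beats(a, c) and beats(c, b) and beats(b, a)
        if forward or backward:
            cycles += 1
    return cycles


consistent = {(0, 1): True, (0, 2): True, (1, 2): True}  # 0 > 1 > 2
cyclic = {(0, 1): True, (1, 2): True, (0, 2): False}     # 0 > 1 > 2 > 0
print(count_cycles(consistent, 3), count_cycles(cyclic, 3))  # 0 1
```

A set of judgments containing such cycles cannot be explained by any single score that ranks the items, which is one reason inconsistent preference signals are harder to model.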

Why it matters: LLM 'judges' could make slate recommenders more robust, help them handle cold starts, and let them generalize beyond a single domain.

Paper by Baptiste Bonin, Maxime Heuillet, and Audrey Durand: http://arxiv.org/abs/2511.04541v1
