On Evaluating LLM Alignment by Evaluating LLMs as Judges
How do we know whether an AI model is truly aligned with human preferences, that is, helpful, honest, safe, and instruction-following? This paper explores a surprisingly effective shortcut: judge the judges. Instead of grading a model's open-ended answers directly, which requires substantial human effort or very strong AI judges, the authors