Teaching AI What We Like—Faster and Smarter
Getting AI to reflect human preferences usually means showing it lots of examples, which is slow and costly. This paper proposes a smarter path: combine the scale of RLHF (reinforcement learning from human feedback, used to tune large language models) with the efficiency of preferential Bayesian optimization (PBO), which actively chooses the most informative questions to ask.
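To make "actively choosing the most informative question" concrete, here is a toy sketch of one common active-preference-learning idea: keep a posterior over a preference model (here, a Bradley-Terry model with sampled weight vectors) and query the pair of candidates whose outcome the posterior is most uncertain about. This is an illustration under assumed names and a crude sample-based posterior, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 20 candidate responses embedded as 2-D feature
# vectors, and a latent "true" preference direction to be learned.
candidates = rng.normal(size=(20, 2))
true_w = np.array([1.0, -0.5])

# Crude posterior over preference weights, represented by samples
# (a stand-in for a proper Bayesian update).
w_samples = rng.normal(size=(200, 2))

def win_prob(w, a, b):
    """Bradley-Terry probability that candidate a beats candidate b."""
    return 1.0 / (1.0 + np.exp(-(candidates[a] - candidates[b]) @ w))

def most_informative_pair():
    """Pick the pair whose outcome the posterior disagrees on most."""
    best, best_var = None, -1.0
    for a in range(len(candidates)):
        for b in range(a + 1, len(candidates)):
            p = np.array([win_prob(w, a, b) for w in w_samples])
            if p.var() > best_var:
                best, best_var = (a, b), p.var()
    return best

a, b = most_informative_pair()
# Simulate the human's answer, then keep only the posterior samples
# that predict the same winner (a rough approximate update).
label = (candidates[a] - candidates[b]) @ true_w > 0
keep = np.array([(win_prob(w, a, b) > 0.5) == label for w in w_samples])
w_samples = w_samples[keep]
print(f"queried pair {(a, b)}, posterior samples remaining: {len(w_samples)}")
```

Each such query discards the weight samples it rules out, so the model narrows in on the human's preferences with far fewer questions than asking at random, which is the intuition behind PBO's sample efficiency.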