Teaching AI What We Like—Faster and Smarter

Getting AI to reflect human preferences usually means showing it many labeled examples, which is slow and costly. This paper proposes a smarter path: combine the scale of RLHF (the standard technique for tuning large language models) with the efficiency of preferential Bayesian optimization (PBO), which actively chooses the most informative questions to ask.

  • What’s new: An acquisition-driven module slots into the RLHF pipeline, so the system asks better “Which do you prefer?” questions instead of random ones.
  • Why it matters: Fewer labels, faster learning, and better alignment with human judgments.
  • Tested on: (i) complex preference optimization tasks and (ii) fine-tuning large language models.
  • Results: Consistent gains in sample efficiency and overall performance across both settings.

Think of it like a personal chef learning your tastes: instead of having you sample every dish, they ask the few questions that reveal your preferences fastest.
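
For the technically curious, here is a minimal toy sketch of the active-querying idea, not the authors' code: a Bradley-Terry reward model is fit from pairwise preference labels, and a simple uncertainty-based acquisition rule picks the comparison the current model is least sure about. Everything here (the linear features, the simulated human, names like fit_reward and pick_query) is an illustrative assumption rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each candidate response is a feature vector, and a linear
# reward model r(x) = w . x is learned from pairwise preference labels
# via a Bradley-Terry likelihood (a logistic model on reward differences).

def preference_prob(w, xa, xb):
    """P(a is preferred over b) under a Bradley-Terry model."""
    z = np.clip((xa - xb) @ w, -30.0, 30.0)  # clip for numerical stability
    return 1.0 / (1.0 + np.exp(-z))

def fit_reward(pairs, labels, dim, lr=0.5, steps=300):
    """Fit reward weights by gradient ascent on the pairwise log-likelihood."""
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (xa, xb), y in zip(pairs, labels):
            p = preference_prob(w, xa, xb)
            grad += (y - p) * (xa - xb)
        w += lr * grad / max(len(pairs), 1)
    return w

def pick_query(w, candidate_pairs):
    """Uncertainty-style acquisition: ask about the pair whose predicted
    preference is closest to 50/50, i.e. the most informative question."""
    scores = [abs(preference_prob(w, xa, xb) - 0.5) for xa, xb in candidate_pairs]
    return int(np.argmin(scores))

# Simulated "human" with hidden true preference weights.
dim = 5
w_true = rng.normal(size=dim)

def human_label(xa, xb):
    return int(rng.random() < preference_prob(w_true, xa, xb))

# Active preference-learning loop: ask only the most informative questions.
pairs, labels = [], []
w_hat = np.zeros(dim)
for _ in range(20):
    candidates = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(50)]
    xa, xb = candidates[pick_query(w_hat, candidates)]  # choose the best question
    pairs.append((xa, xb))
    labels.append(human_label(xa, xb))                   # query the simulated human
    w_hat = fit_reward(pairs, labels, dim)               # refit the reward model

cosine = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true) + 1e-9)
print(f"Alignment with the true preference weights after 20 queries: {cosine:.2f}")
```

The point of the toy loop is the query-selection step: rather than labeling random pairs, the learner spends its limited human-feedback budget where it expects to learn the most.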

Paper: Efficient Reinforcement Learning from Human Feedback via Bayesian Preference Inference (Cercola, Capretti, Formentin). Read more: http://arxiv.org/abs/2511.04286v1

Register: https://www.AiFeta.com

#AI #MachineLearning #ReinforcementLearning #RLHF #ActiveLearning #Bayesian #LLM #HumanFeedback #Research #SampleEfficiency
