How language models can learn without new labels

Kari Jaaskelainen

21 Jan 2026 — 2 min read

Researchers have a new explanation for why some AI systems get better without human feedback. The work matters because it may cut the need for expensive labeled data, but also because self-improving systems can lock in their own mistakes if left unchecked.

Why this is being discussed now

A team from the University of Maryland has posted a study on arXiv that unifies several recent techniques for improving language models without extra supervision. Methods such as debate, bootstrapping and internal consistency have shown gains, sometimes approaching the performance of models trained on labeled answers. Until now, no clear theory tied these methods together.

Why AI might act “coherently”

The authors argue that all these methods push a model toward the most coherent mapping from context to answer. In practice, that means choosing responses that fit together across different prompts and are easy to predict from one another. They show this is the same as preferring the shortest adequate explanation (a data-compression idea known as “minimum description length”). In plain terms: among many ways to answer, pick the one that makes the whole pattern simpler.
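To make the compression idea concrete, here is a toy sketch in Python. It is not the paper's method: the function name description_length_bits and the two candidate answer patterns are assumptions made for illustration, and the cost is a simple frequency-based code length, not a full minimum-description-length calculation.

```python
import math
from collections import Counter

def description_length_bits(answers):
    """Bits needed to encode the answers with an ideal code fitted to their own frequencies.

    Toy proxy (an assumption for illustration): a uniform pattern costs 0 bits,
    an evenly mixed yes/no pattern costs one bit per answer.
    """
    n = len(answers)
    counts = Counter(answers)
    return sum(c * math.log2(n / c) for c in counts.values())

# Two hypothetical ways a model might answer eight related prompts.
candidates = {
    "consistent": ["yes"] * 8,
    "scattered": ["yes", "no", "yes", "no", "yes", "yes", "no", "no"],
}

costs = {name: description_length_bits(a) for name, a in candidates.items()}
simplest = min(costs, key=costs.get)
print(costs, simplest)
```

Under this crude measure the consistent pattern is cheaper to describe, so it is the one a simplicity-seeking rule would pick. Note that the measure only rewards regularity; it says nothing about whether "yes" is correct.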

A concrete example

In a self-debate setup, a model generates multiple lines of reasoning and compares them. It then selects the answer that best agrees with its own checks across steps and variants. No new labels are added; the model improves by favoring the version that keeps its story straight across attempts.
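A much simpler stand-in for this selection step is majority voting over the final answers of several attempts. The sketch below assumes the attempts have already been generated; the name pick_most_consistent and the sample answers are hypothetical, and a real self-debate setup would compare reasoning steps as well as final answers.

```python
from collections import Counter

def pick_most_consistent(final_answers):
    """Return the most frequent final answer and the share of attempts that agree with it."""
    if not final_answers:
        raise ValueError("need at least one attempt")
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)

# Hypothetical final answers from five independent reasoning attempts.
attempts = ["42", "42", "41", "42", "40"]
answer, agreement = pick_most_consistent(attempts)
print(answer, agreement)
```

The agreement share is a rough signal of how strongly the attempts converge. A low share could be used to flag a question for outside review instead of accepting the answer.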

Main risk: errors that reinforce themselves

The same process that rewards internal agreement can also reinforce errors. If a model starts with a bias, “coherence” may amplify it, making the wrong pattern look simple and consistent. At scale, this can produce confident but incorrect outputs, especially in areas where the model’s initial knowledge is thin.
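A toy calculation shows how this amplification could play out. It rests on strong assumptions that are not taken from the paper: each attempt is wrong independently with the same probability, the model adopts its own majority answer every round, and the resulting majority error rate becomes the next round's per-attempt error rate. The function names are hypothetical.

```python
from math import comb

def majority_wrong_prob(p_wrong, samples=5):
    """Probability that a majority of independent samples is wrong (samples should be odd)."""
    need = samples // 2 + 1
    return sum(
        comb(samples, k) * p_wrong**k * (1 - p_wrong) ** (samples - k)
        for k in range(need, samples + 1)
    )

def iterate_self_training(p_wrong, rounds=6, samples=5):
    """Toy loop: each round the model adopts its own majority answer as the new target."""
    for _ in range(rounds):
        p_wrong = majority_wrong_prob(p_wrong, samples)
    return p_wrong

mostly_right = iterate_self_training(0.4)  # starts out wrong 40% of the time
mostly_wrong = iterate_self_training(0.6)  # starts out wrong 60% of the time
print(mostly_right, mostly_wrong)
```

In this simplified picture, a model that starts slightly better than chance drifts toward being right, while one that starts slightly worse drifts toward being consistently wrong. Agreement grows in both cases, which is why agreement alone is not evidence of accuracy.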

What the authors propose

The study offers a yardstick: measure and control how strongly the model is pushed toward simpler patterns. It suggests mixing in small amounts of verified data, capping the influence of the simplicity rule, and auditing results with external checks. These steps act as brakes and help predict when self-improvement will help or fail.
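One hypothetical way to picture two of these safeguards together, the cap and the small verified set, is a blended score. The formula, the name blended_score, the default cap of 0.5 and the candidate numbers below are all assumptions chosen for illustration; the study's own yardstick may look quite different.

```python
def blended_score(coherence, verified_match, coherence_weight, cap=0.5):
    """Combine self-agreement with agreement on a small verified set.

    All inputs are in [0, 1]. The weight on coherence is clipped at `cap`
    so that internal agreement alone cannot dominate the score.
    """
    w = min(coherence_weight, cap)
    return w * coherence + (1 - w) * verified_match

# Hypothetical candidates: (coherence, match rate on verified examples)
candidates = {
    "coherent_but_wrong": (1.0, 0.0),
    "less_tidy_but_right": (0.6, 1.0),
}

def best(weight, cap):
    """Name of the candidate with the highest blended score."""
    return max(candidates, key=lambda name: blended_score(*candidates[name], weight, cap))

print(best(weight=0.9, cap=1.0))  # cap effectively disabled
print(best(weight=0.9, cap=0.5))  # cap active
```

With the cap disabled, the perfectly self-consistent but incorrect candidate wins. With the cap active, the verified examples carry enough weight to favor the candidate that matches them.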

In sum

The theory explains why feedback-free methods sometimes match supervised training and when they should not be trusted. Used carefully, coherence can be a useful guide. Used blindly, it can make models confidently wrong.

In a nutshell

Self-improving language models work by favoring answers that make their overall behavior simpler and more consistent, which helps—until it amplifies initial mistakes.

What to understand

“Coherence” links many self-improvement tricks to one core idea: prefer the simplest consistent pattern.
This can rival supervised training in some cases, but it can also entrench biases and errors.
Practical safeguards include small amounts of labeled data, limits on the simplicity push, and outside audits.

Paper: https://arxiv.org/abs/2601.13566v1


