AI that Transcribes Drums—No Paired Audio Required

Kari Jaaskelainen

15 Jan 2026

Most drum-transcription models need large datasets of audio paired with matching MIDI. Those are scarce. Synthetic stand‑ins often sound cheap, creating a domain gap between the training data and real recordings.

We flip the script. With a semi‑supervised pipeline, we automatically curate a large, diverse library of high‑quality one‑shot drum samples from unlabeled audio. Then we render realistic drum tracks from MIDI only and train a sequence‑to‑sequence model on this data.
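To make the rendering step concrete, here is a minimal sketch of turning MIDI-style drum events into audio by mixing one-shot samples. It is an illustration under stated assumptions, not the authors' implementation: the one-shots are synthetic stand-ins (a decaying tone plus noise) rather than curated recordings, and the names `make_one_shot` and `render`, the event format, and the 16 kHz sample rate are all hypothetical.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (assumed for this sketch)


def make_one_shot(freq_hz, decay_s, noise_mix, seed):
    """Hypothetical stand-in for a curated one-shot: decaying tone plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(SR * decay_s * 4)) / SR
    tone = np.sin(2 * np.pi * freq_hz * t)
    noise = rng.uniform(-1.0, 1.0, t.size)
    envelope = np.exp(-t / decay_s)
    return ((1 - noise_mix) * tone + noise_mix * noise) * envelope


def render(events, one_shots, duration_s):
    """Mix one-shots into a mono track at each (onset_s, drum, velocity) event."""
    out = np.zeros(int(SR * duration_s))
    for onset_s, drum, velocity in events:
        sample = one_shots[drum]
        start = int(round(onset_s * SR))
        end = min(start + sample.size, out.size)
        if start < end:
            # Scale by MIDI velocity (0-127) and truncate at the track end.
            out[start:end] += (velocity / 127.0) * sample[: end - start]
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out


# Synthetic one-shots standing in for a curated sample library.
one_shots = {
    "kick": make_one_shot(60.0, 0.08, 0.05, seed=0),
    "snare": make_one_shot(190.0, 0.05, 0.60, seed=1),
    "hihat": make_one_shot(8000.0, 0.02, 0.90, seed=2),
}

# MIDI-style events: (onset in seconds, drum class, velocity).
events = [
    (0.0, "kick", 110), (0.5, "kick", 100),
    (0.25, "snare", 105), (0.75, "snare", 115),
] + [(i * 0.125, "hihat", 70) for i in range(8)]

audio = render(events, one_shots, duration_s=1.0)
```

The point of the sketch is that `events` serves as both the rendering input and the transcription target, so each rendered clip comes with exact labels and no recorded audio has to be annotated. Swapping different one-shots into `one_shots` for the same events is one simple way to vary timbre across training examples.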

High‑fidelity, diverse drum timbres—no manual labeling
Trained from MIDI + curated one‑shots (no paired audio)
New state of the art on ENST and MDB, beating fully supervised and prior synthetic‑data methods

Why it matters: more accurate drum transcription for music search, practice apps, and production tools—at a fraction of the data cost.

Paper: https://arxiv.org/abs/2601.09520 • Code: https://github.com/pier-maker92/ADT_STR

#AI #Audio #MusicTech #Drums #MachineLearning #DeepLearning #MIDI #OpenSource
