MoST: One Open-Source Model for Speech + Text
Meet MoST, a fully open-source AI model that understands both speech and text in a single network. Instead of processing audio tokens and text tokens identically, MoST uses a Modality-Aware Mixture of Experts (MAMoE) to route each token to the right specialists (see the sketch after the list below).
- Modality-specific experts learn the unique patterns of audio and text.
- Shared experts help knowledge flow across both, boosting cross-modal skills.
- Efficient training pipeline: post-train on ASR/TTS, then fine-tune on speech-text instructions, all from open datasets.
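
Curious what modality-aware routing looks like in code? Here is a minimal PyTorch sketch. To be clear, this is illustrative, not the paper's actual implementation: the class name `MAMoELayer`, the expert counts, the top-2 routing, and the per-modality routers are all assumptions; the only ideas taken from the post are modality-specific experts, shared experts, and per-token routing.

```python
import torch
import torch.nn as nn

class MAMoELayer(nn.Module):
    """Sketch of a modality-aware MoE layer: each token is routed only
    among experts of its own modality plus a pool of shared experts.
    (Hypothetical layout; sizes and top-k are illustrative.)"""
    def __init__(self, d_model=512, n_text=4, n_audio=4, n_shared=2, top_k=2):
        super().__init__()
        def make_expert():
            return nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(),
                nn.Linear(4 * d_model, d_model))
        self.text_experts = nn.ModuleList([make_expert() for _ in range(n_text)])
        self.audio_experts = nn.ModuleList([make_expert() for _ in range(n_audio)])
        self.shared_experts = nn.ModuleList([make_expert() for _ in range(n_shared)])
        # One router per modality; each scores its own experts plus the shared pool.
        self.text_router = nn.Linear(d_model, n_text + n_shared)
        self.audio_router = nn.Linear(d_model, n_audio + n_shared)
        self.top_k = top_k

    def _route(self, x, router, experts):
        # x: (n_tokens, d_model); experts = modality-specific + shared pool
        scores = router(x)                              # (n, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(experts):
                mask = idx[:, k] == e                   # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

    def forward(self, x, is_audio):
        # x: (n_tokens, d_model); is_audio: (n_tokens,) bool modality mask
        out = torch.empty_like(x)
        shared = list(self.shared_experts)
        if (~is_audio).any():
            out[~is_audio] = self._route(
                x[~is_audio], self.text_router, list(self.text_experts) + shared)
        if is_audio.any():
            out[is_audio] = self._route(
                x[is_audio], self.audio_router, list(self.audio_experts) + shared)
        return out

# Toy usage: 10 tokens, first 6 text, last 4 audio.
layer = MAMoELayer()
x = torch.randn(10, 512)
is_audio = torch.tensor([False] * 6 + [True] * 4)
y = layer(x, is_audio)  # (10, 512)
```

The key design choice this sketch captures: text tokens never touch audio-only experts (and vice versa), while the shared experts sit in both routing pools so knowledge can flow across modalities.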
Results: MoST outperforms similarly sized models on ASR, TTS, audio language modeling, and spoken question answering. Ablations show that modality-aware routing and shared experts drive the gains.
Why it matters: a practical path to assistants that listen, read, and reply more accurately, using only open data.
Paper: https://arxiv.org/abs/2601.10272 | Code & data: https://github.com/NUS-HPC-AI-Lab/MoST
Register: https://www.AiFeta.com
#AI #SpeechAI #Multimodal #MixtureOfExperts #OpenSource #ASR #TTS #LLM #NLP #Research