MoST: One Open-Source Model for Speech + Text

Meet MoST, a fully open-source model that understands speech and text within a single architecture. Instead of processing audio and text tokens identically, MoST uses a Modality-Aware Mixture of Experts (MAMoE) to route each token to the right specialists.

  • Modality-specific experts learn the patterns unique to audio and to text.
  • Shared experts let knowledge flow across both modalities, boosting cross-modal skills (see the sketch after this list).
  • Efficient training pipeline: post-train on ASR/TTS, then fine-tune on speech-text instructions, all built from open datasets.
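For the curious, here is a minimal PyTorch sketch of what modality-aware routing can look like. The names, expert counts, and top-1 routing rule are illustrative assumptions on our part, not MoST's actual implementation (see the paper and repo for that): audio tokens are dispatched among audio experts, text tokens among text experts, and shared experts process every token.

```python
import torch
import torch.nn as nn

# Illustrative sketch of modality-aware MoE routing. All sizes and the
# top-1 routing rule are assumptions for clarity, not MoST's real code.
class MAMoESketch(nn.Module):
    def __init__(self, d_model=256, n_audio=4, n_text=4, n_shared=2):
        super().__init__()
        ffn = lambda: nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))
        self.audio_experts = nn.ModuleList(ffn() for _ in range(n_audio))
        self.text_experts = nn.ModuleList(ffn() for _ in range(n_text))
        self.shared_experts = nn.ModuleList(ffn() for _ in range(n_shared))
        self.audio_router = nn.Linear(d_model, n_audio)
        self.text_router = nn.Linear(d_model, n_text)

    def forward(self, x, is_audio):
        # x: (tokens, d_model); is_audio: (tokens,) bool modality flags
        out = torch.zeros_like(x)
        for mask, experts, router in (
            (is_audio, self.audio_experts, self.audio_router),
            (~is_audio, self.text_experts, self.text_router),
        ):
            if mask.any():
                tok = x[mask]
                # top-1 routing within this modality's expert pool
                probs = router(tok).softmax(dim=-1)
                top = probs.argmax(dim=-1)
                y = torch.zeros_like(tok)
                for i, expert in enumerate(experts):
                    sel = top == i
                    if sel.any():
                        y[sel] = probs[sel, i:i+1] * expert(tok[sel])
                out[mask] = y
        # shared experts see every token, enabling cross-modal transfer
        for expert in self.shared_experts:
            out = out + expert(x) / len(self.shared_experts)
        return out

layer = MAMoESketch()
x = torch.randn(10, 256)
is_audio = torch.tensor([True] * 6 + [False] * 4)
print(layer(x, is_audio).shape)  # torch.Size([10, 256])
```

The intuition: hard modality partitioning keeps experts specialized, while the always-on shared experts give the model a path for cross-modal transfer.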

Results: MoST beats similarly sized models on ASR, TTS, audio language modeling, and spoken question answering. Ablations show that the modality-aware routing and shared experts drive the gains.

Why it matters: a practical path to assistants that listen, read, and reply more accurately, built entirely on open data.

Paper: https://arxiv.org/abs/2601.10272 | Code & data: https://github.com/NUS-HPC-AI-Lab/MoST

#AI #SpeechAI #Multimodal #MixtureOfExperts #OpenSource #ASR #TTS #LLM #NLP #Research
