Faster, Cleaner Vocal Isolation with Latent Diffusion

AI that pulls the vocals out—fast

Ever wished you could solo a singer from any track without artifacts? This research tackles that problem with a latent diffusion model that separates vocals from full mixes.

  • Efficient: The model generates in a compact audio latent space and then decodes back to audio, which speeds up both training and inference.
  • Practical data: Trained only on pairs of mixtures and isolated vocals (no need for every instrument stem).
  • Quality: Outperforms prior generative approaches and matches strong non‑generative systems on signal quality and interference removal in benchmarks.
  • Robustness: Includes a noise‑robustness study of the latent encoder.
  • Open tools: A modular toolkit is released for further research and creative workflows.
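The post does not spell out the architecture, so here is only a minimal sketch of the general latent-diffusion training idea behind the first two bullets, under loudly stated assumptions. The frame-averaging `encode`/`decode` pair, the cosine noise schedule, the noise-prediction loss, and the `denoiser` callable are all hypothetical stand-ins, not the authors' method or toolkit. What it does illustrate: each training example is just a (mixture, vocals) pair; both are mapped to a latent that is `HOP` times shorter than the audio, and the model learns to denoise the vocal latent while conditioned on the mixture latent.

```python
import numpy as np

# HYPOTHETICAL compression factor of the toy encoder (not from the paper).
HOP = 64

def encode(audio: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned audio encoder: averages non-overlapping
    frames of HOP samples, so the latent is HOP times shorter than the audio."""
    n = len(audio) // HOP * HOP
    return audio[:n].reshape(-1, HOP).mean(axis=1)

def decode(latent: np.ndarray) -> np.ndarray:
    """Toy stand-in for a learned decoder: repeats each latent value HOP times."""
    return np.repeat(latent, HOP)

def cosine_alpha_bar(t: float) -> float:
    """Cumulative signal level at diffusion time t in [0, 1] (cosine schedule, an assumption)."""
    return float(np.cos(0.5 * np.pi * t) ** 2)

def training_loss(mixture, vocals, denoiser, rng) -> float:
    """One noise-prediction training step on a (mixture, vocals) pair.
    Only the vocal latent is noised; the mixture latent is the condition."""
    z_vocals = encode(vocals)
    z_mix = encode(mixture)
    t = rng.uniform(0.0, 1.0)
    a_bar = cosine_alpha_bar(t)
    eps = rng.standard_normal(z_vocals.shape)
    z_t = np.sqrt(a_bar) * z_vocals + np.sqrt(1.0 - a_bar) * eps
    eps_hat = denoiser(z_t, z_mix, t)  # HYPOTHETICAL model interface
    return float(np.mean((eps_hat - eps) ** 2))

def zero_denoiser(z_t, z_mix, t):
    """Placeholder model that always predicts zero noise."""
    return np.zeros_like(z_t)

# Usage with synthetic one-second signals at 16 kHz.
rng = np.random.default_rng(0)
vocals = rng.standard_normal(16000)
accompaniment = rng.standard_normal(16000)
mixture = vocals + accompaniment  # no separate instrument stems are needed
print(len(encode(mixture)))  # 250
print(training_loss(mixture, vocals, zero_denoiser, rng))
```

The efficiency argument is visible in the shapes: the diffusion model only ever sees 250 latent values instead of 16,000 samples, and `decode` is applied once at the end to return to audio.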

Paper by Genís Plaja‑Roglans, Yun‑Ning Hung, Xavier Serra, and Igor Pereira. Read more: https://arxiv.org/abs/2511.20470v1

AudioAI DiffusionModels MusicTech SourceSeparation OpenData OpenSource AIResearch AudioEngineering