Faster, Cleaner Vocal Isolation with Latent Diffusion
AI that pulls the vocals out—fast
Ever wished you could solo a singer from any track without artifacts? This research shows how, using a latent diffusion model that separates vocals from full mixes.
- Efficient: Generates in a compact audio latent space, then decodes to sound—speeding up training and inference (a rough sketch of this pipeline follows the list).
- Practical data: Trained only on pairs of mixtures and isolated vocals (no need for every instrument stem).
- Quality: Outperforms prior generative approaches and matches strong non‑generative systems on signal quality and interference removal in benchmarks.
- Robustness: Includes a noise‑robustness study of the latent encoder.
- Open tools: A modular toolkit is released for further research and creative workflows.
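For readers who want a feel for the approach, here is a minimal sketch in PyTorch: encode the mixture and the isolated vocal into a compact latent space, train a diffusion denoiser conditioned on the mixture latent, and later decode the generated vocal latent back to audio. All module names, shapes, and the noise schedule below are illustrative assumptions for this post, not the paper's actual architecture or code.

# Hypothetical sketch of latent-diffusion vocal separation (not the paper's implementation).
import torch
import torch.nn as nn

class LatentEncoder(nn.Module):
    """Toy stand-in for a pretrained audio autoencoder's encoder (kept frozen)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Conv1d(1, channels, kernel_size=16, stride=8, padding=4)
    def forward(self, audio):            # audio: (batch, 1, samples)
        return self.net(audio)           # latent: (batch, channels, frames)

class LatentDecoder(nn.Module):
    """Toy stand-in for the matching decoder; used at inference to turn the generated vocal latent back into a waveform."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.ConvTranspose1d(channels, 1, kernel_size=16, stride=8, padding=4)
    def forward(self, latent):
        return self.net(latent)

class Denoiser(nn.Module):
    """Predicts the noise added to the vocal latent, conditioned on the mixture latent."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * channels, 256, 3, padding=1), nn.SiLU(),
            nn.Conv1d(256, channels, 3, padding=1),
        )
    def forward(self, noisy_vocal_latent, mixture_latent, t):
        # The diffusion step t would normally be embedded and injected; omitted for brevity.
        return self.net(torch.cat([noisy_vocal_latent, mixture_latent], dim=1))

def training_step(encoder, denoiser, mixture, vocals, num_steps=1000):
    """One DDPM-style training step on a (mixture, isolated-vocal) pair; only these pairs are needed, no other stems."""
    with torch.no_grad():                       # autoencoder stays frozen
        z_mix, z_voc = encoder(mixture), encoder(vocals)
    t = torch.randint(0, num_steps, (z_voc.shape[0],))
    alpha_bar = torch.cos(0.5 * torch.pi * t / num_steps).view(-1, 1, 1) ** 2  # assumed cosine schedule
    noise = torch.randn_like(z_voc)
    z_noisy = alpha_bar.sqrt() * z_voc + (1 - alpha_bar).sqrt() * noise
    pred = denoiser(z_noisy, z_mix, t)
    return nn.functional.mse_loss(pred, noise)

if __name__ == "__main__":
    enc, den = LatentEncoder(), Denoiser()
    mixture = torch.randn(2, 1, 32768)   # fake mixture waveforms
    vocals = torch.randn(2, 1, 32768)    # matching isolated vocal stems
    print(training_step(enc, den, mixture, vocals).item())

Because the diffusion model works on short latent sequences instead of raw waveforms or spectrograms, each training and sampling step is much cheaper, which is the efficiency argument made in the paper.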
Paper by Genís Plaja‑Roglans, Yun‑Ning Hung, Xavier Serra, and Igor Pereira.
Paper: https://arxiv.org/abs/2511.20470v1
Register: https://www.AiFeta.com
#AudioAI #DiffusionModels #MusicTech #SourceSeparation #OpenData #OpenSource #AIResearch #AudioEngineering