Faster, Cleaner Vocal Tracks with Latent Diffusion
Ever wanted studio‑clean vocals from any song? This research introduces a generative system that separates singing voices from full mixes using a latent diffusion model.
How it works
- Generates audio in a compact latent space, then decodes it back to sound, keeping training and inference efficient (see the sketch after this list).
- Trained only on pairs of mixtures and isolated vocals (no need for every instrument stem).
- Built entirely from open datasets.
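For intuition, here is a minimal, hypothetical sketch of what latent-diffusion separation looks like at inference time: encode the mixture into a latent space, run a denoising loop conditioned on that mixture latent, then decode the result back to a waveform. The module shapes, step count, and conditioning scheme below are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of latent-diffusion vocal separation at inference time.
# All modules, shapes, and hyperparameters are placeholder assumptions.
import torch
import torch.nn as nn

class LatentCodec(nn.Module):
    """Toy stand-in for a (noise-robust) audio autoencoder."""
    def __init__(self, channels=64):
        super().__init__()
        self.encode_net = nn.Conv1d(1, channels, kernel_size=4, stride=4)
        self.decode_net = nn.ConvTranspose1d(channels, 1, kernel_size=4, stride=4)

    def encode(self, audio):    # (B, 1, T) -> (B, C, T/4)
        return self.encode_net(audio)

    def decode(self, latent):   # (B, C, T/4) -> (B, 1, T)
        return self.decode_net(latent)

class Denoiser(nn.Module):
    """Toy stand-in for the diffusion backbone; predicts the added noise."""
    def __init__(self, channels=64):
        super().__init__()
        # Input: noisy vocal latent concatenated with the mixture latent (conditioning).
        self.net = nn.Sequential(
            nn.Conv1d(2 * channels, 128, 3, padding=1), nn.SiLU(),
            nn.Conv1d(128, channels, 3, padding=1),
        )

    def forward(self, noisy_vocal_latent, mixture_latent, t):
        # A real backbone would also embed the timestep t; omitted for brevity.
        return self.net(torch.cat([noisy_vocal_latent, mixture_latent], dim=1))

@torch.no_grad()
def separate_vocals(mixture, codec, denoiser, num_steps=50):
    """DDPM-style ancestral sampling in latent space, conditioned on the mixture."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    cond = codec.encode(mixture)              # mixture latent as conditioning
    x = torch.randn_like(cond)                # start the vocal latent from pure noise
    for t in reversed(range(num_steps)):
        eps = denoiser(x, cond, t)            # predict the noise at step t
        mean = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        x = mean + betas[t].sqrt() * torch.randn_like(x) if t > 0 else mean
    return codec.decode(x)                    # decode the clean latent back to audio

if __name__ == "__main__":
    codec, denoiser = LatentCodec(), Denoiser()
    mixture = torch.randn(1, 1, 4096)         # fake mono mixture waveform
    vocals = separate_vocals(mixture, codec, denoiser)
    print(vocals.shape)                       # torch.Size([1, 1, 4096])
```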
Why it matters
- Outperforms prior generative separators and matches strong non‑generative baselines on signal quality and interference removal.
- Faster inference fits the creative workflows of producers, DJs, and educators.
- Noise‑robust latent encoder offers resilience in real‑world scenarios.
- Released as a modular toolkit to spur further research and tooling.
Paper: https://arxiv.org/abs/2511.20470v1
Register: https://www.AiFeta.com
#audioai #musictech #diffusionmodels #sourceseparation #vocals #musicproduction #opensource #opendata #latentdiffusion