MUSAN

109-hour corpus of music, speech, and environmental noise designed for data augmentation in speech and speaker recognition experiments.

Task: augmentation
Languages: en
Hours: 109
Domain: music, speech, noise
License: CC BY 4.0
Homepage: https://www.openslr.org/17
Paper: https://arxiv.org/abs/1510.08484

Recommendation

MUSAN is the standard noise/music augmentation bank for speaker verification and robust ASR training. Combine it with RIR-based room simulation for a full augmentation pipeline. The corpus is small enough to keep fully in RAM during training. There is no dedicated VoxKitchen pipeline template; augmentation is embedded in task-specific pipelines such as examples/pipelines/noise-augment.yaml.
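
To illustrate what noise augmentation with a MUSAN clip involves, here is a minimal sketch of additive mixing at a target signal-to-noise ratio. It is not VoxKitchen's implementation: the names rms and mix_at_snr are hypothetical, the signals are synthetic stand-ins for a speech utterance and a noise clip, and plain Python lists are used only to keep the example dependency-free (a real pipeline would more likely work on NumPy or tensor arrays).

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the speech-to-noise ratio is `snr_db` dB.

    Hypothetical helper for illustration; not part of any library.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Tile the noise clip so it covers the whole utterance, then trim.
    repeats = -(-len(speech) // len(noise))  # ceiling division
    noise = (list(noise) * repeats)[: len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        return list(speech)  # silent noise clip: nothing to add
    # Gain that brings the noise to the level implied by the target SNR.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a 1 s, 220 Hz tone at 16 kHz and 0.25 s of random noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Scaling by RMS rather than peak amplitude keeps the ratio meaningful across clips with very different dynamics, which matters when the same routine is applied to music, babble, and environmental noise.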

Getting the data

Downloadable via VoxKitchen (musan, source: openslr, size: 10.3 GB):

vkit docker download --tag slim musan --root ./data/musan

Subsets: musan.
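
After downloading, a quick sanity check is to count files and hours per category. The sketch below assumes the extracted corpus contains top-level music, noise, and speech directories of WAV files, as in the OpenSLR archive; whether a VoxKitchen download places them directly under the chosen root is an assumption, and summarize_musan is a hypothetical helper name.

```python
import wave
from collections import defaultdict
from pathlib import Path


def summarize_musan(root):
    """Return {category: (file_count, hours)} for WAV files under `root`.

    Hypothetical helper; assumes the first directory level below `root`
    names the category (for example music, noise, speech).
    """
    totals = defaultdict(lambda: [0, 0.0])
    root = Path(root)
    for path in sorted(root.rglob("*.wav")):
        # The first path component below the root is taken as the category.
        category = path.relative_to(root).parts[0]
        with wave.open(str(path), "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        totals[category][0] += 1
        totals[category][1] += seconds
    return {cat: (count, secs / 3600.0) for cat, (count, secs) in totals.items()}


# Example call (path assumed to match the --root used for the download):
# summarize_musan("./data/musan")
```

If the hours reported across categories sum to roughly 109, the download is likely complete.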