Emilia
Large-scale multilingual in-the-wild speech dataset designed for expressive and diverse TTS training, covering 6 languages.
- Task: tts, multilingual
- Languages: multi (English, Chinese, German, French, Japanese, Korean)
- Domain: in-the-wild (diverse)
- License: see source terms
- Homepage: https://huggingface.co/datasets/amphion/Emilia-Dataset
- Paper: https://arxiv.org/abs/2407.05361
Recommendation
Best choice when you need expressive, diverse multilingual TTS training data that goes beyond clean audiobook recordings. The audio was collected in the wild, so prosody and speaking style vary widely, which suits training natural-sounding TTS. Check the source terms; access requires registration.
Getting the data
Obtain from the dataset homepage.
Request access via the Hugging Face repository page. The released data is already processed; Emilia-Pipe is the open-source preprocessing pipeline that produced it from raw in-the-wild audio. Use the released data as-is unless you are doing your own quality filtering.
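Once access has been granted, one language can be streamed without downloading the whole corpus. The following is a minimal sketch, not an official recipe. It assumes the `datasets` library is installed, that you have logged in with `huggingface-cli login`, and that shards are laid out as `Emilia/<LANG>/*.tar` in the repository. The helper names `shard_pattern` and `stream_language` are hypothetical, and the layout should be checked against the repository's file listing.

```python
from typing import Dict

REPO_ID = "amphion/Emilia-Dataset"  # taken from the homepage URL

# Assumption: upper-case folder codes for the six languages.
LANGUAGES = ("EN", "ZH", "DE", "FR", "JA", "KO")


def shard_pattern(lang: str) -> Dict[str, str]:
    """Build a data_files mapping for one language (the layout is assumed)."""
    code = lang.upper()
    if code not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    return {code.lower(): f"Emilia/{code}/*.tar"}


def stream_language(lang: str):
    """Return a streaming dataset over one language's shards.

    Needs network access and an account that has been granted access.
    """
    # Imported lazily so shard_pattern() works without the library installed.
    from datasets import load_dataset

    files = shard_pattern(lang)
    split = next(iter(files))  # the single key in the mapping, e.g. "de"
    return load_dataset(REPO_ID, data_files=files, split=split, streaming=True)


# Example (requires access):
#   sample = next(iter(stream_language("de")))
#   print(sorted(sample.keys()))
```

Streaming avoids committing disk space before you have inspected a few samples. If the repository layout differs from the assumed pattern, only `shard_pattern` needs to change.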
Suggested processing
A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml; run it with vkit docker run.