Emilia

Large-scale multilingual in-the-wild speech dataset designed for expressive and diverse TTS training, covering 6 languages.

Recommendation

Best choice when you need expressive, diverse multilingual TTS training data that goes beyond clean audiobook recordings. The audio was collected in the wild, so prosody and speaking style vary widely, which suits natural-sounding TTS. Check the source terms before use; access requires registration.

Getting the data

Obtain from the dataset homepage.

Request access via the Hugging Face repository page. The dataset is available in processed (Emilia) and unprocessed (Emilia-Pipe) variants; use the processed variant unless you are doing your own quality filtering.
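Once access has been granted, one way to pull a single language is to stream it with the Hugging Face datasets library. The following is a minimal sketch, not an official loader: the repository id amphion/Emilia-Dataset, the Emilia/<LANG>/*.tar shard layout, and the helper names emilia_data_files and stream_emilia are assumptions made for illustration, so check them against the repository page. It also assumes you are already authenticated (for example via huggingface-cli login).

```python
# Language codes for the six languages, assumed to match the repository's folder names.
LANGUAGES = ("EN", "ZH", "DE", "FR", "JA", "KO")


def emilia_data_files(lang, root="Emilia"):
    """Build the glob pattern for one language's shards (assumed layout)."""
    code = lang.upper()
    if code not in LANGUAGES:
        raise ValueError(f"unsupported language {lang!r}; expected one of {LANGUAGES}")
    return f"{root}/{code}/*.tar"


def stream_emilia(lang, repo_id="amphion/Emilia-Dataset"):
    """Stream one language without downloading the full dataset.

    repo_id is an assumption; gated access must already be approved and
    a Hugging Face token must be configured on this machine.
    """
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    split = lang.lower()
    return load_dataset(
        repo_id,
        data_files={split: emilia_data_files(lang)},
        split=split,
        streaming=True,
    )
```

Calling stream_emilia("de") returns an iterable dataset, so you can inspect a few samples with next(iter(...)) before committing to a full download.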

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml — run it with vkit docker run.