Expresso

High-quality multi-speaker English expressive speech at 48 kHz (11 h read + 30 h improvised) across many spontaneous expressive styles, for expressive speech resynthesis.

Task: tts, emotion
Languages: en
Hours: 40
Domain: expressive read + improvised studio
License: CC BY-NC 4.0
Homepage: https://speechbot.github.io/expresso/
Paper: https://arxiv.org/abs/2308.05725

Recommendation

Best for expressive/style-controlled TTS and discrete speech-resynthesis research where studio-clean, style-labeled English audio matters. Only 4 speakers and a non-commercial license, so it is unsuitable for commercial training and limited for speaker-diversity work.

Getting the data

Obtain from the dataset homepage.

Distributed via Meta's facebookresearch/textlesslib repo; NonCommercial.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml — run it with vkit docker run.