Expresso

High-quality multi-speaker English expressive speech at 48 kHz (11 h read + 30 h improvised) across many spontaneous expressive styles, for expressive speech resynthesis.

Recommendation

Best for expressive/style-controlled TTS and discrete speech-resynthesis research where studio-clean, style-labeled English audio matters. Only 4 speakers and a non-commercial license, so it is unsuitable for commercial training and limited for speaker-diversity work.

Getting the data

Download the data from the dataset homepage. It is distributed through Meta's facebookresearch/textlesslib repository under a non-commercial license.
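After downloading, it can be useful to confirm that the audio matches the description above: 48 kHz files totalling roughly 41 h (11 h read + 30 h improvised). The sketch below is one way to do that with only the Python standard library. The function name summarize_wavs and the idea of scanning a single root directory are illustrative assumptions, not part of the dataset tooling, and the actual directory layout of the download is not assumed here. The standard wave module reads PCM WAV only, so other encodings would need a different reader.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 48_000  # sample rate stated in the dataset description


def summarize_wavs(root, expected_rate=EXPECTED_RATE_HZ):
    """Return (total_hours, mismatched_paths) for all .wav files under root.

    Hypothetical helper for a post-download sanity check; it is not part
    of textlesslib or VoxKitchen.
    """
    total_seconds = 0.0
    mismatched = []
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav.getnframes() / rate
            if rate != expected_rate:
                mismatched.append(path)
    return total_seconds / 3600.0, mismatched
```

Calling summarize_wavs on the directory where the archive was unpacked returns the total duration in hours and a list of files whose sample rate differs from 48 kHz. A total far from 41 h, or a non-empty list, suggests an incomplete or resampled copy.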

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml; run it with vkit docker run.