Emotional Speech Database (ESD)

About 29 h of parallel emotional speech from 20 speakers (10 English, 10 Mandarin), each reading the same 350 utterances in each of 5 emotions.
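As a rough sanity check on those figures, the sketch below derives the total utterance count and the mean utterance length. It assumes every speaker recorded all 350 utterances in all 5 emotions and that the 29 h figure covers the whole corpus; the function and constant names are illustrative only, not part of any ESD tooling.

```python
# Figures taken from the dataset description; treated as approximate.
SPEAKERS = 20
UTTERANCES_PER_EMOTION = 350
EMOTIONS = 5
TOTAL_HOURS = 29


def corpus_stats(speakers=SPEAKERS, utterances=UTTERANCES_PER_EMOTION,
                 emotions=EMOTIONS, hours=TOTAL_HOURS):
    """Return (total utterance count, mean utterance length in seconds).

    Assumes a fully parallel corpus: every speaker reads every
    utterance in every emotion.
    """
    total = speakers * utterances * emotions
    mean_seconds = hours * 3600 / total
    return total, mean_seconds


total_utterances, mean_seconds = corpus_stats()
print(total_utterances)        # 35000
print(round(mean_seconds, 2))  # 2.98
```

Under these assumptions the utterances are short, about 3 seconds each on average, which is worth keeping in mind when setting minimum-duration filters during data preparation.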

Recommendation

The go-to corpus for emotional voice conversion and cross-lingual / multi-speaker emotional TTS thanks to its parallel bilingual design. Limitations: the emotion is acted, there are only 5 emotion classes, and 20 speakers give limited speaker diversity.

Getting the data

Obtain from the dataset homepage.

Released by NUS/SUTD for research use. The project page gives no formal license, only a citation request, so treat the data as research-only.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml; run it with vkit docker run.