Emotional Speech Database (ESD)
29 h of parallel emotional speech from 20 speakers (10 native English, 10 native Mandarin). Each speaker reads 350 parallel utterances in each of 5 emotions (neutral, happy, angry, sad, surprise).
- Task: tts, emotion
- Languages: en, zh
- Hours: 29
- Domain: acted emotional (parallel, bilingual)
- License: see source terms
- Homepage: https://hltsingapore.github.io/ESD/
- Paper: https://arxiv.org/abs/2105.14762
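As a rough sanity check on the figures above, the corpus size can be worked out from the speaker, utterance, and emotion counts. The sketch below only uses the numbers quoted on this card; the mean clip length is an estimate derived from the rounded 29 h total, not a figure reported by the dataset authors.

```python
# Figures quoted in the description above.
SPEAKERS = 20
UTTERANCES_PER_EMOTION = 350
EMOTIONS = 5
TOTAL_HOURS = 29  # rounded total from this card

# Every speaker reads every utterance once per emotion.
total_utterances = SPEAKERS * UTTERANCES_PER_EMOTION * EMOTIONS

# Derived estimates (approximate, because TOTAL_HOURS is rounded).
mean_seconds = TOTAL_HOURS * 3600 / total_utterances
hours_per_speaker = TOTAL_HOURS / SPEAKERS

print(total_utterances)             # 35000
print(round(mean_seconds, 2))       # 2.98
print(round(hours_per_speaker, 2))  # 1.45
```

In other words, the clips are short (about 3 s on average) and each speaker contributes roughly an hour and a half of audio.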
Recommendation
The go-to corpus for emotional voice conversion and cross-lingual / multi-speaker emotional TTS thanks to its parallel bilingual design. Limitations: the emotion is acted, there are only 5 emotion classes, and 20 speakers offer limited speaker diversity.
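The parallel design is what makes the corpus convenient for emotional voice conversion: the same speaker reads the same sentence in several emotions, so source/target pairs can be built by simple grouping. The following is a minimal sketch of that grouping, not the ESD authors' tooling. The Clip record, its field names, the speaker IDs, and the file paths are all hypothetical; adapt them to however your copy of the data is indexed.

```python
from collections import defaultdict
from typing import NamedTuple


class Clip(NamedTuple):
    # Hypothetical record layout, not the dataset's own metadata format.
    speaker: str    # speaker identifier
    utterance: int  # index of the sentence within the speaker's script
    emotion: str    # emotion label, e.g. "neutral"
    path: str       # location of the audio file


def parallel_pairs(clips, source_emotion="neutral"):
    """Pair each source-emotion clip with the same speaker's readings
    of the same sentence in every other emotion."""
    by_key = defaultdict(dict)
    for clip in clips:
        by_key[(clip.speaker, clip.utterance)][clip.emotion] = clip

    pairs = []
    for key in sorted(by_key):
        by_emotion = by_key[key]
        source = by_emotion.get(source_emotion)
        if source is None:
            continue  # no source reading, so nothing to pair against
        for emotion in sorted(by_emotion):
            if emotion != source_emotion:
                pairs.append((source, by_emotion[emotion]))
    return pairs


# Toy example with made-up paths.
clips = [
    Clip("spk01", 1, "neutral", "spk01/neutral/0001.wav"),
    Clip("spk01", 1, "happy", "spk01/happy/0001.wav"),
    Clip("spk01", 1, "sad", "spk01/sad/0001.wav"),
    Clip("spk01", 2, "happy", "spk01/happy/0002.wav"),  # no neutral reading
]
pairs = parallel_pairs(clips)
print([(s.emotion, t.emotion) for s, t in pairs])
```

If every speaker has all five readings of every sentence, neutral-to-emotion pairing gives 20 × 350 × 4 = 28,000 pairs. Pairing across speakers or across languages would need a different grouping key.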
Getting the data
Obtain the corpus from the dataset homepage listed above.
Released by NUS/SUTD for research use. The project page gives no formal license, only a citation request, so treat the data as research-only.
Suggested processing
A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml; run it with vkit docker run.