VCTK

44-hour English multi-speaker corpus with 110 speakers covering a wide range of UK and US accents; widely used for multi-speaker TTS and speaker adaptation research.

Recommendation

First choice for multi-speaker English TTS experiments and accent-aware speaker embedding research. Its accent diversity is the main advantage over LJSpeech, which has a single speaker. Recordings are very clean. Each speaker contributes only about 30 minutes of audio, which constrains voice-cloning fine-tuning.
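Because the per-speaker audio budget is the main constraint, it can help to tally it before choosing fine-tuning targets. The following is a minimal sketch, not part of VoxKitchen: the function names `minutes_per_speaker` and `speakers_below` are hypothetical, and it assumes you have already collected (speaker ID, duration in seconds) pairs with whatever audio library you use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def minutes_per_speaker(utterances: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum utterance durations (in seconds) per speaker; return totals in minutes."""
    totals: Dict[str, float] = defaultdict(float)
    for speaker_id, seconds in utterances:
        totals[speaker_id] += seconds
    return {spk: secs / 60.0 for spk, secs in totals.items()}


def speakers_below(minutes: Dict[str, float], threshold_min: float) -> List[str]:
    """Return the sorted IDs of speakers with less audio than threshold_min minutes."""
    return sorted(spk for spk, m in minutes.items() if m < threshold_min)


if __name__ == "__main__":
    # Illustrative values only; real durations come from your own scan of the corpus.
    utts = [("p225", 90.0), ("p225", 30.0), ("p226", 600.0)]
    per_speaker = minutes_per_speaker(utts)
    print(per_speaker)
    print(speakers_below(per_speaker, 5.0))
```

Speakers that fall below your chosen threshold may be better suited to multi-speaker training than to single-voice fine-tuning.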

Getting the data

Download the corpus from the dataset homepage.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml. Run it with vkit docker run.