VCTK

44-hour English multi-speaker corpus with 110 speakers covering a wide range of UK and US accents; widely used for multi-speaker TTS and speaker adaptation research.

Task: tts, speaker
Languages: en
Hours: 44
Domain: read sentences (accent-diverse)
License: CC BY 4.0
Homepage: https://datashare.ed.ac.uk/handle/10283/3443

Recommendation

First choice for multi-speaker English TTS experiments and accent-aware speaker embedding research. The diversity of accents is its main advantage over LJSpeech. Recording quality is very clean. Hours per speaker are limited (~30 min), which constrains voice-cloning fine-tuning.

Getting the data

Obtain from the dataset homepage.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/tts-data-prep.yaml — run it with vkit docker run.