CSS10
Single-speaker speech datasets for 10 languages built from aligned public-domain LibriVox clips, intended for TTS.
- Task: tts
- Languages: de, el, es, fi, fr, hu, ja, nl, ru, zh
- Hours: 99
- Domain: audiobook
- License: see source terms
- Homepage: https://github.com/Kyubyong/css10
- Paper: https://arxiv.org/abs/1903.11269
Recommendation
A lightweight starting point for non-English single-speaker TTS, useful as a baseline or for low-resource prototyping. Per-language coverage is uneven (~10-20 h each), so it is best suited to small-scale training or fine-tuning experiments. The license is reported inconsistently across sources; verify it before redistribution.
Getting the data
Download the data from the dataset homepage (https://github.com/Kyubyong/css10).
The underlying LibriVox audio is public domain, but the repo declares Apache-2.0 and some mirrors label it CC BY-SA 4.0; confirm which terms apply before commercial use.
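Once a language pack is downloaded, it can help to check how many hours it actually contains, since coverage differs by language. The following is a minimal sketch, not part of the dataset tooling. It assumes each language ships a pipe-delimited transcript file whose last field is the clip duration in seconds; that layout, the sample file names, and the helper name total_hours are assumptions for illustration and should be checked against the downloaded files.

```python
import io


def total_hours(transcript_lines):
    """Sum clip durations (seconds) from pipe-delimited transcript lines.

    Assumed layout (verify against the real files):
        audio_path|original_text|normalized_text|duration_seconds
    """
    seconds = 0.0
    for line in transcript_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        try:
            seconds += float(fields[-1])  # duration assumed to be the last field
        except ValueError:
            continue  # skip lines whose last field is not a number
    return seconds / 3600.0


# Hypothetical two-line sample standing in for a real transcript file.
sample = io.StringIO(
    "book/clip_0000.wav|original text|normalized text|1800.0\n"
    "book/clip_0001.wav|original text|normalized text|1800.0\n"
)
print(f"{total_hours(sample):.2f} h")
```

For a real language pack, pass an open file handle instead of the sample, for example open(path, encoding="utf-8"), where path points at that language's transcript file.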
Suggested processing
A recommended VoxKitchen pipeline ships in the VoxKitchen repository at voxkitchen/templates/pipelines/tts-data-prep.yaml; run it with vkit docker run.