VoxKitchen
Turn raw speech recordings into clean, inspectable training datasets. Write one
Docker-backed YAML pipeline, run it with vkit docker, and get checkpoints,
reports, and exported datasets.
52 operators across 8 categories: audio processing, segmentation, augmentation, annotation (ASR/diarization/alignment/emotion), quality metrics, TTS synthesis, utility, and output packing.
Get Started
- Getting Started — install, first pipeline, inspect results
- Examples & Use Cases — choose a ready-made pipeline by task
- Data Protocol — Recording, Supervision, Cut, CutSet, Provenance
Tutorials
Use a template to scaffold a project for your use case:
vkit init my-project --template tts # TTS data preparation
vkit init my-project --template asr # ASR training data
vkit init my-project --template cleaning # Data cleaning
vkit init my-project --template speaker # Speaker analysis
- TTS Training Data — quality gate for TTS training audio: denoise, segment, transcribe, align
- Speaker TTS — synthesize text with a built-in voice (kokoro, ChatTTS, CosyVoice sft)
- Voice Cloning & TTS — clone a voice from a 3–10 s reference (CosyVoice zero-shot, Fish-Speech)
- ASR Training Data — augment, transcribe, pack for training
- Data Cleaning — quality metrics, dedup, filter
- Speaker Analysis — diarize, embed, classify
Reference
- Operators — all 52 operators with config and YAML examples
- Dataset Catalog — dataset recipes and vkit docker download
- CLI Commands — complete CLI reference
- Python Tools API — standalone functions for quick tasks
- Pipeline YAML — YAML schema and execution model
Quick Reference
vkit docker run --tag slim examples/pipelines/demo-no-asr.yaml --dry-run
vkit init --list-templates # available project templates
vkit docker download --tag slim librispeech --root ./data/librispeech --subsets dev-clean
vkit docker run --tag asr pipeline.yaml --dry-run
vkit docker doctor --tag latest
Links
- Example pipelines
- GitHub
- License (Apache 2.0)