VoxKitchen

Turn raw speech recordings into clean, inspectable training datasets. Write one Docker-backed YAML pipeline, run it with vkit docker, and get checkpoints, reports, and exported datasets.

52 operators across 8 categories: audio processing, segmentation, augmentation, annotation (ASR/diarization/alignment/emotion), quality metrics, TTS synthesis, utility, and output packing.

Get Started

Getting Started — install, first pipeline, inspect results
Examples & Use Cases — choose a ready-made pipeline by task
Data Protocol — Recording, Supervision, Cut, CutSet, Provenance

Tutorials

Use a template to scaffold a project for your use case:

vkit init my-project --template tts       # TTS data preparation
vkit init my-project --template asr       # ASR training data
vkit init my-project --template cleaning  # Data cleaning
vkit init my-project --template speaker   # Speaker analysis
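
To compare the templates side by side, the four commands above can be wrapped in a small loop. This is only a sketch: the project names (`my-tts`, `my-asr`, and so on) and the `VKIT_RUN` switch are illustrative assumptions, not part of VoxKitchen. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: scaffold one project per template listed above.
# Assumptions: project names "my-<template>" and the VKIT_RUN variable
# are hypothetical conveniences for this example.
scaffold_all() {
    for template in tts asr cleaning speaker; do
        project="my-${template}"
        if [ "${VKIT_RUN:-0}" = "1" ]; then
            # Actually create the project (requires vkit on PATH).
            vkit init "$project" --template "$template"
        else
            # Default: only show the command that would be run.
            echo "vkit init $project --template $template"
        fi
    done
}

scaffold_all
```

Run it once as-is to review the commands, then again with `VKIT_RUN=1` to create the projects.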

TTS Training Data — quality gate for TTS training audio: denoise, segment, transcribe, align
Speaker TTS — synthesize text with a built-in voice (kokoro, ChatTTS, CosyVoice sft)
Voice Cloning & TTS — clone a voice from a 3–10 s reference (CosyVoice zero-shot, Fish-Speech)
ASR Training Data — augment, transcribe, pack for training
Data Cleaning — quality metrics, dedup, filter
Speaker Analysis — diarize, embed, classify

Reference

Operators — all 52 operators with config and YAML examples
Dataset Catalog — dataset recipes and vkit docker download
CLI Commands — complete CLI reference
Python Tools API — standalone functions for quick tasks
Pipeline YAML — YAML schema and execution model

Quick Reference

vkit docker run --tag slim examples/pipelines/demo-no-asr.yaml --dry-run
vkit init --list-templates          # available project templates
vkit docker download --tag slim librispeech --root ./data/librispeech --subsets dev-clean
vkit docker run --tag asr pipeline.yaml --dry-run
vkit docker doctor --tag latest
