VoxKitchen

Turn raw speech recordings into clean, inspectable training datasets. Write one Docker-backed YAML pipeline, run it with vkit docker, and get checkpoints, reports, and exported datasets.

VoxKitchen includes 52 operators across 8 categories: audio processing, segmentation, augmentation, annotation (ASR, diarization, alignment, emotion), quality metrics, TTS synthesis, utility, and output packing.

Get Started

Tutorials

Use a template to scaffold a project for your use case:

vkit init my-project --template tts       # TTS data preparation
vkit init my-project --template asr       # ASR training data
vkit init my-project --template cleaning  # Data cleaning
vkit init my-project --template speaker   # Speaker analysis

Reference

Quick Reference

vkit docker run --tag slim examples/pipelines/demo-no-asr.yaml --dry-run
vkit init --list-templates          # available project templates
vkit docker download --tag slim librispeech --root ./data/librispeech --subsets dev-clean
vkit docker run --tag asr pipeline.yaml --dry-run
vkit docker doctor --tag latest
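
As a rough sketch, the commands above could be chained into a first session like the one below. It uses only invocations that appear on this page. The `run` helper is hypothetical (not part of VoxKitchen), and the project name `my-project` and the `tts` template are placeholders. The helper prints each command and executes it only when `vkit` is on the `PATH`, so the script is safe to read through on a machine without VoxKitchen installed. The demo pipeline path is assumed to be relative to a checkout that contains `examples/pipelines/`.

```shell
#!/bin/sh
# Hypothetical first-session sketch; every vkit invocation is copied from this page.
# "my-project" and the "tts" template are placeholder choices.

# Helper (an assumption, not part of vkit): echo the command,
# then execute it only if vkit is actually installed.
run() {
  echo "+ $*"
  if command -v vkit >/dev/null 2>&1; then
    "$@"
  fi
}

run vkit init --list-templates                 # see which templates exist
run vkit init my-project --template tts        # scaffold a project
run vkit docker doctor --tag latest            # doctor command from the list above
run vkit docker run --tag slim examples/pipelines/demo-no-asr.yaml --dry-run
```

Keeping `--dry-run` on the last command matches the examples above; drop it only once the dry run reports no problems.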