CLI Reference
VoxKitchen provides the vkit command-line tool.
Commands
vkit init
Scaffold a new pipeline project.
vkit init my-project # Empty template
vkit init my-project --template tts # TTS data preparation
vkit init my-project --template asr # ASR training data
vkit init my-project --template cleaning # Data cleaning
vkit init my-project --template speaker # Speaker analysis
vkit init --list-templates # Show available templates
| Flag | Meaning |
|---|---|
--template, -t |
Project template (tts, asr, cleaning, speaker). |
--list-templates |
Print templates and exit. |
vkit run (container entrypoint)
Execute a pipeline in the current Python environment. This is the
container entrypoint used by VoxKitchen images. Most host users should use
vkit docker run <yaml>, which supplies the runtime image and Docker mounts.
# Host usage; these flags are forwarded to the image entrypoint:
vkit docker run pipeline.yaml # Run full pipeline
vkit docker run pipeline.yaml --dry-run # Validate only
vkit docker run pipeline.yaml --resume-from vad # Resume from a stage
vkit docker run pipeline.yaml --stop-at asr # Stop after a stage
vkit docker run pipeline.yaml --keep-intermediates # Keep derived audio
vkit docker run pipeline.yaml --num-gpus 2 # Override GPU count
vkit docker run pipeline.yaml --num-workers 8 # Override CPU workers
vkit docker run pipeline.yaml --work-dir ./work/run1 # Override work_dir
| Flag | Meaning |
|---|---|
--dry-run |
Parse + validate the pipeline, resolve the stage plan, exit without executing. |
--resume-from STAGE |
Force-resume from STAGE regardless of existing checkpoints. |
--stop-at STAGE |
Stop after STAGE completes. |
--keep-intermediates |
Disable GC; keep every stage's derived audio on disk. |
--num-gpus N |
Override the pipeline YAML's num_gpus. |
--num-workers N |
Override num_cpu_workers. |
--work-dir PATH |
Override the pipeline YAML's work_dir. |
vkit validate
Check YAML syntax, operator references, per-operator arg schemas, and the recommended Docker image without executing.
vkit validate pipeline.yaml
vkit download (current-env helper)
Download a dataset using its recipe in the current environment. For the
Docker-first user path, use vkit docker download so recipe-specific
dependencies come from the image.
vkit docker download --tag slim librispeech --root ./data/librispeech --subsets dev-clean,test-clean
vkit docker download --tag slim aishell --root ./data/aishell
vkit docker download --tag slim fleurs --root ./data/fleurs --subsets en_us,zh_cn
| Flag | Meaning |
|---|---|
--root PATH |
Directory to download into (required). |
--subsets LIST |
Comma-separated subset names. Recipe-specific; see Recipes & Download. |
vkit ingest
Build a CutSet manifest from a data source — standalone, outside a pipeline.
Most users let vkit docker run do this through the pipeline ingest block;
vkit ingest is useful for one-off manifest prep.
vkit ingest --source dir --root ./data/audio --out cuts.jsonl.gz
vkit ingest --source recipe --recipe librispeech --root ./data/librispeech --out cuts.jsonl.gz
vkit ingest --source manifest --path input.jsonl.gz --out merged.jsonl.gz
vkit ingest --source dir --root ./data/audio --out cuts.jsonl.gz --no-recursive
| Flag | Source | Meaning |
|---|---|---|
--source |
all | dir, manifest, or recipe. |
--out |
all | Output cuts.jsonl.gz path (required). |
--root |
dir, recipe |
Dataset root directory. |
--path |
manifest |
Path to an input cuts.jsonl.gz. |
--recipe |
recipe |
Recipe name (librispeech, aishell, commonvoice, fleurs). |
--subsets |
recipe |
Comma-separated subset names. |
--recursive / --no-recursive |
dir |
Recurse into subdirectories (default: recurse). |
vkit inspect
Inspect pipeline results and cut data. Four subcommands:
vkit inspect cuts work/01_pack/cuts.jsonl.gz # CutSet statistics
vkit inspect run work/ # Per-stage summary + timing
vkit inspect trace utt-001 --in work/ # Provenance chain for one cut
vkit inspect errors work/ # Per-stage error entries
| Subcommand | Argument | Purpose |
|---|---|---|
cuts <path> |
A cuts.jsonl.gz |
Print CutSet-level stats (duration, sample rate, metric histograms). |
run <work_dir> |
Pipeline work dir | Table of stage name, cut count, duration, success marker. |
trace <cut_id> --in <work_dir> |
Cut id + work dir | Walk Provenance parent links to show where a cut came from. |
errors <work_dir> |
Pipeline work dir | Dump per-stage _errors.jsonl entries (cuts that failed). |
vkit operators
List and inspect operators.
vkit operators # List all operators (grouped by category)
vkit operators --category quality # List only operators in one category
vkit operators search noise # Find operators whose name or description matches "noise"
vkit operators show silero_vad # Show config fields + YAML example for an operator
search matches case-insensitively against the operator name and the first
line of its docstring. It exits with code 1 when nothing matches, so shell
scripts can branch on no-result.
Valid --category values are: basic, segment, augment, annotate,
quality, synthesize, pack, noop.
vkit schema
Generate JSON Schemas for YAML editor integration.
vkit schema export # → ./pipeline.schema.json
vkit schema export --out docs/schemas/pipeline.schema.json # custom path
The output is consumed by YAML language servers (VS Code, Neovim, JetBrains) so
users get autocompletion and inline validation while editing pipeline.yaml.
vkit init already writes the right # yaml-language-server: $schema=…
directive at the top of every scaffolded pipeline. See
Pipeline JSON Schema for editor setup.
vkit recipes
List dataset recipes (the entities behind vkit docker download and ingest: source=recipe).
vkit recipes
Output is a table with name, download mechanism (openslr, keithito,
HuggingFace, or manual), compressed download size, and a one-line
description. The Size column shows a single value for single-archive
datasets and a range for multi-subset datasets (299 MB - 28.5 GB)
so you can compare before downloading. Manual / HuggingFace recipes
render as a dash.
To actually download, use vkit docker download --tag slim <name> --root ./data/<name>;
to reference inside a pipeline, use
ingest: { source: recipe, recipe: <name>, args: { root: <dir> } }.
Recipe-specific subset names are listed in Recipes & Download.
vkit datasets
Browse the dataset catalog from the terminal — same data that's shown on the docs site dataset catalog, without leaving your shell. Three modes:
vkit datasets # all 60 entries as a table
vkit datasets --task asr --language zh # filter
vkit datasets --recipe-only # only downloadable-via-VoxKitchen
vkit datasets --query libri # substring across id/name/summary
vkit datasets show librispeech # full record (one entry)
vkit datasets search 'code-switch' # substring search across all fields
| Flag | Meaning |
|---|---|
--task, -t |
Filter by task tag (asr / tts / speaker / multilingual / emotion / augmentation). |
--language, -l |
Filter by ISO language code (en / zh / multi / ja / ...). |
--recipe-only |
Only show entries downloadable via vkit docker download. |
--query, -q |
Substring match (case-insensitive) across id / name / summary. |
Filters compose with AND. The show subcommand prints a Rich panel with
all fields (license, homepage, paper, recipe hint, recommended pipeline,
summary, recommendation, notes); when the entry is recipe-backed the
panel surfaces the exact vkit docker download invocation.
vkit doctor
Report per-env operator availability and warmup-model status.
vkit doctor # Single-env report (dev install or :slim image)
vkit doctor --expect core # Assert core-env expected operators are importable
vkit doctor --expect asr # Asserts for :asr image
vkit doctor --json # Machine-readable output on stdout
| Flag | Meaning |
|---|---|
--expect ENV |
Image env to validate against (core, asr, diarize, tts, fish-speech). Exits non-zero if any expected operator fails to import. Used by the Dockerfile's per-stage smoke test. |
--json |
Emit a JSON report on stdout (rich table still goes to stderr). |
Inside the voxkitchen:latest multi-env image, vkit doctor with no --expect aggregates a table across every env under /opt/voxkitchen/envs/, re-invoking each env's own vkit doctor --expect <env>.
vkit docker
Run any of the commands above inside a published Docker image instead of the local Python env. Also has three image-management helpers (build, pull, shell).
Image selection flags (run, download, doctor, pull, shell):
| Flag | Default | Meaning |
|---|---|---|
--tag NAME |
latest (download: slim) |
Image tag. Resolves to ghcr.io/xqfeng-josie/voxkitchen:NAME. |
--image REF |
— | Full image reference; overrides --tag. |
vkit docker run <yaml>
Execute a pipeline inside the container.
vkit docker run pipeline.yaml # :latest
vkit docker run pipeline.yaml --tag asr # :asr
vkit docker run pipeline.yaml --gpus none # CPU-only
vkit docker run pipeline.yaml --dry-run # Validate inside image
vkit docker run pipeline.yaml --env-file /tmp/.env # Alternate env file
vkit docker run pipeline.yaml --mount /data/raw # Extra read-only bind mount
| Flag | Default | Meaning |
|---|---|---|
--gpus MODE |
auto |
auto (attach all GPUs if nvidia-smi is on PATH), all, or none. |
--env-file PATH |
./.env if present |
docker --env-file path (used for HF_TOKEN). |
--mount PATH, -m |
— | Extra host path to bind read-only. Repeatable. |
--dry-run, --resume-from, --stop-at, --num-gpus, --num-workers, --work-dir, --keep-intermediates |
— | Pipeline options forwarded to the image entrypoint. |
The wrapper automatically:
- Sets
--user $(id -u):$(id -g)and-e HOME=/tmpso files in./workare owned by the host user. - Sets
NUMBA_CACHE_DIR=/app/work/.numba-cacheso librosa/numba operators can cache under the mounted work directory. - Binds
./work → /app/workand./output → /app/output; if./dataexists, binds it to both/app/datafor template-relative YAML and/datafor absolute data roots. - Binds the pipeline YAML at its absolute path when it points to a host file.
vkit docker doctor
Run vkit doctor inside the container.
vkit docker doctor # :latest, multi-env aggregate
vkit docker doctor --tag slim # slim image, single-env
vkit docker doctor --tag asr --expect asr --json # smoke test + JSON
Accepts --expect and --json (same semantics as local vkit doctor). Default --gpus is none (doctor doesn't need GPU).
vkit docker download <recipe>
Download a dataset inside the container. The wrapper creates and mounts
./data, so roots under ./data/... are written back to the host.
vkit docker download --tag slim librispeech --root ./data/librispeech --subsets dev-clean
vkit docker download --tag slim fleurs --root ./data/fleurs --subsets en_us,zh_cn
vkit docker build [target]
Build a local Docker image from docker/Dockerfile (wraps docker build).
vkit docker build # Default target: latest
vkit docker build slim
vkit docker build asr
vkit docker build latest --tag voxkitchen:dev
vkit docker build latest --no-hf-token # Skip baking pyannote
| Argument/Flag | Default | Meaning |
|---|---|---|
target |
latest |
Dockerfile target: slim, asr, diarize, tts, fish-speech, latest. |
--tag NAME |
voxkitchen:<target> |
Image tag to apply. |
--hf-token / --no-hf-token |
--hf-token |
Pass HF_TOKEN from ./.env as a build arg so pyannote is baked into the image. |
Pass extra docker build flags after --:
vkit docker build latest -- --no-cache --progress=plain
By default the wrapper keeps Docker client temp/config/cache files under
./.docker (DOCKER_CONFIG, TMPDIR, BUILDX_CONFIG, XDG_CACHE_HOME).
Set VKIT_DOCKER_WORK_DIR=/path/to/.docker to choose a different base
directory. Docker image layers still live under the Docker daemon's
data-root (often /var/lib/docker); move that daemon setting separately if
/ is full.
vkit docker pull
Pull a published image from GHCR.
vkit docker pull # :latest
vkit docker pull --tag slim
vkit docker pull --image my-registry/vox:custom
vkit docker shell
Drop into an interactive bash inside the image, useful for debugging.
vkit docker shell --tag slim
vkit docker shell --tag latest --gpus all
vkit viz
Launch an interactive Gradio panel to explore a CutSet.
vkit viz work/01_pack/cuts.jsonl.gz --port 7860
vkit viz is an optional local developer UI; it is separate from the
Docker-first pipeline execution path.
vkit card
Generate a standalone, shareable HTML dataset card from a processed CutSet manifest. The card includes quality distributions, language/gender breakdowns, a metrics summary, and sample utterances — useful for dataset documentation and sharing.
vkit card work/01_pack/cuts.jsonl.gz
vkit card work/01_pack/cuts.jsonl.gz --out my_dataset_card.html
vkit card work/01_pack/cuts.jsonl.gz --out card.html --title "My Dataset" --description "ASR training set"
# Auto-fill from the dataset catalog (license/homepage/recommendation):
vkit card work/01_pack/cuts.jsonl.gz --catalog-id librispeech
| Flag | Default | Meaning |
|---|---|---|
--out, -o |
dataset_card.html |
Output HTML file path. |
--title |
"" |
Card title (shown at the top of the HTML). |
--description |
"" |
Short dataset description. |
--catalog-id |
(none) | Pre-fill title/description and a Source section (license, homepage, paper, recommendation) from the matching entry in voxkitchen/datasets/catalog.yaml. Explicit --title/--description still override. |
Requires the viz extra. If Jinja2 is not installed, the command exits with a
friendly message:
error: the dataset card needs the 'viz' extra. Install it with `pip install voxkitchen[viz]`.