SPGISpeech

5,000 h of professionally transcribed English company earnings-call audio, fully formatted with punctuation and capitalization.

Recommendation

A strong pick for English ASR on spontaneous, accented, real-world business speech, with high-quality, fully formatted (punctuated, denormalized) transcripts. Access is gated behind a Kensho research agreement, so the data is not freely redistributable.
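To make "fully formatted" concrete, here is a minimal sketch of reducing a formatted transcript to a plain lowercase, unpunctuated form, for example when scoring a model that emits normalized text. The function name `strip_formatting` and the sample sentence are hypothetical and not taken from the corpus or from VoxKitchen. The sketch assumes ASCII English text and does not verbalize numbers or symbols, so it is lossy (the percent sign is simply dropped).

```python
import re


def strip_formatting(text: str) -> str:
    """Lowercase a transcript and drop punctuation (illustrative only)."""
    text = text.lower()
    # Replace every character that is not an ASCII letter, digit,
    # apostrophe, or whitespace with a space. Assumes English-only text.
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    # Collapse the runs of whitespace left behind by the substitution.
    return " ".join(text.split())


# Hypothetical sentence in the style of a formatted transcript.
sample = "Thanks, Operator. Revenue grew 12% year-over-year."
print(strip_formatting(sample))
# prints: thanks operator revenue grew 12 year over year
```

Going the other way (restoring punctuation, casing, and numerals) is much harder, which is why formatted reference transcripts like these are useful as training targets.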

Getting the data

Obtain the data from the dataset homepage.

Gated: requires signing the Kensho research download agreement. A newer SPGISpeech 2.0 (~3,780 h, speaker-tagged) also exists.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/asr-training-data.yaml — run it with vkit docker run.