FLEURS

Few-shot Learning Evaluation of Universal Representations of Speech: a standardised ASR and language-identification (LID) evaluation set covering 102 languages, built from read-speech recordings of sentences from the FLoRes-101 machine-translation benchmark.

Recommendation

A widely used multilingual ASR evaluation benchmark. Use it to compare ASR quality consistently across languages, not as a training corpus: each language has only about 10 h of audio. The data is hosted on HuggingFace Datasets, and VoxKitchen's recipe handles the streaming download.
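One common way to report quality consistently across languages is to compute a corpus-level word error rate (WER) for each language and then take an unweighted (macro) average, so every language counts equally regardless of how many utterances it has. The sketch below assumes you already have (reference, hypothesis) transcript pairs grouped by language. The helper names `word_edits`, `corpus_wer` and `macro_wer` are hypothetical and are not part of VoxKitchen or HuggingFace Datasets. The sketch also applies no text normalisation, and languages written without spaces between words (e.g. Chinese, Japanese) would need character-level scoring instead.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration only; not part of VoxKitchen or
# HuggingFace Datasets. No text normalisation (case, punctuation) is applied.


def word_edits(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute, or match when equal
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words for one language."""
    edits = sum(word_edits(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("no reference words")
    return edits / words


def macro_wer(results: Dict[str, List[Tuple[str, str]]]) -> float:
    """Unweighted mean of per-language WERs, so each language counts equally."""
    per_language = [corpus_wer(pairs) for pairs in results.values()]
    return sum(per_language) / len(per_language)


# Toy (reference, hypothesis) pairs keyed by illustrative language codes.
results = {
    "en_us": [("the cat sat", "the cat sat")],
    "fr_fr": [("le chat noir", "le chien noir"), ("bonjour", "")],
}
print(macro_wer(results))  # 0.25
```

Here English scores 0/3 = 0.0 and French scores 2/4 = 0.5 (one substitution plus one deletion), so the macro average is 0.25. A micro average pooled over all utterances would instead let languages with more test data dominate the score.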

Getting the data

Downloadable via VoxKitchen (fleurs, source: HuggingFace, size: —):

vkit docker download --tag slim fleurs --root ./data/fleurs

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at examples/pipelines/fleurs-multilingual.yaml; run it with vkit docker run.