Skip to content

AISHELL-4

Real-recorded Mandarin conference-meeting corpus (8-channel circular mic array), 211 sessions with 4-8 speakers each, annotated for transcription and speaker activity.

Recommendation

Choose for Mandarin meeting/conference scenarios needing realistic far-field, multi-speaker, overlapping speech — supports ASR, diarization, separation, and enhancement. 120 h is modest and array-based far-field, so it suits meeting-domain work more than general ASR.

Getting the data

Obtain from the dataset homepage.

211 sessions, 8-channel circular mic array; distributed via OpenSLR

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/speaker-analysis.yaml — run it with vkit docker run.