MagicData-RAMC (Rich Annotated Mandarin Conversational)

180 hours of Mandarin two-party conversational telephone-style speech from 663 speakers across Chinese accent regions, with speaker-turn and topic annotations spanning daily-life to technology topics.

Recommendation

Strong fit for conversational Mandarin ASR, speaker diarization, and turn-taking research where read-speech corpora fall short. Choose when you need spontaneous dialogue with speaker-attributed transcripts.
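As a rough illustration of what speaker-attributed transcripts make possible, the sketch below computes per-speaker talk time and the number of speaker changes from a list of timed segments. The `Segment` type, its field names, and the demo values are hypothetical and are not the corpus's actual annotation format; adapt the loading step to whatever layout the distribution uses.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical container for one speaker-attributed segment."""
    speaker: str   # speaker label (assumed field name)
    start: float   # segment start in seconds
    end: float     # segment end in seconds
    text: str      # transcript of the segment


def turn_stats(segments):
    """Return (talk time per speaker in seconds, number of speaker changes)."""
    ordered = sorted(segments, key=lambda s: s.start)
    talk_time = defaultdict(float)
    changes = 0
    previous = None
    for seg in ordered:
        talk_time[seg.speaker] += seg.end - seg.start
        # Count a change whenever the speaker differs from the prior segment.
        if previous is not None and seg.speaker != previous:
            changes += 1
        previous = seg.speaker
    return dict(talk_time), changes


# Made-up example data, not taken from the corpus.
demo = [
    Segment("A", 0.0, 2.5, "你好"),
    Segment("B", 2.5, 4.0, "你好"),
    Segment("B", 4.2, 6.0, "最近怎么样"),
    Segment("A", 6.1, 7.1, "还不错"),
]
talk_time, changes = turn_stats(demo)
```

Consecutive segments from the same speaker are treated as one continued turn, so only a change of speaker label increments the counter.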

Getting the data

Obtain the corpus from the dataset homepage.

The OpenSLR distribution is licensed under CC BY-NC-ND 4.0 (attribution, non-commercial, no derivatives), which in practice limits it to research use. Verify the license terms before integrating it into derivative datasets.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at voxkitchen/templates/pipelines/speaker-analysis.yaml. Run it with vkit docker run.