IEMOCAP

~12 h of acted audio-visual dyadic interactions from 10 actors (scripted and improvised), with categorical and dimensional (valence/activation/dominance) emotion labels.
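With only 10 actors, a random utterance-level split puts the same speakers in both train and test. A common remedy is a leave-one-session-out split, since each of the five recording sessions pairs one male and one female actor. The sketch below is a minimal version of that idea. It assumes utterance IDs of the form Ses01F_impro01_F000, with the session number after "Ses" and the speaker's gender as the first letter of the last field. That ID layout and the helper names parse_utterance_id and leave_one_session_out are assumptions to check against your copy of the release.

```python
import re
from collections import defaultdict

# ASSUMED utterance-ID layout, e.g. "Ses01F_impro01_F000" or
# "Ses01M_script01_1_M023": session number after "Ses", and the speaker's
# gender as the first character of the last underscore-separated field.
_ID_RE = re.compile(r"^Ses(\d{2})[FM]_.*_([FM])\d+$")


def parse_utterance_id(utt_id: str) -> tuple[int, str]:
    """Return (session, speaker), where speaker looks like 'Ses01_F'."""
    m = _ID_RE.match(utt_id)
    if m is None:
        raise ValueError(f"unrecognised utterance id: {utt_id!r}")
    session = int(m.group(1))
    # One male and one female actor per session, so session + gender
    # identifies a single actor.
    speaker = f"Ses{m.group(1)}_{m.group(2)}"
    return session, speaker


def leave_one_session_out(utt_ids: list[str]) -> dict[int, tuple[list[str], list[str]]]:
    """Map each held-out session to a (train_ids, test_ids) pair."""
    by_session: dict[int, list[str]] = defaultdict(list)
    for utt_id in utt_ids:
        session, _ = parse_utterance_id(utt_id)
        by_session[session].append(utt_id)

    folds = {}
    for held_out in sorted(by_session):
        test = by_session[held_out]
        # Everything from the other sessions becomes training data.
        train = [u for s, ids in by_session.items() if s != held_out for u in ids]
        folds[held_out] = (train, test)
    return folds
```

On the full dataset this yields five folds, and averaging scores across them gives a speaker-independent estimate. For a finer split, the speaker value returned by parse_utterance_id can drive ten single-speaker folds instead.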

Task: emotion, speaker
Languages: en
Hours: 12
Domain: scripted and improvised dyadic
License: academic release agreement (see homepage)
Homepage: https://sail.usc.edu/iemocap/
Paper: https://sail.usc.edu/iemocap/Busso_2008_iemocap.pdf

Recommendation

A foundational benchmark for conversational (dyadic) emotion recognition and multimodal affect modeling. Choose it when you need dialog context and both categorical and dimensional labels. Caveats: the emotions are acted, access requires a signed license, and the class distribution is imbalanced.
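One common way to offset the class imbalance is to weight the training loss by inverse class frequency. The sketch below assumes the categorical labels are already loaded as a list of strings. It uses the same formula as scikit-learn's class_weight="balanced", and the label names in the example are placeholders, not IEMOCAP statistics.

```python
from collections import Counter


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count).

    A perfectly balanced label set gets weight 1.0 for every class;
    rarer classes get proportionally larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Placeholder labels for illustration only, not real IEMOCAP counts.
example = ["neu"] * 6 + ["ang"] * 3 + ["sad"] * 3
print(inverse_frequency_weights(example))
```

The resulting weights can be passed to a weighted cross-entropy loss, in class-index order. Resampling rare classes is an alternative, but with acted data it tends to repeat the same few speakers' renditions.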

Getting the data

Obtain from the dataset homepage.

Access requires a signed academic release form from USC SAIL; the data is not freely downloadable.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at examples/pipelines/emotion-recognize.yaml; run it with vkit docker run.