Skip to content

IEMOCAP

~12 h of acted audio-visual dyadic interactions from 10 actors (scripted and improvised), with categorical and dimensional (valence/activation/ dominance) emotion labels.

Recommendation

A foundational benchmark for conversational/dyadic emotion recognition and multimodal affect modeling — choose it when you need dialog context and both categorical and dimensional labels. Emotion is acted, access is gated behind a signed license, and class distributions are imbalanced.

Getting the data

Obtain from the dataset homepage.

Requires a signed academic release form from USC SAIL; not freely downloadable.

Suggested processing

A recommended VoxKitchen pipeline ships in the repository at examples/pipelines/emotion-recognize.yaml — run it with vkit docker run.