I'm currently working on a speech emotion recognition task, and I'm curious whether my dataset is imbalanced.
The value counts per class are roughly uniform, except for one class whose count is about a third of the mode. However, when I compute the duration of each sound file and average it per class, the mean durations come out roughly equal.

So my question is: do longer sound files provide the model with more information, in which case no class imbalance exists, or does the model perceive each instance as carrying the same amount of information as any other, in which case class imbalance does exist?
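For concreteness, here is a minimal sketch of how I compute these statistics. The `metadata.csv` file and the `path`/`label` column names are just placeholders for my actual setup:

```python
import pandas as pd
import soundfile as sf

# Placeholder metadata: one row per clip, with a file path and an emotion label.
df = pd.read_csv("metadata.csv")

# Per-class instance counts: roughly uniform, except one class at ~1/3 of the mode.
print(df["label"].value_counts())

# Clip duration in seconds, read from each file's header (no full decode needed).
df["duration"] = df["path"].map(lambda p: sf.info(p).duration)

# Mean clip duration per class: these come out roughly equal.
print(df.groupby("label")["duration"].mean())

# Total audio per class, in case imbalance is judged by duration rather than count.
print(df.groupby("label")["duration"].sum())
```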