I have a set of films, and for each film, a set of reviews, varying between 1 review and several hundred reviews per film. Each review has a star rating from 1 to 5.
I am using Wilson's confidence interval for a Bernoulli parameter to estimate whether the film is likely to be good or not, taking into account the number of ratings (I just count any review of 3 stars or more as positive, anything else as negative).
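For reference, here is a minimal sketch of the 'goodness' calculation described above. The function names `count_positive` and `wilson_interval`, and the choice of z = 1.96 (roughly a 95% interval), are assumptions for illustration and not necessarily the exact code in use.

```python
import math


def count_positive(stars, threshold=3):
    """Count reviews with at least `threshold` stars (assumed cut-off: 3)."""
    return sum(1 for s in stars if s >= threshold)


def wilson_interval(positive, total, z=1.96):
    """Wilson score interval for a Bernoulli proportion.

    z = 1.96 is an assumed default, corresponding to roughly 95% confidence.
    Returns (lower, upper).
    """
    if total == 0:
        # No data: the proportion could be anything in [0, 1].
        return (0.0, 1.0)
    p_hat = positive / total
    denom = 1 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    ) / denom
    return (centre - half_width, centre + half_width)


# Same observed proportion of positive reviews (50%), different sample sizes.
small = [1, 5]
large = [1] * 100 + [5] * 100
print(wilson_interval(count_positive(small), len(small)))
print(wilson_interval(count_positive(large), len(large)))
```

With the same observed proportion, the interval for the film with 200 reviews is much narrower than the one for the film with 2 reviews, which is the behaviour I want to reproduce for divisiveness.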
However, I'd also like to figure out how likely a film is to be divisive, given the number of ratings.
So a film with 200 reviews (100 1-star reviews and 100 5-star reviews) is more likely to be divisive than a film with 2 reviews (1 1-star review and 1 5-star review). However, both films have the same (population) standard deviation of ratings.
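A quick check of this example, as a sketch using Python's standard `statistics` module. The variable names `film_large` and `film_small` are only illustrative.

```python
import statistics

# The two example films from above.
film_large = [1] * 100 + [5] * 100  # 200 reviews
film_small = [1, 5]                 # 2 reviews

# Population standard deviation (divides by n): identical for both films.
print(statistics.pstdev(film_large))
print(statistics.pstdev(film_small))

# Sample standard deviation (divides by n - 1): differs, but only because of
# the n - 1 correction, not because it reflects confidence in divisiveness.
print(statistics.stdev(film_large))
print(statistics.stdev(film_small))
```

Both films have a mean of 3 and every rating sits 2 stars from that mean, so the population standard deviation is 2 in each case. The number alone says nothing about how much evidence there is behind it.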
I don't think I can use the same Wilson's confidence interval calculation that I'm using for 'goodness', since the ratings aren't Bernoulli in nature (EDIT: I'm assuming they are normal).
Does anyone have any ideas on how to measure 'divisiveness' in this way?