I remember something from a stats course many years ago which might be helpful now. I want to distinguish between patients who show symptoms across the board and patients who have similarly high sum-scores that are due to only a few very high symptoms. Each symptom is rated on a scale from 0 to 3.
I remember the following procedure (imagine we have 3 symptoms per patient):
- Calculate each symptom's proportion of that person's sum-score, so s1/sum, s2/sum, s3/sum. Let's call these 3 values p-values.
- Take the natural log of these 3 values. Let's call these p(ln)-values. (If the original value is 0, the result should be set to 0, or the scale should be shifted so that there are no 0s anymore.)
- Calculate: -2*p*p(ln) per symptom.
- Sum these up over all symptoms.
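
To make the steps concrete, here is a minimal Python sketch of the procedure as I understand it. The function name `spread_score` is made up for illustration, and returning 0 for a patient with no symptoms at all is my own assumption, since the steps above do not cover that case.

```python
import math

def spread_score(symptoms):
    """Sum of -2 * p * ln(p) over symptom proportions p = s / sum.

    A symptom of 0 contributes 0 (the "result should be 0" rule above).
    """
    total = sum(symptoms)
    if total == 0:
        # Assumption: a patient with no symptoms gets a score of 0.
        return 0.0
    score = 0.0
    for s in symptoms:
        if s > 0:
            p = s / total                  # proportion of the sum-score
            score += -2 * p * math.log(p)  # -2 * p * p(ln)
    return score

# Patterns with equal symptoms give the same value regardless of level.
print(spread_score([1, 1, 1]))
print(spread_score([3, 3, 3]))
# A single high symptom gives the minimum value.
print(spread_score([3, 0, 0]))
```

The value depends only on the proportions, not on the overall level of the symptoms.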
Using this procedure, I get the same value if all symptoms are equally high (no matter whether they are all 1 or all 3), which is what I want. However, the difference in this value between participants with very even answering patterns and participants with only a few very high symptoms is quite small, which could be due to the narrow symptom scale of 0-3 (the differences get larger with higher values).
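
Written out as a formula (the symbols H, n, and c are my own notation, not something I remember from the course), the procedure and the equal-symptoms case look like this, which would explain why the level of the symptoms drops out:

```latex
H = -2 \sum_{i=1}^{n} p_i \ln p_i,
\qquad p_i = \frac{s_i}{\sum_{j=1}^{n} s_j},
\qquad 0 \cdot \ln 0 := 0 .

% If all n symptoms have the same value c > 0, then p_i = 1/n and
H = -2 \, n \cdot \frac{1}{n} \ln\frac{1}{n} = 2 \ln n ,
% which for n = 3 is about 2.197, whether c is 1 or 3.
```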
I am not sure whether I misremember the name "entropy" or the formula, and would appreciate help. Could I "inflate" the differences, e.g. by using 1, 10, 100, 1000 instead of 0, 1, 2, 3 as symptom values?