5

Warning: I am not a statistician, so please go easy!

I have a (not) normally distributed measure of a population, with possible values from 0 to 100%. Its mean is 60%. I need to be able to shift the distribution to give it a mean of 70%.

I obviously can't just add 10% to all scores because anyone who previously scored 100% would now score an impossible 110%.

So is there a formula I can use to shift the population data so the mean is increased, while maintaining integrity of the limits of 0% and 100%?


Edited because obviously this non-statistician elicited eye-rolls from the more knowledgeable people in the room.

So it's apparently not a normal distribution but it has a bell-curve shape. The rest of what I'm trying to achieve holds true so if anyone is able to offer any useful suggestions, I'd still be interested to hear.


Editing to add further information. I have to be somewhat vague due to IP, but I'll try to explain best I can.

We have two measures that are used in a larger calculation. We know that one is inherently more trustworthy than the other, by a factor of about 10%, due to the way the data is collected. That cannot be resolved. However, when they are used in the calculation, we need them to be calibrated to the same scale. The thought was that we could shift the population of one to match the mean of the other, which would calibrate them and give us more meaningful results. Does that help?

Richard Hardy
  • 67,272
  • 4
    Since the normal distribution is defined on $(-\infty,\infty)$ then your variable cannot be normally distributed. In any case, if your distribution is bounded, and you want to change the mean then the distribution is likely to not be normal anymore since it will be skewed. So maybe your question should be "how do I transform my arbitrary distribution into another arbitrary distribution?" – user2974951 Oct 17 '22 at 10:46
  • Still, there is a normal distribution between [0, 1] and it is the standard normal distribution (or Z-distribution). – rusiano Oct 17 '22 at 10:53
  • 2
    @user2974951 is bang on here. Positive advice about what to do probably depends on (1) knowing why you want to do this (2) agreeing that your goal makes sense statistically (3) knowing more about your data. As you're aware your goals are on the face of it contradictory. – Nick Cox Oct 17 '22 at 10:53
  • 2
    @rusiano: no, that is wrong. The standard normal is just as unbounded as any other normal. – Stephan Kolassa Oct 17 '22 at 10:53
  • 1
    @rusiano Not so. The standard normal is not bounded any more (or less) than any other normal. – Nick Cox Oct 17 '22 at 10:54
  • Do you want to modify the distribution your data presumably are sampled from ("bell curve" sounds like a beta, where you could shift the mean but keep the overall shape), or do you want to transform every single observation? Can you edit your post to include your data? – Stephan Kolassa Oct 17 '22 at 10:54
  • 2
    Rather than change the measure, why not change the 'larger calculation' in a way that accounts for the difference between the measures? – mkt Oct 17 '22 at 11:22

1 Answer

5

Suppose your original observations are $x_1, \dots, x_N$, all between $0$ and $1$. One simple transformation would be to take an observation $x_i$ and shift it closer to $1$ by some factor $\alpha$, i.e., to reduce its distance from $1$:

$$ x_i \mapsto x_i':=1-\alpha(1-x_i). $$

If your original mean is

$$ \bar{x} = \frac{1}{N}\sum_{i=1}^N x_i, $$

then your new mean is

$$ \bar{x}' = \frac{1}{N}\sum_{i=1}^N x_i' = \frac{1}{N}\sum_{i=1}^N \big(1-\alpha(1-x_i)\big) = 1-\alpha + \alpha \bar{x}. $$

Solve this for $\alpha$,

$$ \alpha=\frac{1-\bar{x}'}{1-\bar{x}}, $$

and transform your data. Here is a little illustration in R:

xx <- c(1,rep(0.1,8),rep(0.2,25),rep(0.3,60),rep(0.4,90),rep(0.5,100),
  rep(0.6,90),rep(0.7,60),rep(0.8,25),rep(0.9,8),1)
mean(xx)
# [1] 0.5021368

desired_mean <- 0.7
alpha <- (1-desired_mean)/(1-mean(xx))
xx_prime <- 1-alpha*(1-xx)
mean(xx_prime)
# [1] 0.7
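
As a further check, here is a minimal sketch that wraps the same transformation in a function and applies it to the dummy data posted in the comments (mean 0.5, target 0.6). The function name `shift_mean_up` and its input checks are my own additions for illustration, not part of the calculation above.

```r
# Hypothetical helper: shrink every value's distance from 1 by a common
# factor alpha, chosen so that the mean becomes `desired_mean`.
shift_mean_up <- function(x, desired_mean) {
  stopifnot(all(x >= 0 & x <= 1))   # data must live in [0, 1]
  stopifnot(mean(x) < 1)            # otherwise alpha is undefined
  stopifnot(desired_mean >= mean(x), desired_mean <= 1)
  alpha <- (1 - desired_mean) / (1 - mean(x))
  1 - alpha * (1 - x)
}

# Dummy data from the comments, as value (count) pairs; its mean is 0.5
yy <- rep((0:10) / 10, times = c(1, 8, 25, 60, 90, 100, 90, 60, 25, 8, 1))
yy_prime <- shift_mean_up(yy, desired_mean = 0.6)

mean(yy_prime)            # the new mean
range(yy_prime)           # all values now lie in [1 - alpha, 1]
var(yy_prime) / var(yy)   # the variance is scaled by alpha^2
```

Because $0 \le \alpha \le 1$ whenever the desired mean lies between the old mean and $1$, no value can leave $[0,1]$. The price is that the lower end of the range is compressed: nothing can fall below $1-\alpha$ after the transformation.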


This approach preserves the ordering of your original data, but it changes the variance, which is multiplied by $\alpha^2$. If you also want to keep the variance while changing the mean (both only approximately), you could proceed as follows: fit a beta distribution with parameters $a$ and $b$ to your data, then determine parameters $a'$ and $b'$ of a new beta distribution with the desired mean and variance, then transform any observation $x$ to

$$ x\mapsto F_{a',b'}^{-1}\big(F_{a,b}(x)\big), $$

where $F_{a,b}$ is the cumulative distribution function (CDF) of the fitted beta distribution and $F_{a',b'}^{-1}$ is the quantile function (inverse CDF) of the new one. This can be done easily in R or Python. However, note that this changes the underlying fitted distribution, and the mean and variance of the transformed data will likely not be precisely what you desired, but a little off.
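
A rough sketch of this second approach in base R could look as follows. It assumes a method-of-moments fit of the beta distribution, which is only one of several ways to fit it; the helper names `beta_params` and `shift_mean_beta` are hypothetical and chosen for illustration.

```r
# Method-of-moments beta parameters for a given mean m and variance v
# (an assumed fitting choice; maximum likelihood would also work).
beta_params <- function(m, v) {
  stopifnot(m > 0, m < 1, v > 0, v < m * (1 - m))
  k <- m * (1 - m) / v - 1
  c(a = m * k, b = (1 - m) * k)
}

# Hypothetical helper: push each observation through the fitted beta CDF,
# then through the quantile function of a beta with the desired mean and
# the same variance.
shift_mean_beta <- function(x, desired_mean) {
  old <- beta_params(mean(x), var(x))
  new <- beta_params(desired_mean, var(x))
  qbeta(pbeta(x, old[["a"]], old[["b"]]), new[["a"]], new[["b"]])
}

# Same illustrative data as in the first example
xx <- c(1, rep(0.1, 8), rep(0.2, 25), rep(0.3, 60), rep(0.4, 90), rep(0.5, 100),
        rep(0.6, 90), rep(0.7, 60), rep(0.8, 25), rep(0.9, 8), 1)
xx_beta <- shift_mean_beta(xx, desired_mean = 0.7)

mean(xx_beta)             # close to, but not exactly, 0.7
c(var(xx), var(xx_beta))  # variances are similar, not identical
range(xx_beta)            # still inside [0, 1]
```

The transformation is monotone, so the ordering of the observations is preserved, and the endpoints $0$ and $1$ map to themselves. Note that the method-of-moments step only works if the variance is smaller than $m(1-m)$ for both the old and the desired mean $m$.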

Stephan Kolassa
  • 123,354
  • Thanks @Stephan, this is a big help. However, I can't get it to work. Hopefully you can point out what I'm doing wrong?

    I've put together some dummy data, format "value (count)". 0 (1), 0.1 (8), 0.2 (25), 0.3 (60), 0.4 (90), 0.5 (100), 0.6 (90), 0.7 (60), 0.8 (25), 0.9 (8), 1 (1)

    Mean is 0.5. If I want to transform so the mean is 0.6, using the formula above I would have $\alpha = (\bar{x}'N - 1)/(\bar{x}N - 1) = 1.2$. Then each value is $x_i' := 1-\alpha(1-x_i)$. The new mean, however, is then 0.4, not 0.6, and I end up with negative values for smaller values of $x_i$.

    Any help, please?

    – user176504 Oct 17 '22 at 13:50
  • I'll take a look as soon as I find the time. – Stephan Kolassa Oct 17 '22 at 15:59
  • 1
    I edited my answer. I'm not surprised you didn't get it to work, because I had an error in my calculation. I'm sorry for the confusion. It should now work, or at least seems so given your data. – Stephan Kolassa Oct 17 '22 at 16:29