0

I see an example for standardizing data value here. For better illustration, the table is shown below

enter image description here

I did the same thing in excel and as you can see, the results are different.

enter image description here

For mean and variance, I used AVERAGE and VAR formulas, respectively in Excel. The standardized values are calculated as:

0  ->  (0-2.5)/3.5 = -0.714
...

What is wrong then?

nwaldo
  • 531
mahmood
  • 213

1 Answers1

1

The value $3.5$ is calculated using the sample variance formula.

You need to use the population standard deviation formula: \begin{equation} \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n}} \end{equation}

The population standard deviation is $\sqrt{2.916667}$. Standardizing using this value will yield the desired results. \begin{equation} \frac{0-2.5}{\sqrt{2.916667}} \approx -1.4638501 \end{equation}

Using R we can replicate the results from your example:

x = c(0,1,2,3,4,5)
s1 = sum((x-mean(x))^2)/6
xs = (x-mean(x))/sqrt(s1)
data.frame(x, xs= round(xs,2))

Output:

 x    xs
 0 -1.46
 1 -0.88
 2 -0.29
 3  0.29
 4  0.88
 5  1.46
nwaldo
  • 531
  • In Excel, I see STDEV, STDEV.P, STDEV.S and some other forms. Doe sit matter which one to use? – mahmood Nov 07 '20 at 14:02
  • 1
    Yes it does, the sample and population standard deviation use different denominator $n-1$ vs $n$, respectively. STDEVP is the function you want to use - this will use $n$ – nwaldo Nov 07 '20 at 14:45