Suppose I have a data sample and I want to test whether a normal distribution with mean $0$ and variance $1$ fits it. If I understand the chi-squared test correctly, I should use the chi-squared statistic with $r-1$ degrees of freedom for this test, where $r$ is the number of bins I divided the data into.
Now suppose that after getting this data, I somehow calculated minimum chi-square estimates of the mean and variance from this sample, and they turned out to be $0$ and $1$. Do I understand correctly that in this case, if I want to test whether a normal distribution with mean $0$ and variance $1$ fits this data sample, I need to use the chi-squared statistic with $r-3$ degrees of freedom (because I estimated the parameters of the distribution)? If this is the correct way to use the test, here comes my main question: how does the fact that I estimated the parameters before coming up with my hypothesis for the distribution of the sample change the way the test should be applied? It is completely counterintuitive to me that even though in both cases I am testing the same hypothesis with the same data sample, I must use different statistics (with $r-1$ degrees of freedom in the first case and with $r-3$ in the second case).
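The two cases can be compared numerically. The following is only a rough simulation sketch, not a derivation: it assumes $r=8$ bins that are equiprobable under $N(0,1)$, samples of size $500$ drawn from $N(0,1)$, and a crude grid search standing in for a proper minimum chi-square optimizer. The names `bin_probs` and `simulate`, and all numeric settings, are hypothetical choices made for illustration.

```python
import numpy as np
from statistics import NormalDist


def bin_probs(mu, sigma, edges):
    """Probabilities that N(mu, sigma^2) assigns to the bins cut at `edges`."""
    cdf = [0.0] + [NormalDist(mu, sigma).cdf(e) for e in edges] + [1.0]
    return np.diff(cdf)


def simulate(r=8, n=500, reps=2000, seed=0):
    """Return the average Pearson statistic for fixed and for fitted parameters."""
    rng = np.random.default_rng(seed)
    # Inner bin edges: r bins that are equiprobable under N(0, 1).
    edges = [NormalDist().inv_cdf(k / r) for k in range(1, r)]
    data = rng.standard_normal((reps, n))
    # Bin index of every observation, then the bin counts of each sample.
    idx = np.searchsorted(edges, data)
    counts = np.stack([np.bincount(row, minlength=r) for row in idx])

    # Case 1: the parameters are fixed at (0, 1) before seeing the data.
    p0 = bin_probs(0.0, 1.0, edges)
    stat_fixed = ((counts - n * p0) ** 2 / (n * p0)).sum(axis=1)

    # Case 2: the statistic is minimized over (mu, sigma) for each sample.
    # A grid search is an assumption made to keep the sketch dependency-free.
    grid = [(m, s)
            for m in np.linspace(-0.25, 0.25, 51)
            for s in np.linspace(0.8, 1.2, 41)]
    inv_expected = np.array([1.0 / (n * bin_probs(m, s, edges)) for m, s in grid])
    # Pearson's statistic equals sum(O^2 / E) - n when sum(O) = sum(E) = n.
    stat_min = ((counts ** 2) @ inv_expected.T - n).min(axis=1)

    return float(stat_fixed.mean()), float(stat_min.mean())


mean_fixed, mean_min = simulate()
print("average statistic, parameters fixed at (0, 1):", mean_fixed)
print("average statistic, parameters fitted per sample:", mean_min)
```

Under these assumptions the first average should come out close to $r-1$ and the second close to $r-3$. The fitted statistic can never exceed the fixed one on the same sample, because $(0,1)$ is itself a point of the grid.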

  • It's more subtle and complicated than this. See https://stats.stackexchange.com/a/17148/919 for some discussion and examples of what can go wrong. For an intuitive, direct answer to your ultimate question, consider what happens to the chi-squared statistic when you estimate a distribution with $r-1$ parameters to fit all $r$ bins exactly. – whuber May 13 '22 at 18:41
  • @whuber It seems that we have a problem with calculating the number of degrees of freedom in the case of $r-1$ parameters, as we should have $r-(r-1)-1=0$ of them. Correct me if I'm wrong, but I suppose it just shows that the mathematical theorem which justifies the chi-squared test is only valid when $r-s-1>0$, where $s$ is the number of estimated parameters. So what should actually happen in this case? – Юрій Ярош May 15 '22 at 21:59
  • @whuber It seems to me that examples from your post show what can go wrong if we don't use minimum chi-square estimates, so I don't see the connection between your examples and the question I have. – Юрій Ярош May 22 '22 at 07:45
  • I list many more requirements than that! – whuber May 22 '22 at 11:57
  • @whuber By "examples" I meant the samples you've generated and the histograms you've built; that's why I said that they only show what can go wrong when you use the wrong estimates. Yes, you list many more requirements, but I don't see how they hint at the answer to my question. – Юрій Ярош May 23 '22 at 09:00
  • @whuber So what actually happens to the chi-squared statistic when you estimate a distribution with $r-1$ parameters to fit all $r$ bins exactly? – Юрій Ярош May 31 '22 at 11:36
  • When the parameters designate a sufficiently flexible family, that means you can fit the counts exactly, as you write. The chi-squared statistic then automatically becomes zero no matter what. This can be considered a chi-squared distribution with "zero degrees of freedom", but evidently the entire exercise tells you almost nothing about the data. – whuber May 31 '22 at 11:45
  • @whuber But I don't see how it relates to my main question, as in the example from the post I don't have a situation with $0$ degrees of freedom. – Юрій Ярош May 31 '22 at 13:07
  • The point was that by considering this case you should be able to derive insight about all other cases, that's all. – whuber May 31 '22 at 13:08
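
The extreme case discussed in the comments above can be checked with a few lines. This is a minimal sketch, assuming a fully flexible multinomial family whose $r-1$ free parameters are the bin probabilities themselves; the counts and the helper name `pearson_chi2` are made up for illustration.

```python
import numpy as np


def pearson_chi2(observed, probs):
    """Pearson's chi-squared statistic for bin counts against bin probabilities."""
    observed = np.asarray(observed, dtype=float)
    expected = observed.sum() * np.asarray(probs, dtype=float)
    return float(((observed - expected) ** 2 / expected).sum())


counts = np.array([12, 30, 41, 17])  # hypothetical counts in r = 4 bins

# Hypothesis fixed in advance: four equiprobable bins, nothing estimated.
stat_fixed = pearson_chi2(counts, [0.25, 0.25, 0.25, 0.25])

# Saturated fit: r - 1 free probabilities chosen to match the data exactly.
fitted = counts / counts.sum()
stat_saturated = pearson_chi2(counts, fitted)

print("fixed hypothesis:", stat_fixed)
print("saturated fit:", stat_saturated)  # zero up to floating-point rounding
```

Whatever counts are supplied, the saturated statistic is zero up to rounding, so it cannot be compared against a chi-squared distribution with $r-1$ degrees of freedom in any meaningful way.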

0 Answers