Can we use the Pearson chi-squared test when we have only estimated values of the parameters?

Question

In our mathematical statistics course we've learnt about the Pearson chi-squared test. Our lecturer told us that when we want to test a hypothesis about the distribution that generated our random sample, and the parameters of that distribution are unknown and can only be estimated, we should use the chi-squared distribution with $r-s-1$ degrees of freedom. Here $r$ is the number of sets (cells) into which we partitioned the range of the distribution, and $s$ is the number of parameters for which we only have estimates. I do understand that the theorem justifying the chi-squared distribution of the test statistic states how many degrees of freedom that distribution has under the null hypothesis. But why can't we instead test the hypothesis that the sample comes from this distribution with known parameters, taken to be equal to the estimates? Why can't we choose the hypothesis this way?
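A quick Monte Carlo check (a sketch, not something from the thread) shows what changes when an estimate is plugged in as if it were known. It assumes a Binomial$(m,p)$ model whose cells are the values $0,1,\dots,m$, so $r=m+1$, $s=1$, and the maximum-likelihood estimate from the cell counts is simply $\hat p=\sum x/(nm)$. The helpers `binom_pmf`, `pearson_stat` and `simulate` are hypothetical names chosen for illustration.

```python
import math
import random


def binom_pmf(k, m, p):
    """P(X = k) for X ~ Binomial(m, p)."""
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def pearson_stat(counts, probs):
    """Pearson chi-squared statistic for observed counts and cell probabilities."""
    n = sum(counts)
    return sum((o - n * q) ** 2 / (n * q) for o, q in zip(counts, probs))


def simulate(m=4, p=0.5, n=200, reps=2000, seed=1):
    """Average Pearson statistic with the true p versus with the MLE p_hat plugged in."""
    rng = random.Random(seed)
    cells = range(m + 1)  # r = m + 1 cells, one per value 0..m
    known_total = plugged_total = 0.0
    for _ in range(reps):
        # a sample of size n from Binomial(m, p), drawn as sums of m Bernoulli trials
        xs = [sum(rng.random() < p for _ in range(m)) for _ in range(n)]
        counts = [xs.count(k) for k in cells]
        # H0 with p fully specified in advance
        known_total += pearson_stat(counts, [binom_pmf(k, m, p) for k in cells])
        # the same counts, with the MLE p_hat = sum(x) / (n * m) treated as "known"
        p_hat = sum(xs) / (n * m)
        plugged_total += pearson_stat(counts, [binom_pmf(k, m, p_hat) for k in cells])
    return known_total / reps, plugged_total / reps


if __name__ == "__main__":
    known, plugged = simulate()
    print(f"mean X^2 with true p:  {known:.2f}   (r - 1 = 4)")
    print(f"mean X^2 with p_hat:   {plugged:.2f}   (r - s - 1 = 3)")
```

With the true $p$ the average statistic should land near $r-1=4$ (the mean of $\chi^2_4$), while with $\hat p$ plugged in it should land near $r-s-1=3$. Fitting $\hat p$ to the same counts pulls the expected counts toward the observed ones, so comparing the plug-in statistic against $\chi^2_4$ would give p-values that are systematically too large.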

You can, when you know the parameters. The full set of conditions required is laid out in the middle of my post at https://stats.stackexchange.com/a/17148/919. — whuber, Apr 20 '22 at 18:14
@whuber Do you mean when I know the true values of the parameters? — Юрій Ярош, Apr 20 '22 at 18:48
Yes. This is used all the time for distribution testing. Suppose, for example, you have a sample of size $50$ from a distribution you believe to be uniform on the interval $[0,1].$ You could (before examining the data) elect to partition the interval into, say, $10$ bins $[0,.1],$ $(.1,.2],$ through $(.9, 1].$ The expected count of each bin is $50/10=5.$ The $\chi^2$ statistic has $9$ df (because of the single constraint that the counts sum to $50$). I tested the solutions at https://stats.stackexchange.com/a/117711/919 in this fashion. — whuber, Apr 20 '22 at 19:38
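The uniform example above can be sketched in a few lines. This is an illustration, not code from the thread: `chi2_sf` and `uniform_bin_test` are hypothetical helpers, and `chi2_sf` computes the $\chi^2$ upper tail for integer df with the standard recurrence $Q(x;k+2)=Q(x;k)+(x/2)^{k/2}e^{-x/2}/\Gamma(k/2+1)$, starting from $Q(x;1)=\operatorname{erfc}(\sqrt{x/2})$ or $Q(x;2)=e^{-x/2}$.

```python
import math
import random


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    # start the recurrence at df = 2 (even df) or df = 1 (odd df)
    if df % 2 == 0:
        k, q = 2, math.exp(-x / 2)
    else:
        k, q = 1, math.erfc(math.sqrt(x / 2))
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def uniform_bin_test(sample, bins=10):
    """Pearson test of H0: Uniform[0, 1], a fully specified distribution."""
    n = len(sample)
    counts = [0] * bins
    for u in sample:
        # bins [0, .1], (.1, .2], ..., (.9, 1]; u == 0 goes into the first bin
        idx = min(max(math.ceil(u * bins) - 1, 0), bins - 1)
        counts[idx] += 1
    expected = n / bins  # 50 / 10 = 5 per bin
    stat = sum((c - expected) ** 2 / expected for c in counts)
    df = bins - 1  # the only constraint: the counts sum to n
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    rng = random.Random(2022)
    sample = [rng.random() for _ in range(50)]
    stat, df, p = uniform_bin_test(sample)
    print(f"X^2 = {stat:.2f} on {df} df, p = {p:.3f}")
```

No parameter is estimated here, so the only reduction in df comes from the constraint that the ten counts sum to $50$.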
@whuber Looking at your example, my question boils down to this: is it okay to use the $\chi^2$ statistic with 9 df (given that there are 10 bins) if I obtained the parameter values from some estimators and don't know for sure that they are the true values? — Юрій Ярош, Apr 20 '22 at 22:21
You seem to be shifting back and forth between asking about known and asking about estimated parameters. As explained in the first link I gave, 9 df is unlikely to be correct in the case you just described. You have to estimate the parameters from the bin counts using Maximum Likelihood, and you have to reduce the df by the number of parameters you estimate. Other conditions must hold, too; for instance, the $\chi^2$ distribution will be completely wrong if any of your parameter estimates are on the boundary of the parameter space. — whuber, Apr 20 '22 at 22:45
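As a sketch of what estimating from the bin counts by Maximum Likelihood and adjusting the df can look like, the code below assumes an exponential model with unknown rate $\lambda$, $r$ equal-width cells of width $w$, and an open last cell. Binned this way, the cell probabilities are $q^k(1-q)$ for $k=0,\dots,r-2$ and $q^{r-1}$ for the last cell, with $q=e^{-\lambda w}$, and the grouped-data MLE has the closed form $\hat q=T/(T+n-n_{r-1})$, where $T=\sum_k k\,n_k$ and cells are indexed from $0$. The model, the binning and the function names are illustrative assumptions.

```python
import math
import random


def chi2_sf(x, df):
    """Upper tail P(X > x) of a chi-squared distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        k, q = 2, math.exp(-x / 2)
    else:
        k, q = 1, math.erfc(math.sqrt(x / 2))
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def binned_exponential_test(sample, width, r):
    """Pearson test of H0: 'exponential with unknown rate' on r cells
    [0, w), [w, 2w), ..., [(r-2)w, (r-1)w), [(r-1)w, inf)."""
    n = len(sample)
    counts = [0] * r
    for x in sample:
        counts[min(int(x // width), r - 1)] += 1
    # Grouped-data MLE of q = exp(-rate * width), computed from the cell counts only.
    # Assumes 0 < q_hat < 1 (an interior estimate): not every point in the first
    # cell and not every point in the last cell.
    t = sum(k * c for k, c in enumerate(counts))
    q_hat = t / (t + n - counts[-1])
    probs = [q_hat**k * (1 - q_hat) for k in range(r - 1)] + [q_hat ** (r - 1)]
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    s = 1  # one estimated parameter (the rate)
    return {
        "rate_hat": -math.log(q_hat) / width,
        "stat": stat,
        "df": r - s - 1,
        "p_value": chi2_sf(stat, r - s - 1),    # df reduced by the estimated parameter
        "p_value_naive": chi2_sf(stat, r - 1),  # estimate treated as a known value
    }


if __name__ == "__main__":
    rng = random.Random(7)
    data = [rng.expovariate(1.0) for _ in range(200)]
    print(binned_exponential_test(data, width=0.5, r=6))
```

The same statistic is referred to both $\chi^2_{r-s-1}$ and $\chi^2_{r-1}$; the second p-value is never smaller, so ignoring the estimation makes the test conservative. Estimating $\lambda$ from the raw observations instead (for example $1/\bar x$) is not the grouped-data MLE, and then neither reference distribution is exact in general.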
@whuber But my question is not whether the $\chi^2$ distribution will be correct, but whether what I described in the previous comment is a valid way of applying the test. — Юрій Ярош, Apr 21 '22 at 12:10
I am unable to distinguish the two forms of the question: if the p-value is incorrect, the test is invalid; and if the test is invalid, that means that at least in some cases the p-value will be incorrect. — whuber, Apr 21 '22 at 12:12
@whuber You say that "if the p-value is incorrect, the test is invalid". But why is the p-value incorrect in the case described in my question? — Юрій Ярош, Apr 24 '22 at 22:31
Because when you use the incorrect df you are using an incorrect version of the chi-squared distribution and it will not return the correct p-value. — whuber, Apr 25 '22 at 00:12
@whuber Maybe I misunderstand the theorem that justifies the test. It basically says that if the sample really was generated by some fully specified distribution, then the statistic converges in law to the chi-squared distribution with $r-1$ degrees of freedom, where $r$ is the number of cells. There is also a version of the theorem for the case where some parameters are unknown. But why can't we use the version I described (with a fully specified distribution) to justify that the p-values will be correct? — Юрій Ярош, Apr 25 '22 at 07:17
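For reference, here is a standard form of the two versions of the theorem contrasted in the comment above (regularity conditions omitted; the notation is introduced here, not taken from the thread). Let $N_1,\dots,N_r$ be the cell counts from a sample of size $n$ and $p_i(\theta)$ the cell probabilities, with $\theta$ having $s$ components:

$$X^2(\theta)=\sum_{i=1}^{r}\frac{\bigl(N_i-np_i(\theta)\bigr)^2}{np_i(\theta)},$$

$$\theta_0 \text{ fixed in advance:}\quad X^2(\theta_0)\xrightarrow{d}\chi^2_{r-1},\qquad \hat\theta_n=\text{MLE from }(N_1,\dots,N_r)\text{, interior:}\quad X^2(\hat\theta_n)\xrightarrow{d}\chi^2_{r-s-1}.$$

The first result is about a $\theta_0$ that does not depend on the data. Setting $\theta_0=\hat\theta_n$ after seeing the sample does not put you in that case, because $\hat\theta_n$ is random and fitted to the same counts; the statistic you then compute is $X^2(\hat\theta_n)$, which the second result describes.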
Because of the conflicting information in your post and comments, I cannot tell what you have described. — whuber, Apr 25 '22 at 12:43
@whuber Maybe you could explain what exactly is conflicting, so that I can elaborate? — Юрій Ярош, Apr 25 '22 at 13:42
I have already done that. I don't want to keep repeating myself, so I would just refer you to the comments that are here. — whuber, Apr 25 '22 at 13:47