
There was a question about p value interpretation and one of the comments made me wonder whether there is some basic misunderstanding I have about p values.

I wrote "you can discard the null hypothesis because your data was very unlikely produced by the null hypothesis". This is imprecise, because the null hypothesis doesn't produce anything, of course. I should have written something like "... the data is very unlikely given that the null hypothesis is true, hence you can discard the null hypothesis". Anyway, the example provided in the comment makes me think that the point is not simply that I was imprecise, but that there is some fundamental conceptual difference between what I wrote and what I meant. The comment included this example:

"Imagine the following: you have a machine that measures neutrino mass on Earth to check whether the sun is going to explode each second. Its calculation however includes some floating points, so once in a million there is a mistake and the machine says the sun is gonna explode. Today it's your shift, and the machine tells you that the sun is gonna explode, p = 1e-6. Now, the "data was very unlikely produced by the null hypothesis" is not correct, because it is distinctly less likely that the data is not produced by the null, considering we're still alive."

I think the point is that the probability that the sun exploded, given the fact that we are still alive, is essentially zero, hence it would actually be wrong to believe the sun exploded just because the p value is so small. But how is this related to the distinction between "... given the null is true" vs "... produced by the null"? I don't see what I am missing, but I feel like there is some important fundamental difference.
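A back-of-the-envelope Bayes calculation makes this intuition concrete. The false-alarm rate below comes from the comment's example; the prior probability of the sun exploding in any given second is a made-up illustrative number, not real physics:

```python
# Hypothetical numbers for the detector example.
prior_explode = 1e-30        # assumed prior that the sun explodes this second (illustrative)
p_alarm_given_explode = 1.0  # assume the detector always fires if the sun exploded
p_alarm_given_fine = 1e-6    # false-alarm rate from the comment's example

# Bayes' theorem: P(explode | alarm)
p_alarm = (p_alarm_given_explode * prior_explode
           + p_alarm_given_fine * (1 - prior_explode))
posterior = p_alarm_given_explode * prior_explode / p_alarm
print(posterior)  # roughly 1e-24: the alarm barely moves the posterior
```

So even after seeing the alarm, the posterior probability of an explosion stays astronomically small, because the alarm is far more plausibly a false positive. The p value alone (1e-6) never brings the prior into the picture.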

  • A better rephrasing of your initial statement would be to replace "produced by the null hypothesis" with "produced by random sampling from the assumed model, under the null hypothesis". That could indeed produce data and has the required properties. I would see your initial statement as merely a shorthand statement implying the longer one. If I understand the comment correctly it seems to relate to a slightly different issue. – Glen_b Apr 21 '19 at 00:00
  • What different issue is that? I am really interested to know, because I can't figure it out by myself. Any hints are appreciated. –  Apr 21 '19 at 03:52
  • As I read it, it relates to a Bayesian criticism of NHST (a partly misdirected one, in my mind). The criticism arises because the usual significance tests don't compare the probability of a result at least as extreme as the observed one under the null and under the alternative; they just compute it under the null. Such examples are constructed by building a situation where that probability is lower under the alternative than under the null (so you reject in favor of a less likely alternative), but in that sort of situation many frequentists would not blithely choose to do the test as it stands. – Glen_b Apr 21 '19 at 06:37
  • ctd... While something akin to it can happen in a few practical situations, real cases are quite rare (and often arise from a somewhat artificial version of the question of interest). In more typical situations, the common constructions guarantee the opposite. If I have correctly apprehended the point there, it's worth being aware of the issue. – Glen_b Apr 21 '19 at 06:37
  • Thanks a lot for your answer. I'm missing the basics of Bayesian thinking, and that is what I am going to read about next. That was helpful. –  Apr 21 '19 at 07:43
  • It is a gross abuse of NHST to apply it to monitoring in the way described by the example, anyway. – whuber Apr 23 '19 at 03:35
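Glen_b's point can be sketched in odds form: the p value uses only P(data | null), whereas the decision arguably should weigh the likelihood ratio against the prior odds. The false-alarm rate is the example's; the prior odds are a made-up illustrative value:

```python
# P(alarm | sun exploded) / P(alarm | sun fine): the data alone
# favor the alternative by a factor of about a million.
likelihood_ratio = 1.0 / 1e-6

# Posterior odds = likelihood ratio * prior odds. A sensible prior
# for "the sun just exploded" is astronomically small (1e-30 is a
# made-up illustrative value, not real physics).
prior_odds = 1e-30
posterior_odds = likelihood_ratio * prior_odds
print(posterior_odds)  # roughly 1e-24: still overwhelming odds against explosion
```

This is why a tiny p value by itself does not license rejecting the null here: the likelihood ratio actually favors the alternative, but the prior odds against it dwarf that factor.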

0 Answers