
As a graduate student, I have always used tools that calculate the p-value for me, and I only loosely understand what it means. If the p-value is 0.05, there is only a 5% chance that something happens naturally.

To me, probability only makes sense when there is a whole population, or something that goes in the "denominator". For example, the probability of rolling a 2 with a six-sided die is 1/6, since there are 6 options. Here, the p-value is also 1/6. I get this.

However, I've seen so many p-values in analyses without a population. For example, when you want to know the correlation between English scores and math scores in your class, a tool calculates a p-value here as well, along with the correlation coefficient (r value). What does that mean? What I imagined was calculating the r value of every class in the world, seeing how high my class's r value is relative to the rest, and determining the p-value based on that. But the tool certainly does not know the English and math scores of all students in the world.
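
For concreteness, here is roughly the kind of computation such a tool performs. This is only a sketch using scipy.stats with invented scores, not the actual tool I used:

```python
# Sketch of what a typical tool reports for the class example
# (scores are made up for illustration).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
english = rng.normal(70, 10, size=30)              # one class's English scores
math = 0.5 * english + rng.normal(35, 8, size=30)  # math scores, loosely related

r, p = stats.pearsonr(english, math)  # the p-value comes from this sample alone
print(f"r = {r:.3f}, p-value = {p:.4f}")
```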

The problem is not limited to correlation. Whenever the "denominator" is unknown, a tool still calculates a p-value from the given sample alone. How is this possible? How should I understand p-values?

  • P-values make sense whenever you are considering probability. So this also works when you observe an entire population (there is still probability involved in how this population got a certain statistic). I remember a good previous question about this. But I cannot find it with a quick search. – Sextus Empiricus Mar 09 '23 at 08:28
  • "If the p-value is 0.05, there is only a 5% chance that something happens naturally." This interpretation of p-values is not correct, but your question is about the third paragraph, right? For understanding p-values we have several other questions with answers. – Sextus Empiricus Mar 09 '23 at 08:33
  • The example in your third paragraph is unclear. What are you doing exactly when you compute the correlation between English score and math score and compute a p-value? You suddenly speak about r values of every class in the world, but that is often not the comparison being made when a p-value is calculated (the p-value expresses the probability of obtaining your observed deviation, or a stronger deviation, relative to some hypothesised value, e.g. zero, when that value would be true). – Sextus Empiricus Mar 09 '23 at 08:36
  • @SextusEmpiricus When I said "r values of every class in the world", I meant, for example: if the correlation value of my class is 0.9 but it turns out that there is no correlation (say there are two kinds of people, English people and math people, and no one really does well in both), then the p-value would be small, like <0.05. However, if the r value of my class is 0.9 and there is indeed a very strong correlation (students who study English hard also study math hard, and thus get good scores in both), then the p-value would be large, because a high r value isn't significant anymore. Right? – Andy Junghyun Kim Mar 09 '23 at 12:16

2 Answers


The way p-values are typically used may not make sense to you because the way p-values are typically used does not make sense. At least, that is what some statisticians would say, and I am sympathetic to this critique.

First, let's consider a case that is purely descriptive. Simplifying your example a bit, say we have math test scores for a certain set of students and we want to know whether female scores are higher than male scores for these students. Well, we can just look at the averages, and there's your answer. No need for a p-value. In fact, in this context, a p-value doesn't really have any meaning.

Now assume that this set of students is a random sample from a larger population, and we want to know whether the female math scores are higher than the male math scores for this entire population--not just the subset we sampled. Then a p-value makes sense because it quantifies the sampling uncertainty around our estimate. Just because the female scores are higher in our sample doesn't mean they're higher in the population.
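
Here is a minimal sketch of these two situations (all numbers invented): descriptively, we just compare the sample averages; for inference to the larger population, a t-test p-value quantifies the sampling uncertainty.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
population_f = rng.normal(75, 10, size=100_000)  # hypothetical population scores
population_m = rng.normal(75, 10, size=100_000)  # same true mean by construction

sample_f = rng.choice(population_f, size=40, replace=False)  # our random sample
sample_m = rng.choice(population_m, size=40, replace=False)

# Descriptive question: which group scored higher among the sampled students?
print("sample difference:", sample_f.mean() - sample_m.mean())

# Inferential question: could the sample gap arise from sampling variation
# alone, even if the population means are equal?
t_stat, p_value = stats.ttest_ind(sample_f, sample_m)
print("p-value:", round(p_value, 3))
```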

Instead, imagine that our set of students isn't a random sample but a convenience sample--and I think this is the sort of case you're seeing. Then the above logic doesn't follow. Instead, when people report a p-value for this sort of situation, they are implicitly appealing to the concept of a super-population. Basically, they're asking you to imagine that the set of students is drawn from an imaginary (possibly infinite) population of students similar to the actual students in your sample. For some people, this thought experiment is quite convincing, and they can probably describe it better than I can. In any case, the assumption here is that you really want to know whether female scores in this super-population are higher than male scores. The p-value attempts to quantify your uncertainty in inferring from the set of students you observe to this super-population.

To quote the late David Freedman,

Samples of convenience are often analyzed as if they were simple random samples from some large, poorly-defined parent population. This unsupported assumption is sometimes called the “super-population model.” The frequency with which the assumption has been made in the past does not provide any justification for making it again, and neither does the grandiloquent name.

He goes on to say,

An SE for a convenience sample is best viewed as a de minimis error estimate: if this were—contrary to fact—a simple random sample, the uncertainty due to randomness would be something like the SE.

See a link here. For a lengthier discussion from Berk and Freedman, see here. A representative quote from this article goes like this:

As we shall explain below, researchers may find themselves assuming that their sample is a random sample from an imaginary population. Such a population has no empirical existence, but is defined in an essentially circular way—as that population from which the sample may be assumed to be randomly drawn. At the risk of the obvious, inferences to imaginary populations are also imaginary.

Again, while Freedman (and I) don't necessarily find the notion of a super-population convincing, many do, and it's widely used.

Finally, p-values can also make sense in the context of causal inference. For the simple case of a randomized experiment, you typically want to know the average treatment effect on the participants. However, to know this, you'd need to see each participant both treated and untreated. Instead, you only see, say, half treated and the other half untreated. You take this difference as your estimate of the average treatment effect, but you still have uncertainty here because if a slightly different set of participants had ended up in the treatment group (instead of the control group), your estimate would be a little different. The uncertainty here comes from random assignment instead of random sampling, but it works out to be about the same thing, and the p-value helps quantify this uncertainty.
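
A randomization-inference sketch of this idea (outcomes invented for illustration): replay the random assignment many times and see how often re-assignment alone would produce a difference in means at least as large as the one observed.

```python
import numpy as np

rng = np.random.default_rng(2)
outcomes = np.concatenate([rng.normal(5.5, 1.0, 50),   # treated participants
                           rng.normal(5.0, 1.0, 50)])  # control participants
treated = np.array([True] * 50 + [False] * 50)

observed = outcomes[treated].mean() - outcomes[~treated].mean()

diffs = []
for _ in range(10_000):                      # replay the random assignment
    labels = rng.permutation(treated)
    diffs.append(outcomes[labels].mean() - outcomes[~labels].mean())

p_value = np.mean(np.abs(diffs) >= abs(observed))  # two-sided p-value
print(f"observed difference = {observed:.3f}, p-value = {p_value:.4f}")
```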

Causal inference for quasi-experimental (observational) studies can also be conducted, with the attempt being to estimate what would have happened if you had run a randomized experiment. So here the p-value can quantify uncertainty about this inference.

num_39

For the Pearson correlation, the null hypothesis is that the correlation coefficient is zero.

The definition of the p-value is the probability of getting a sample statistic at least as extreme as the one calculated, if your null hypothesis is true. In other words, it shows how well your sample aligns with the null hypothesis by a respective criterion. It is not any kind of quantitative measure for the whole population.
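
As an illustration of this definition for Pearson's r (my sketch, using a permutation version rather than the usual t-based formula, with invented scores): shuffling one variable enforces the null of zero correlation, and the p-value is the fraction of shuffles whose |r| is at least as extreme as the observed one.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=25)
y = 0.4 * x + rng.normal(size=25)

r_obs = np.corrcoef(x, y)[0, 1]
# Shuffling y breaks any real association, tracing out the null distribution.
r_null = [np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(10_000)]

p_value = np.mean(np.abs(r_null) >= abs(r_obs))  # two-sided permutation p-value
print(f"r = {r_obs:.3f}, p-value = {p_value:.4f}")
```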

dx2-66
  • "getting" needs a gloss here, as usually referring to that statistic or one more extreme. – Nick Cox Mar 09 '23 at 10:08
  • OK, but every teacher/professor has explained the p-value as a quantitative measure; for example, search "statquest p value" on YouTube. I know that theoretically the p-value shows how well your null hypothesis explains the given sample. But sorry... how is that possible? Please refer to the comment above that I wrote to @SextusEmpiricus. – Andy Junghyun Kim Mar 09 '23 at 12:37
  • I'm afraid that "a p-value shows how well your null hypothesis explains a given sample" is woefully wrong. Please consider spending a little time reviewing some of our threads on this fundamental concept, such as https://stats.stackexchange.com/questions/31. – whuber Mar 09 '23 at 22:53
  • @whuber Kindly elaborate. "A hypothesis poorly explains the observed result" and "the conditional probability of the observed (or more extreme) result under this hypothesis is very low" sound synonymous to me. – dx2-66 Mar 10 '23 at 07:01
  • That's part of the problem: "poorly explains" could mean many things, but is definitely not synonymous with a low p-value. As an example of the distinction, consider the (many hundreds of) questions we have fielded concerning why a test of distribution, such as the KS test, when applied to obviously Normally distributed data, nevertheless will reliably yield a low p-value with sufficiently large data sets. The problem is that p-values help us detect deviations from hypothesized behavior while giving no information about the magnitudes or meanings of those deviations (see the simulation sketch after this thread). – whuber Mar 10 '23 at 17:16
  • @whuber That's a good point: different tests may yield very different p-values while operating on the same hypothesis. That seems to come down to the testing algorithm, though; perhaps we could say 'how well the sample aligns with the null hypothesis by a certain criterion'? Either way, your clarification seems to further stress the point of not mindlessly treating p-values as quantitative measures for the general population - and, as you added, of the null hypothesis by itself. – dx2-66 Mar 13 '23 at 07:54
  • That's a good summary. I only wish to add that "aligns with the null hypothesis" is frequently understood in the overly narrow sense of the formal statement of $H_0,$ whereas according to the rules of logic that phrase really means the test assesses whether any of the assumed conditions are plausible. Those conditions include all distributional assumptions, independence assumptions, assumptions about the possible amounts of measurement error, and so on, as well as the distributional constraints embodied in $H_0$ itself. – whuber Mar 13 '23 at 15:09
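
A quick simulation of the KS point raised in the comments above (my own sketch, not whuber's code): data from a t distribution with 20 degrees of freedom looks indistinguishable from normal to the eye, yet with a million observations the KS test reliably flags the tiny shape difference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.standard_t(df=20, size=1_000_000)  # nearly normal, slightly heavy tails
x = x / np.sqrt(20 / 18)                   # rescale to unit variance

stat, p_value = stats.kstest(x, "norm")    # compare against N(0, 1)
print(f"KS statistic = {stat:.4f}, p-value = {p_value:.2e}")  # tiny p-value,
# even though the deviation from normality is practically irrelevant
```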