First, I will preface this question with my ulterior motive: I would like more evidence that the use of 19th- and 20th-century approximations offers little to no pedagogic advantage in modern intro stats or intro data science courses.

Next, let us agree to work with the following definition of a P-value: the probability of observing your sample, or something more extreme, given that the null hypothesis is true.

We wish to conduct a two-tailed hypothesis test for a population proportion using counts and exact probabilities from the binomial distribution. The hypotheses are $$H_0 : p = 10\%$$ $$H_a : p \ne 10\%$$ The sample obtained has $n=189$ and there are $k=10$ successful observations in this sample.

¿What is the two-tailed P-value for this test? It seems that there are reasonable arguments for either $$P(X \le 10) + P(X \ge 27) = 0.053$$ or $$P(X \le 10) + P(X \ge 28) = 0.038$$ (For those of you "addicted" to the conventional significance level of $\alpha=0.05$, you can probably see where I might be going with this. ;-)

To keep this in a pedagogic framework, I'm most curious about answers that indicate how you would grade the work of a student who submitted either answer, and how you would justify any points deducted.

Gregg H
  • Take the 1 sided test p-value ($P(X\le 10)$) and double it. – AdamO Jul 11 '19 at 21:38
  • I would have thought $P(X \le 10) + P(X \ge 29) = 0.0285$ has some justification as the probability of "as extreme as or more extreme than $10$" – Henry Jul 11 '19 at 22:19
  • @AdamO ¿can you provide a rationale for why this would be considered exact? – Gregg H Jul 12 '19 at 00:38
  • @Henry ¿what is the justification for 29 and above? 10 to the mean is 18.9, 8.9 above 18.9 is 27.8...not sure what the argument for 29 would be – Gregg H Jul 12 '19 at 00:43
  • $P(X \le 10) \approx 0.0150$ while $P(X \ge 28)\approx 0.0229$ so you could say that, given the hypothesis $n=189, p=0.1$, then $28$ is a less extreme observation than $10$. Meanwhile $P(X \ge 29)\approx 0.0135$ so $29$ is a more extreme observation than $10$ – Henry Jul 12 '19 at 07:02
  • This test hasn't been fully defined yet: you still have to specify a critical region. Ignoring randomized tests, there are at least 5 possibilities, depending on whether you want it to be symmetric in values, symmetric in probabilities, come as close as possible to the nominal p-value, or something else. Thus, this isn't a good question to ask learners and has little if any bearing on the ultimate questions of pedagogy that motivate you. Indeed, I don't see how this question is related even remotely to "19th and 20th century approximations:" could you explain the connection? – whuber Jul 12 '19 at 16:02
  • Slightly off-topic, but just a heads-up that an inverted question mark is not used in the English language. – Frans Rodenburg Jul 13 '19 at 07:39
  • @GreggH: "Exact" in that the probability of rejecting the null does not exceed the nominal significance level when the null is true; contrasted with approximate tests derived from the asymptotic behaviour of statistics. The double-the-lesser-one-tailed-p-value method is preferred to other exact methods by some for (1) its simplicity (conceptual & computational), & (2) its intuitively satisfying response to changes in the null hypothesis or data. – Scortchi - Reinstate Monica Jul 15 '19 at 12:02
  • I would ask the moderators to reopen this question...as it has received over 1k views...it obviously is useful to enough people...even if it is "opinion-based" – Gregg H Mar 30 '23 at 14:59

1 Answer

The confusion comes from how to define "extreme" in the phrase "or something more extreme" in the P-value definition. If you give a clear definition, the answers are easy to grade. If the definition is not clear, different correct answers can be derived from different understandings of "extreme".

  1. If x is defined as extreme when Pr(X=x) <= Pr(X=10), the result is Henry's answer.

  2. If x is defined as extreme when its distance from the mean is >= the distance between the observed 10 and the mean (mean = 18.9 under the null hypothesis), then the upper tail is x >= 18.9 + 8.9 = 27.8, i.e. x >= 28, so p = 0.038 is correct.

  3. There may be another reason to classify X >= 27 as extreme, in which case p = 0.053 is correct.
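
To make the competing definitions concrete, here is a minimal sketch that computes each candidate from exact binomial probabilities, using only the Python standard library. The helper names `pmf` and `two_sided_p_values` are hypothetical, chosen for illustration. The tail-matching rule and the doubling rule are taken from the comments on the question, and the sketch assumes the observed count lies below the null mean.

```python
from math import ceil, comb

def pmf(x, n, p):
    """Binomial point probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def two_sided_p_values(k_obs, n, p):
    """Candidate two-sided P-values; assumes k_obs is below the null mean n*p."""
    probs = [pmf(x, n, p) for x in range(n + 1)]
    lower = sum(probs[: k_obs + 1])  # P(X <= k_obs)

    def upper(c):
        """Upper-tail probability P(X >= c)."""
        return sum(probs[c:])

    mean = n * p
    # Distance rule: upper values at least as far from the mean as k_obs.
    c_dist = ceil(mean + (mean - k_obs))
    # Tail rule: smallest cutoff whose upper tail does not exceed the lower tail.
    c_tail = next(c for c in range(ceil(mean), n + 1) if upper(c) <= lower)
    # Point-probability rule: every x that is no more likely than k_obs
    # (the small relative tolerance guards against floating-point ties).
    p_point = sum(q for q in probs if q <= probs[k_obs] * (1 + 1e-7))

    return {
        "distance": lower + upper(c_dist),
        "tail_match": lower + upper(c_tail),
        "point_prob": p_point,
        "doubled": min(1.0, 2 * lower),
        "upper_from_27": lower + upper(27),  # the 0.053 candidate from the question
    }

results = two_sided_p_values(k_obs=10, n=189, p=0.10)
for rule, value in results.items():
    print(f"{rule:>14}: {value:.4f}")
```

For these data the distance rule starts the upper tail at 28 and the tail-matching rule starts it at 29, which is why the thread's values of 0.038 and 0.0285 differ even though both are computed exactly.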

user158565
  • (+1) NB Henry suggests adding the probability from the upper tail that doesn't exceed $P(X \leq x)$; not the probabilities that don't exceed $P(X = x)$. These are both common methods, & agree in this case but not always. – Scortchi - Reinstate Monica Jul 15 '19 at 12:10