
I am looking for various ways of explaining to my students (in an elementary statistics course) what a two-tailed test is, and how its p-value is calculated.

How do you explain the two- vs. one-tailed test to your students?

whuber
Tal Galili

2 Answers


This is a great question and I'm looking forward to everyone's version of explaining the p-value and the two-tailed vs. one-tailed test. I've been teaching statistics to fellow orthopaedic surgeons, so I try to keep it as basic as possible, since most of them haven't done any advanced math for 10-30 years.

My way of explaining calculating p-values & the tails

I start by explaining that if we believe we have a fair coin, it should land tails in 50 % of flips on average ($=H_0$). If you now wonder what the probability is of getting only 2 tails out of 10 flips with this fair coin, you can calculate that probability as I've done in the bar graph. From the graph you can see that the probability of getting exactly 2 tails out of 10 flips with a fair coin is about $\approx 4.4\%$.
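This single-bar value is easy to verify (a quick check of the 4.4 % figure, not part of the original answer):

```r
# Probability of exactly 2 tails in 10 flips of a fair coin (H0: p = 0.5)
dbinom(2, size = 10, prob = 0.5)  # 45/1024, about 4.4 %
```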

Since we would question the fairness of the coin just as much if we got only 1 or 0 tails, we have to include these possibilities as well: the tail of the test. Adding these values, we get that the probability of getting 2 tails or fewer is about $\approx 5.5\%$.

Now if we got only 2 heads, i.e. 8 tails (the other tail), we would probably be just as willing to question the fairness of the coin. This means that you end up with a probability of $5.4\ldots\% + 5.4\ldots\% \approx 10.9\%$ for a two-tailed test.
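The one- and two-tailed sums above can be reproduced with the cumulative binomial distribution; because the null distribution is symmetric at $p = 0.5$, the built-in exact test gives the same two-tailed value (my own check, not from the original slides):

```r
# One-tailed: 2 tails or fewer (includes the more extreme outcomes 0 and 1)
p_one <- pbinom(2, size = 10, prob = 0.5)                # about 5.5 %
# Two-tailed: add the mirror-image tail, 8 tails or more
p_two <- p_one + (1 - pbinom(7, size = 10, prob = 0.5))  # about 10.9 %
# The exact binomial test agrees for this symmetric null:
binom.test(2, 10, p = 0.5)$p.value                       # 0.109375
```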

Since in medicine we are usually interested in studying failures, we need to include the opposite side of the probability, even if our intent is to do good and to introduce a beneficial treatment.

My flipping coins graph

Reflections slightly out of topic

This simple example also shows how dependent we are on the null hypothesis when calculating the p-value. I also like to point out the resemblance between the binomial distribution and the bell curve. Changing to 200 flips gives a natural way of explaining why the probability of getting exactly 100 tails starts to lose relevance. Defining intervals of interest is then a natural transition to probability density/mass functions and their cumulative counterparts.
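The 200-flip point can be made concrete with a quick calculation (my own illustration): the probability of any single exact outcome shrinks as the number of flips grows, while an interval of outcomes retains meaningful probability mass:

```r
dbinom(5, size = 10, prob = 0.5)     # about 24.6 %: the most likely outcome at n = 10
dbinom(100, size = 200, prob = 0.5)  # about 5.6 %: exact outcomes lose relevance
# An interval, by contrast, still carries substantial probability:
pbinom(110, size = 200, prob = 0.5) - pbinom(89, size = 200, prob = 0.5)
```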

In my class I recommend the Khan Academy statistics videos, and I also use some of his explanations for certain concepts. The students also get to flip coins, where we look into the randomness of coin flipping; the thing I try to show, inspired by this Radiolab episode, is that randomness is more random than we usually believe.

The code

I usually show one graph per slide. This is the R code that I used to create the graphs:

library(graphics)

binom_plot_function <- function(x_max, my_title = FALSE, my_prob = 0.5, edges = 0,
                                col = c("green", "gold", "red")) {
  # Bar plot of binomial probabilities (in %) for 0..x_max tails.
  # The outermost `edges` bars on each side are colored with col[1]/col[3]
  # to highlight the tails; the middle bars use col[2].
  barplot(
    dbinom(0:x_max, x_max, my_prob) * 100,
    col = c(rep(col[1], edges), rep(col[2], x_max - 2 * edges + 1), rep(col[3], edges)),
    ylab = "Probability %",
    xlab = "Number of tails",
    names.arg = 0:x_max)
  if (my_title != FALSE) {
    title(main = my_title)
  }
}

binom_plot_function(10, paste("Flipping coins", 10, "times"), edges=0, col=c("#449944", "gold", "#994444"))
binom_plot_function(10, edges=3, col=c(rgb(200/255, 0, 0), "gold", "gold"))
binom_plot_function(10, edges=3, col=c(rgb(200/255, 0, 0), "gold", rgb(200/255, 100/255, 100/255)))
Max Gordon
  • Great answer Max - and thank you for recognizing the non-triviality of my question :) – Tal Galili Dec 01 '11 at 12:31
  • +1 nice answer, very thorough. Forgive me, but I'm going to nitpick on two things. 1) the p-value is understood as the probability of data being as extreme or more extreme as yours under the null, thus your answer is right. However, when using discrete data like your coin flips, this is inappropriately conservative. It's best to use what's called the "mid p-value", i.e. 1/2 the probability of data as extreme as yours + the probability of data being more extreme. An easy discussion of these issues can be found in Agresti (2007) 2.6.3. (cont.) – gung - Reinstate Monica Dec 02 '11 at 05:48
  • You state that randomness is more random than we believe. I can guess what you might mean by that (I haven't had a chance to listen to the Radiolab episode you link, but I will). Curiously enough, I've always told students that randomness is less random than you believe. I'm referring here to the perception of streaks (e.g., in gambling). People believe that random events should alternate much more than random events actually do, and as a result believe they see streaks. See Falk (1997) Making sense of randomness Psych Rev 104,2. Again, you're not wrong--just food for thought. – gung - Reinstate Monica Dec 02 '11 at 06:00
  • Thank you @gung for your input. I've actually not heard of the mid p-value - it makes sense though. I'm not sure it's something I would mention when teaching basic statistics, since it may come at the cost of losing the hands-on feeling that I try to give. Concerning randomness we mean exactly the same - when seeing a truly random number we are fooled into thinking there's a pattern to it. I think I heard on the Freakonomics podcast folly of prediction that... – Max Gordon Dec 02 '11 at 16:43
  • ... the human mind has over the years learned that failing to detect a predator is costlier than thinking it's probably nothing. I like that analogy and I try to tell my colleagues that one of the primary reasons for using statistics is to help us with this defect that we're all born with. – Max Gordon Dec 02 '11 at 16:47