
Today, at the Cross Validated Journal Club (why weren't you there?), @mbq asked:

Do you think we (modern data scientists) know what significance means? And how it relates to our confidence in our results?

@Michelle replied as some (including me) usually do:

I'm finding the concept of significance (based on p-values) less and less helpful as I continue in my career. For example, I can be using extremely large datasets, so everything is statistically significant ($p < .01$).

This is probably a stupid question, but isn't the problem the hypothesis being tested? If you test the null hypothesis "A is equal to B", then you know the answer is "No". Bigger data sets will only bring you closer to this foregone conclusion. I believe it was Deming who once gave an example with the hypothesis "the number of hairs on the right side of a lamb is equal to the number of hairs on its left side." Well, of course it isn't.

A better hypothesis would be "A does not differ from B by more than so much." Or, in the lamb example, "the number of hairs on the sides of a lamb does not differ by more than X%".

Does this make sense?
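
To make the contrast concrete, here is a minimal Python sketch of both hypotheses; the simulated data, the trivial 0.05 shift, and the 0.5 equivalence margin are all illustrative assumptions, not values from the discussion:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups whose true means differ by a practically trivial amount:
# the point null "mean(A) == mean(B)" is false, but only just.
n = 1_000_000
a = rng.normal(loc=100.00, scale=10.0, size=n)
b = rng.normal(loc=100.05, scale=10.0, size=n)

# (1) Point-null t-test, H0: mean(A) == mean(B).
# With n this large, even the trivial 0.05 shift comes out "significant".
t, p = stats.ttest_ind(a, b)
print(f"point null:  t = {t:.2f}, p = {p:.1e}")

# (2) Equivalence test (TOST), H0: |mean(A) - mean(B)| >= margin.
# The margin is the "so much" above; 0.5 is an assumed,
# application-chosen value, not a statistical constant.
margin = 0.5
diff = a.mean() - b.mean()
se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
df = 2 * n - 2
p_lo = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
p_hi = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
print(f"equivalence: p = {max(p_lo, p_hi):.1e}")  # small => equivalent
```

The same standard error of the mean difference drives both tests; only the hypothesis changes, and only the second one answers a question whose answer isn't known in advance.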

– Carlos Accioly (edited by amoeba)
  • 1) Testing of mean equivalence (assuming that is what you want) can in some cases be simplified to a test of significance of their mean difference. With a standard error for this difference estimate, you can do all sorts of testing of the "does not differ from B by more..." sort. 2) As for the sample size: yes, for large sample sizes the importance of significance diminishes, but it is still crucial for smaller samples, where you cannot just generate additional values. – Ondrej Feb 24 '12 at 17:02
  • Re "Of course it isn't." At a guess, a lamb has on the order of $10^5$ hairs on each side. If there is an even number of such hairs, and they are distributed randomly with equal chances on both sides, and the sides are clearly delineated, then the chance that both numbers are exactly equal is 0.178% (a numeric check appears after this thread). In a large flock of several hundred, you should expect to see such a perfectly balanced lamb born at least once each decade (assuming an even number of hairs occurs about 50% of the time). Or: just about every old sheep farmer has had such a lamb! – whuber Feb 24 '12 at 17:39
  • More seriously (and this goes to the heart of the issue): how do you come up with the value of $X$? – whuber Feb 24 '12 at 17:41
  • @whuber It is determined by the purpose of the analysis. A better analogy would be asking what minimum effect size would justify further investment in a drug following a trial. The mere existence of a statistically significant effect is not enough, as developing a drug is expensive and there may be side-effects to consider. It isn't a statistical question, but a practical one. – Dikran Marsupial Feb 24 '12 at 17:47
  • That's one answer, Dikran, but it is not generally available and in many instances would have an element of arbitrariness to it. What's the point of testing such a hypothesis, anyway? If $X$ is thought of as a minimum effect size to justify further investigation, then we have enough information to frame this as an optimization problem (balancing the cost of estimating $X$ against the costs and consequences of future actions based on the estimate), not a test of hypothesis. – whuber Feb 24 '12 at 18:32
  • @whuber I think you typed that as I was framing my answer below. :) – Michelle Feb 24 '12 at 18:41
  • @whuber I suspect that in most applications where there is no practical information for deciding the minimum effect size of interest, the standard hypothesis test is fine, for example testing for normality. As a Bayesian I would agree with viewing it as an optimisation problem rather than a hypothesis-testing problem. Part of the problem with hypothesis tests stems from the statistics cookbook approach, where tests are performed out of tradition without properly considering the purpose of the exercise or the true meaning of the result (all IMHO of course). – Dikran Marsupial Feb 24 '12 at 19:27
  • @DikranMarsupial Isn't the key there that students are being taught tests by rote, as identified by gung below, rather than the importance of good study design? Would more emphasis on study design help solve some of the problem, though not necessarily with big data sets? – Michelle Feb 25 '12 at 03:16
  • @Michelle, yes, I agree. I think a large part of the problem is that a correct understanding of frequentist hypothesis tests and confidence intervals is very subtle and counter-intuitive. This means it ends up being easier to teach non-statisticians (e.g. most scientists) the cookbook approach rather than to teach a deeper understanding. I suspect there isn't sufficient room in the undergraduate syllabus for most scientific subjects to teach these things as they should be taught. – Dikran Marsupial Feb 25 '12 at 16:18
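
whuber's 0.178% figure is easy to verify. Here is a short Python check under his stated assumptions ($10^5$ hairs per side, fair random assignment); the flock size of 300 births a year is an added assumption matching "several hundred":

```python
from math import pi, sqrt
from scipy.stats import binom

# 2N hairs assigned to left/right by independent fair coin flips;
# the chance of an exact left/right tie is the central binomial
# probability, which Stirling's formula approximates as 1/sqrt(pi*N).
N = 10**5                                   # assumed hairs per side
p_tie = binom.pmf(N, 2 * N, 0.5)
print(f"exact tie probability:  {p_tie:.3%}")            # ~0.178%
print(f"Stirling approximation: {1 / sqrt(pi * N):.3%}")

# Expected balanced lambs per decade, assuming an even hair total
# about half the time and ~300 births a year in the flock:
print(f"expected per decade: {0.5 * p_tie * 300 * 10:.1f}")  # ~2.7
```

The Stirling approximation $1/\sqrt{\pi N}$ agrees with the exact central binomial probability to three significant figures, and a couple of balanced lambs per decade is indeed "at least once each decade".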