In The Art of Statistics, David Spiegelhalter states that the analysis below is problematic. Can anyone give an example (real or simulated) of how this would be a problem?

Measuring two groups at baseline and after an intervention, and saying the groups are different if one is significantly changed from their baseline, and the other group's change is not significant. The correct procedure is to carry out a formal statistical test of whether the groups differ - this is known as a test of interaction.

luciano
  • This seems to be alluding to [tag:difference-in-difference]. – Dave Feb 20 '23 at 14:21
  • I think this point is also in part about the dangers of mis-interpreting the result of a significance test as "p < 0.05" means "there is a before/after difference so the treatment has an effect" and "p > 0.05" means "there is no before/after difference, so the treatment has no effect." – dipetkov Feb 20 '23 at 15:03
  • I suspect what it's getting at is that it is better to fit a single model taking into account both the effect of Time and the effect of Group, rather than, say, conduct two separate t-tests. – Sal Mangiafico Feb 20 '23 at 15:40
  • See for a similar phenomenon: https://stats.stackexchange.com/questions/436403/is-the-difference-between-significant-and-not-significant-significant also https://stats.stackexchange.com/questions/469737/can-a-variable-have-a-significant-effect-on-an-effect-that-is-non-significant-it – kjetil b halvorsen Mar 02 '23 at 19:04

1 Answer

When testing two or more hypotheses simultaneously, there is always the risk of inconsistent results. Consistency between results of different tests is simply not a part of the theory of hypothesis testing! This has been discussed here before, see

and links therein, for instance The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant.

In your case you want to ask one question of the data: whether the intervention has an effect. That is properly tested with an interaction, as you say (for details, see Best practice when analysing pre-post treatment-control designs). Testing separately in the treatment and control groups for a change from baseline answers a different question. For instance, there might be a change from baseline in both groups simply because of a trend over time. A test of the interaction asks whether the change from baseline differs between the groups, so it works even if there is a time trend in both groups.
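
Since the question asks for a simulated example, here is a minimal sketch of one. Everything in it is an assumption chosen for illustration: 20 subjects per group, normally distributed change scores with SD 1, and the same true change from baseline (0.45) in both groups, so there is no group difference to find. The "naive" procedure declares the groups different when exactly one of the two before/after tests is significant. The interaction test is taken here to be a two-sample t-test on the change scores, which is what the group-by-time interaction amounts to in a simple two-group pre-post design. The 5% critical values of the t distribution are hard-coded to keep the sketch free of dependencies; in practice you would use a statistics library.

```python
import random
import statistics
from math import sqrt

# Two-sided 5% critical values of Student's t, hard-coded for this sketch.
T_CRIT_DF19 = 2.093  # one-sample test with n = 20
T_CRIT_DF38 = 2.024  # two-sample test with 20 + 20 observations

N = 20        # subjects per group (assumed)
TREND = 0.45  # true mean change from baseline, identical in both groups (assumed)
SD = 1.0      # SD of individual change scores (assumed)


def one_sample_t(x):
    """t statistic for H0: mean change = 0 (the paired before/after test)."""
    return statistics.mean(x) / (statistics.stdev(x) / sqrt(len(x)))


def two_sample_t(x, y):
    """Pooled-variance t statistic for H0: equal mean change in both groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def simulate(n_sims=2000, seed=1):
    """Return the rates at which each procedure claims a group difference."""
    rng = random.Random(seed)
    naive_hits = 0        # exactly one group shows a "significant" change
    interaction_hits = 0  # the interaction test rejects
    for _ in range(n_sims):
        # Change scores (after minus before); both groups share the same true change.
        treat = [rng.gauss(TREND, SD) for _ in range(N)]
        ctrl = [rng.gauss(TREND, SD) for _ in range(N)]
        sig_treat = abs(one_sample_t(treat)) > T_CRIT_DF19
        sig_ctrl = abs(one_sample_t(ctrl)) > T_CRIT_DF19
        naive_hits += int(sig_treat != sig_ctrl)
        interaction_hits += int(abs(two_sample_t(treat, ctrl)) > T_CRIT_DF38)
    return naive_hits / n_sims, interaction_hits / n_sims


naive_rate, interaction_rate = simulate()
print(f"naive procedure claims a group difference:  {naive_rate:.2f}")
print(f"interaction test claims a group difference: {interaction_rate:.2f}")
```

Under these assumptions each before/after test has power of roughly one half, so the naive procedure should report a spurious group difference in a large share of the simulated studies, while the interaction test should reject at close to its nominal 5% rate. The exact figures depend on the assumed sample size and effect.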