
I have seen the following post: How much do we know about p-hacking "in the wild"? However, I do not know how bad it is if I do the following. An experiment with one controlled factor was carried out to test for differences between treatments.

Suppose there is a one-way ANOVA study, alongside which I collected some other covariate information. Assume that I do not have the funding to conduct a two-way or higher-order ANOVA study. The first, naive one-way ANOVA did not show significance. However, according to previous studies, the result should be significant for treatments, though the difference between treatments is not known. I then decided to add covariates and found a significant treatment effect; however, no differences across levels were found (with multiple testing adjusted). I then fit a more complex model with the same covariates that accounts for the outcome structure. The treatment effect is still significant, but again no differences across levels were found (with multiple testing adjusted).

Q1: Since the covariate-adjusted ANOVA and the complex model reach the same conclusion that there is no difference across levels, should I say that I am p-hacking the significance of the treatment effect?

Q2: How much confidence should I place in the naive model, the covariate-adjusted model, and the complex model? I do not think the naive model is correct: the subject matter might be too complex, controlling a single factor is far from sufficient, and the naive model does not reflect the consistent results of past studies.

user45765
  • In general, I would say this whole approach is misguided because so many of the decision/inference steps appear to depend on p-values. Learning from data is more complex than this. – mkt Jul 19 '22 at 14:32
  • @mkt Then suppose you are in this setting. Would you adjust for covariates? Say I have a control group and a bunch of treatment groups. I can take the difference between control and treatment and do a one-way ANOVA on the differences, assuming homogeneity of the population. Testing the treatment effect takes a single step, either an LRT or an F-test. A null result from the one-way ANOVA gives reason to consider adjusting for covariates. When one is building the model, the question is how many covariates to select while checking model diagnostics. What is your approach for this simple problem? – user45765 Jul 19 '22 at 14:40
  • You're focussed on the covariate selection, but there are more fundamental issues to consider. What is your goal? What are these treatments? – mkt Jul 19 '22 at 14:48
  • @mkt The goal is to assess treatment differences. I am not sure why covariate selection is the focus; sorry, this is my conjecture. The treatments are active treatments vs. control. There is no clear indication of how to "select" a model. There would be a pre-study model, which may fail. If it fails, what should one do? Stop considering further model building? One could do covariate selection on a training set; however, data might be expensive, so there is no training/test/development split here. – user45765 Jul 19 '22 at 14:52

0 Answers