
This situation often arises when I'm running tests on healthcare data for my job.

Taken individually, the hypothesis test $p$-values seem to show no statistically significant relationship between $X_1, X_2, \ldots, X_k$ and $Y$.

However, taken as a family the tests suggest a clear relationship between $X_1, X_2, \ldots, X_k$ and $Y$ (the estimated effects all point in the same direction), indicating the variables are not independent.

Should I be running a "meta" hypothesis test on the results of the initial $k$ hypothesis tests?
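
For concreteness, here is a toy version of what I have in mind (simulated data and made-up variable names, not my actual healthcare data): separate two-sample $t$-tests of $Y$ across each binary $X$, followed by something like Fisher's combined-probability test on the resulting $p$-values. I realize Fisher's method assumes the component tests are independent, which mine, all run on the same sample, are not.

```python
# Toy illustration only: k separate two-sample t-tests of Y across binary X's,
# then Fisher's combined test on the resulting p-values. Variable names and
# data are hypothetical; Fisher's method assumes independent tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 300
Y = rng.normal(size=n)
binary_X = {name: rng.integers(0, 2, size=n)
            for name in ("sex", "age_ge_70", "diabetic")}

p_values = []
for name, x in binary_X.items():
    t, p = stats.ttest_ind(Y[x == 1], Y[x == 0])
    print(f"{name}: t = {t:.2f}, p = {p:.3f}")          # each test on its own
    p_values.append(p)

stat, p_combined = stats.combine_pvalues(p_values, method="fisher")
print(f"Fisher's combined test: p = {p_combined:.3f}")  # the 'meta' test idea
```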

utobi
RobertF
  • Related? https://stats.stackexchange.com/questions/243003/can-a-meta-analysis-of-studies-which-are-all-not-statistically-signficant-lead/243014#243014 – Christoph Hanck Feb 06 '23 at 15:50
  • When you say "taken individually," do you mean that you ran completely separate tests of the association of $Y$ with each of $X_1$, $X_2$, etc.? Or did you evaluate the relationship of $Y$ with all of $X_1$, $X_2$, etc. together, as in a multiple regression? – EdM Feb 06 '23 at 16:53
  • It sounds like what you want is a more comprehensive model that includes all the data. This can usually assess the overall effect of X vs. Y, but also tease out the effect of X1/Y1 vs. X2/Y2. But I don't understand the structure of your data. – Sal Mangiafico Feb 06 '23 at 17:06
  • @EdM Separate hypothesis tests. Comparing $Y$ in different levels of binary $X$ variables (e.g., sex, age <70, age >= 70 years, etc.). – RobertF Feb 06 '23 at 17:21
  • The point of a meta-analysis is to combine results from $(X_1, Y_1), (X_2, Y_2), \ldots, (X_k, Y_k)$ where each $i = 1, 2, \ldots, k$ is an experiment or study measuring the same $X$ and the same $Y$. For the extant work in this area, key assumptions are that the studies are independent and homogeneous. Having effects in the same direction is very different from homogeneity. – AdamO Feb 06 '23 at 17:39
  • It sounds like you would want a multiple regression analysis including Sex, Age, and so on. ... I'm not sure how you came to the conclusion: taken as a family of tests there is a clear relationship between X1, X2, . . . , Xk and Y, indicating the variables are not independent. – Sal Mangiafico Feb 06 '23 at 17:48
  • @SalMangiafico That might be an interesting separate question - is linear multiple regression preferable to multiple-comparison hypothesis tests? I'm familiar with using regression models for prediction or causal inference, but not for descriptive analysis. Couldn't we encounter the same issue, however: p-values for the regression coefficients are all non-significant yet the coefficients are all negative or all positive? – RobertF Feb 06 '23 at 19:33
  • Yes, it's certainly possible that all the p-values will indicate non-significant results. ... How would you know that all the coefficients have the same sign? I mean, if Sex = Male is coded as 0 or Sex = Female is coded as 0, the sign of the test statistic will flip. But either coding would make sense when thinking about it in relation to e.g. Age. – Sal Mangiafico Feb 06 '23 at 19:51
  • @SalMangiafico Good point - sign results are arbitrary for nominal categories, more important for ordinal categories. – RobertF Feb 06 '23 at 21:11
  • This type of multiple single-predictor modeling is fraught with difficulties. An advantage of multiple regression (in addition to minimizing omitted-variable bias) is that you get an overall estimate of whether the set of predictors as a whole is associated with outcome Y. If so then the "statistical significance" of individual coefficient estimates isn't very important, particularly if you are using the model for outcome prediction. – EdM Feb 07 '23 at 18:31
  • @EdM Let's say I'm asked to run descriptive statistics on my data - not necessarily prediction or causal inference, rather measuring strength of associations between the variables and $Y$. Is there an advantage to multiple regression v. multiple comparisons? I suppose one plus is that multiple regression can identify Simpson's Paradox scenarios (https://en.wikipedia.org/wiki/Simpson%27s_paradox). – RobertF Feb 08 '23 at 13:35
  • "Strengths of associations" evaluated one at a time can be misleading. An apparently strong association of one predictor with outcome might come from its correlation with a much stronger predictor. An apparently weak association might come from omitted variable bias and you could find a stronger association if you included other outcome-associated predictors with it in multiple regression. See this page about controlling for other variables. – EdM Feb 08 '23 at 14:48

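A minimal sketch of the multiple-regression approach suggested in the comments above (simulated data and hypothetical predictor names, using statsmodels): all predictors are fit together, and the overall $F$-test assesses whether the set of predictors as a whole is associated with $Y$.

```python
# Sketch of a multiple regression with all predictors at once (simulated data,
# hypothetical names). The overall F-test assesses the predictor set jointly;
# the summary shows each coefficient adjusted for the others.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    "sex":       rng.integers(0, 2, n),
    "age_ge_70": rng.integers(0, 2, n),
    "diabetic":  rng.integers(0, 2, n),
})
# weak, same-direction effects plus noise -- purely illustrative
df["Y"] = 0.1 * (df["sex"] + df["age_ge_70"] + df["diabetic"]) + rng.normal(size=n)

fit = smf.ols("Y ~ sex + age_ge_70 + diabetic", data=df).fit()
print(f"overall F-test p-value: {fit.f_pvalue:.3f}")  # predictor set vs. Y
print(fit.summary())                                  # adjusted coefficients
```
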
1 Answer


The main point is that you cannot reframe your hypothesis based on the data you have already observed. The results will never generalize to another sample.

The "sign" of the trend for each hypothesis shouldn't matter theoretically. What we care about is the correlation between the tests; when the tests are highly correlated, we know a Bonferroni correction will be conservative. Effects with opposite signs in a sample can arise from probability models in which the tests are highly positively correlated; in fact, just about any such scenario can be constructed here.
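
As a quick check of that conservativeness claim, here is a small simulation under the global null with equicorrelated normal test statistics (the choices of $k$, $\alpha$, and the correlations are arbitrary, purely for illustration): the familywise error rate of the Bonferroni rule stays below the nominal level and drops further as the correlation grows.

```python
# Simulation sketch: familywise error rate (FWER) of the Bonferroni rule under
# the global null, for k equicorrelated normal test statistics. The empirical
# FWER sits below the nominal alpha and shrinks as the correlation increases.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, alpha, n_sim = 10, 0.05, 20_000

for rho in (0.0, 0.5, 0.9):
    cov = np.full((k, k), rho) + (1 - rho) * np.eye(k)   # equicorrelation matrix
    z = rng.multivariate_normal(np.zeros(k), cov, size=n_sim)
    p = 2 * stats.norm.sf(np.abs(z))                     # two-sided p-values
    fwer = np.mean((p < alpha / k).any(axis=1))          # any Bonferroni rejection
    print(f"rho = {rho:.1f}: empirical FWER = {fwer:.3f} (nominal {alpha})")
```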

But, alas, you didn't apply a Bonferroni correction! You would need to compare the p-values to the $0.05/k$ alpha level, and since none of your tests was significant even at the unadjusted $0.05$ level, there is basically nothing you could have done to control the familywise error rate (FWER) and still find a significant result. The FWER is a well-defined operating characteristic of multiple testing. When you refer to "tak[ing] a family of tests", testing each hypothesis at the overall $\alpha$ level is already an anti-conservative approach - the actual false positive error rate is higher than stated, i.e. it is statistical cheating, or "p-hacking".
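
For reference, the comparison against the $0.05/k$ threshold is one call with statsmodels (the p-values below are made up for illustration):

```python
# Bonferroni correction sketch: compare each p-value to alpha/k.
# The p-values here are invented for illustration only.
from statsmodels.stats.multitest import multipletests

p_values = [0.04, 0.06, 0.03, 0.08, 0.05]        # k = 5 hypothetical tests
reject, p_adjusted, _, alpha_bonf = multipletests(p_values, alpha=0.05,
                                                  method="bonferroni")
print(f"per-test threshold alpha/k = {alpha_bonf}")  # 0.05 / 5 = 0.01
print(f"reject after correction: {reject}")          # none survive here
```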

Based on this, you should report the results as-is and be done with it!

AdamO
  • Thanks AdamO. If I had conducted the multiple hypothesis tests on a subset of my original data (say a 50% random sample) & subsequently found interesting patterns in the t-tests or p-values, I could run an additional hypothesis test on the holdout data, correct? – RobertF Feb 06 '23 at 16:39
  • @RobertF it's a controversial point. Personally, I don't agree with it. With strong assumptions, it's ok. The features of the study design itself are often subject to age, period, and cohort effects - surely you see this all the time in cohort studies. So if you do a 50/50 split-sample validation, your results tend to generalize only to your analysis capture, e.g. claims records from 2005-2015 in subscribers utilizing healthcare at X... of course, modeling these trends directly can do a lot to improve external generalizability. – AdamO Feb 06 '23 at 16:44