I've been doing the univariate analysis for a retrospective cohort study. For this variable in particular, none of the non-cured patients had growth of Gram-positive bacteria in their culture. Since there's a 0 in one of the cells, I thought the best test was the Fisher-Freeman-Halton exact test (since the table is larger than 2x2). However, when I run it in SPSS, the output says that 0 cells have an expected count less than 5 and that the minimum expected count is 6.62 (which is not what I observe). The chi-square p-value for this variable is 0.005, and the Fisher-Freeman-Halton exact test p-value is 0.001.

I would like to run a multivariate analysis (binary logistic regression) including all the variables that were significant in the univariate analysis, but since there's a 0, I wouldn't be able to obtain the adjusted OR for Gram-positive bacteria.

Any ideas about what the best approach would be? Thanks in advance!

[Image: contingency table of observed counts by culture result, with columns "total", "cured", and "not cured"]

user406570
  • Then, since I observe that one cell contains a 0, should I use the chi-square test or Fisher's exact test? Would it make sense to add this variable to the multivariate analysis? – user406570 Feb 25 '24 at 19:44

1 Answer

My answer to your initial question is that you shouldn't worry about the message, because (1) it is telling you that the minimum expected count is greater than 5, which is the usually accepted cutoff for when it's OK to do a chi-squared test (the table in your question shows observed counts, not expected counts), and (2) even if it were below 5, an exact test like Fisher's isn't harmed by this. (Fisher's exact test makes fixed-margin assumptions that may or may not be problematic, but you probably don't need to worry about this ...)
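
To make the observed-versus-expected distinction concrete, here is a minimal sketch in plain Python. Only the Gram-positive row (18 cured, 0 not cured) comes from the thread; the other three rows and their labels are hypothetical placeholders, so the numbers will not reproduce the 6.62 that SPSS reported. The expected count for each cell is (row total × column total) / N, and the table is laid out as 4x2 (cured, not cured) without a "total" column.

```python
# Observed counts as (cured, not cured) per culture category.
# Only the Gram-positive row (18, 0) comes from the thread; the other
# rows and their labels are hypothetical placeholders.
observed = {
    "Gram-positive": (18, 0),
    "Category B (hypothetical)": (30, 20),
    "Category C (hypothetical)": (25, 15),
    "Category D (hypothetical)": (20, 12),
}


def expected_counts(table):
    """Expected count under independence: row total * column total / N."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    return {
        name: tuple(sum(row) * c / grand_total for c in col_totals)
        for name, row in table.items()
    }


expected = expected_counts(observed)
min_observed = min(min(row) for row in observed.values())
min_expected = min(min(row) for row in expected.values())
cells_below_5 = sum(e < 5 for row in expected.values() for e in row)

for name in observed:
    print(name, observed[name], tuple(round(e, 2) for e in expected[name]))
print("min observed:", min_observed)
print("min expected:", round(min_expected, 2))
print("expected cells below 5:", cells_below_5)
```

With these made-up numbers the smallest observed count is 0 while every expected count is above 5, which is the same pattern as the message in the question: the rule of thumb is about expected counts, so an observed 0 does not by itself trigger it.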

However, I have more to say about your analysis:

  • Since your three columns are "total", "cured", and "not cured" (the first is the sum of the second and third), it really doesn't make any sense to analyze this as a 4x3 table (where there are 4 predictor [row] categories and 3 response [column] categories). It would be better to analyze a 4x2 table with only "cured" and "not cured".
  • Although it's a very popular approach, I would not recommend modeling by first screening for significant predictors in a series of univariate analyses and then incorporating the significant variables in a single multivariate analysis. I would only do this kind of two-stage process if I had a very large data set, so large that it wasn't computationally practical to run the model with all of the predictors, and if I only cared about prediction rather than inference. Otherwise, this is a recipe for overfitting and overconfidence (confidence intervals that are too narrow and p-values that are too small).
  • My suggestion would be to fit a single multivariate logistic regression with a reasonable subset of predictors selected a priori, and to read Chapter 4 of Frank Harrell's Regression Modeling Strategies about modeling strategies for analyses that are focused on exploration, prediction, or inference.
  • Your Gram-positive observations (18 cured, 0 not cured) will indeed be a problem, leading to complete separation in a logistic regression; you can read about possible solutions here or here (although I'm not sure the best solutions will be available in SPSS ...)
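
As a rough illustration of why the (18 cured, 0 not cured) row causes trouble, the sketch below uses plain Python to show that the log-likelihood for that group keeps increasing as the log-odds of cure grows, so there is no finite maximum-likelihood estimate. The reference group (40 cured, 12 not cured) is a hypothetical placeholder, not data from the question. The 0.5 cell correction at the end (often called the Haldane-Anscombe correction) is only one crude workaround for a single unadjusted comparison; it is an assumption here, not necessarily what the linked answers recommend, and penalized-likelihood methods such as Firth's are more commonly suggested for adjusted models.

```python
import math


def log_lik(beta, cured, not_cured):
    """Binomial log-likelihood of a group whose log-odds of cure is beta."""
    log_p = -math.log1p(math.exp(-beta))  # log P(cured)
    log_q = -beta + log_p                 # log P(not cured)
    return cured * log_p + not_cured * log_q


def odds_ratio(a, b, c, d, correction=0.0):
    """Odds ratio (a/b) / (c/d), optionally adding a constant to every cell."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


betas = [0, 2, 5, 10, 20]

# Gram-positive group from the thread: 18 cured, 0 not cured.
gp_curve = [log_lik(b, 18, 0) for b in betas]

# Hypothetical reference group: 40 cured, 12 not cured.
ref_curve = [log_lik(b, 40, 12) for b in betas]

print("Gram-positive log-likelihood:", [round(v, 4) for v in gp_curve])
print("Reference log-likelihood:    ", [round(v, 4) for v in ref_curve])

# The raw odds ratio is undefined because of the zero cell.
try:
    raw_or = odds_ratio(18, 0, 40, 12)
except ZeroDivisionError:
    raw_or = None
print("raw odds ratio:", raw_or)

# Adding 0.5 to every cell gives a finite (but crude) estimate.
corrected_or = odds_ratio(18, 0, 40, 12, correction=0.5)
print("corrected odds ratio:", round(corrected_or, 2))
```

The Gram-positive curve climbs toward 0 without ever peaking, whereas the hypothetical reference curve rises and then falls, which is what a well-defined estimate looks like.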
Ben Bolker