
I am running (multi-level) logit models on hospital data to test whether the ratio of two hospital tariffs has any effect on the probability of being admitted to the hospital.

My models are the following:

f1 <- glm(admit ~ log_ratio, data = dat, family = "binomial")
f2 <- glm(admit ~ log_ratio * as.factor(Condition2), data = dat, family = "binomial")
f3 <- lmer(admit ~ log_ratio + (1 | provider_id), data = dat_first, REML = FALSE)
f4 <- lmer(admit ~ log_ratio + log_ratio:as.factor(Condition2) + (1 | provider_id), data = dat, REML = FALSE)
f5 <- lmer(admit ~ log_ratio * as.factor(Condition2) + (1 | provider_id), data = dat, REML = FALSE)

where admit is a binary variable indicating hospital admission, log_ratio is the log-transformed ratio of the two tariffs, Condition2 is the condition for which the patient received treatment, and provider_id is the id of the treatment provider.
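As an aside on the code above: lmer() always fits a linear mixed model, so Models 3–5 are linear probability specifications even though admit is binary (the table below accordingly labels them "linear mixed-effects"). A multilevel logit counterpart of Model 3 would instead use glmer() from the same lme4 package — a sketch, reusing the data and variable names from the question:

```r
library(lme4)

# Multilevel logit analogue of Model 3: random intercept per provider,
# binomial family with logit link (glmer's default Laplace approximation)
f3_logit <- glmer(admit ~ log_ratio + (1 | provider_id),
                  data = dat_first, family = binomial(link = "logit"))
summary(f3_logit)
```

Note that the glmer coefficients are on the log-odds scale, so they are not directly comparable to the linear-model coefficients reported for Models 3–5.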

My results are the following:


============================================================================================================
                                                                   Dependent variable:
                                              --------------------------------------------------------------
                                                                          admit
                                                     logistic                  linear mixed-effects
                                                  (1)         (2)         (3)          (4)          (5)
------------------------------------------------------------------------------------------------------------
log_ratio                                      0.605***     1.385*      0.012***     0.013***      0.012
                                                (0.058)     (0.808)     (0.001)      (0.002)      (0.013)

as.factor(Condition2)Brain Disorder                         1.392                                 -0.001
                                                           (1.304)                                (0.021)

as.factor(Condition2)Amputation                             2.352                                  0.0002
                                                           (2.257)                                (0.072)

as.factor(Condition2)Chronic pain                           4.403**                               -0.026
                                                           (2.161)                                (0.055)

as.factor(Condition2)Nervous system                         2.391                                  0.006
                                                           (1.551)                                (0.026)

as.factor(Condition2)Organ Disorder                         1.114                                 -0.018
                                                           (1.787)                                (0.043)

log_ratio:as.factor(Condition2)Brain Disorder              -0.098                    0.014***      0.014
                                                           (0.825)                  (0.002)       (0.014)

log_ratio:as.factor(Condition2)Amputation                  -0.067                    0.053***      0.053
                                                           (1.741)                  (0.003)       (0.059)

log_ratio:as.factor(Condition2)Chronic pain                -1.926                    0.009***      0.024
                                                           (1.230)                  (0.001)       (0.030)

log_ratio:as.factor(Condition2)Nervous system              -1.537                    0.005***     -0.0003
                                                           (1.050)                  (0.001)       (0.018)

log_ratio:as.factor(Condition2)Organ Disorder               0.457                    0.023***      0.037
                                                           (1.242)                  (0.001)       (0.032)

Constant                                      -4.476***    -6.550***    0.022***     0.007         0.008
                                                (0.075)     (1.293)     (0.008)      (0.009)      (0.023)

------------------------------------------------------------------------------------------------------------
Observations                                    115,376     115,376      115,376      115,376      115,376
Log Likelihood                              -12,589.260 -12,229.590   56,636.920   56,935.280   56,935.630
Akaike Inf. Crit.                            25,182.530  24,483.190 -113,265.800 -113,852.600 -113,843.300
Bayesian Inf. Crit.                                                 -113,227.200 -113,765.700 -113,708.100
============================================================================================================
Note:                                                                            *p<0.1; **p<0.05; ***p<0.01

My main interest is in log_ratio and whether it has any association with the probability of hospital admission. My secondary interest is whether this association differs by condition. log_ratio is significant in every model (logistic or mixed-effects) except those that interact it with condition while also including all the condition main effects (Models 2 & 5); in those models the log_ratio terms lose their significance.

My question is: which model should I believe? My gut feeling is that log_ratio is in fact significant but, due to the small sample size per condition or some other reason, does not show up as significant in Models 2 & 5. Could this be true?

Also, could Model 4 be an acceptable specification? That is, in my case, do I need to include all the main effects when I include the interaction?

    The question of “which” model you believe is not straightforward. You can compare some of the models using the anova() function. You could keep a hold out sample and test each model on the out-of-sample set. If your interaction term is significant then you probably keep it in the model and probe for which conditions the association is significant. – Matt Barstead Dec 26 '20 at 01:07
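Matt's anova() suggestion looks like this in practice — a sketch assuming the fitted objects f3 and f4 from the question, and that both were fit to the same data (as the identical N across the table's columns suggests):

```r
# Likelihood-ratio test of the nested mixed models; valid here because both
# were fit by maximum likelihood (REML = FALSE) and differ only in their
# fixed effects (f4 adds the log_ratio-by-condition interaction terms)
anova(f3, f4)
```

anova() on lmer fits reports the chi-square statistic, its degrees of freedom (here 5, one per interaction term), and the p-value, alongside each model's AIC, BIC, and log likelihood.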

1 Answer


Building off Matt Barstead's comment and looking at the fit information for model 3 vs. model 4: model 4's log likelihood is higher (at the cost of 5 additional degrees of freedom), and both its AIC and BIC are lower, all of which point to model 4 fitting better. You can use R's anova() function to test the two nested models formally. A quick and dirty likelihood-ratio calculation confirms that the interaction terms improve fit:

> 1-pchisq((-2*56636.920)-(-2*56935.280), 5)
[1] 0

The 0 returned here is the p-value of the likelihood-ratio test, not the test statistic itself: the statistic is 2 × (56,935.280 − 56,636.920) ≈ 596.7 on 5 degrees of freedom, and a p-value this close to zero means the less parsimonious model 4 fits significantly better than model 3. In other words, the association between log_ratio and admission does appear to differ by condition. See information on these tests here.
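Spelling out the same arithmetic with the log likelihoods taken from the table:

```r
ll_reduced <- 56636.920  # model 3 log likelihood
ll_full    <- 56935.280  # model 4 log likelihood

# Likelihood-ratio statistic: 2 * (LL_full - LL_reduced) = 596.72 on 5 df
lr_stat <- 2 * (ll_full - ll_reduced)

# Upper-tail chi-square probability, i.e. the p-value of the test
pchisq(lr_stat, df = 5, lower.tail = FALSE)
```

pchisq(lr_stat, df = 5, lower.tail = FALSE) is equivalent to 1 - pchisq(lr_stat, 5) but avoids losing precision for p-values this small.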

Erik Ruzek
  • Thanks Erik and Matt for your comments. If I understand correctly, this would mean that testing the relationship by condition is worthwhile, since the conditions do not all have the same association with the log_ratio variable. Is that the case? – Stata_user Dec 27 '20 at 11:13
  • Yes, with a clarification: the interaction model says that the association between log_ratio and admittance varies across conditions, so it is worth probing for which conditions the association holds, as Matt suggested. – Erik Ruzek Dec 28 '20 at 16:11
  • Thanks @Erik Ruzek. – Stata_user Dec 29 '20 at 17:22