This question is a follow-up to a previous question I asked on this site. The goal was to create a composite score from biomarkers related to a binary outcome and then use that score in a regression to test whether it significantly predicts the outcome. I had 30+ biomarkers and ended up selecting the 4 that were bivariately associated with the outcome ($p<0.10$). I built a composite of these 4 biomarkers with ridge regression, following the helpful answer by EdM. That way I could account for the natural correlation among these markers and get adjusted $\beta$ coefficients (adjusted for the other biomarkers and for covariates such as age and sex). I had 109 complete observations. The coefficients are as follows:
    > ridge.mod.bestlam <- glmnet(x, y, alpha = 0, lambda = 0.2387845, standardize = TRUE, intercept=TRUE)
    > coef(ridge.mod.bestlam)
    10 x 1 sparse Matrix of class "dgCMatrix"
                                    s0
    (Intercept)          -0.0252900970
    Age                   0.0003756038
    female                0.0603410625
    Premorbid_depression -0.0338846415
    antidep12             0.0556264177
    nGCS_Bestin24         0.0135018439
    log_med_IL_10         0.0530590200
    log_med_ITAC          0.0478298328
    log_med_sIL_6R       -0.0881823906
    log_med_RANTES        0.0568835030
I multiplied the last 4 coefficients by the respective (scaled) marker values to obtain the composite score, which I'll call ILS.ridge here. I then used it as a predictor in a final logistic regression model. The odds ratio was 423.3499, which is extremely high. I must be doing something wrong, but I cannot figure out what. I checked the VIFs, and they were well below 1.5 for all variables. The final regression results are below.
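A minimal R sketch of the composite-score step, under stated assumptions: the marker data are simulated (hypothetical), each marker is standardized with `scale()`, and the composite is the weighted sum of the standardized markers using the four marker coefficients printed above. Printing `sd(ILS.ridge)` makes the scale of the composite visible: with coefficients of about 0.05 to 0.09, the SD would be roughly 0.13 if the markers were uncorrelated (the square root of the sum of the squared coefficients), so a one-unit change in ILS.ridge spans roughly eight SDs. Separately, glmnet's documentation states that coefficients are always returned on the original scale of `x`, even with `standardize = TRUE`, so whether they should multiply scaled or raw marker values is worth double-checking.

```r
set.seed(42)
n <- 109

# Hypothetical log-scale marker values; the real data are not shown in the question
markers <- data.frame(
  log_med_IL_10  = rnorm(n, 1.0, 0.5),
  log_med_ITAC   = rnorm(n, 2.0, 0.7),
  log_med_sIL_6R = rnorm(n, 3.5, 0.3),
  log_med_RANTES = rnorm(n, 4.0, 0.6)
)

# The four marker coefficients from coef(ridge.mod.bestlam) above
b <- c(log_med_IL_10  =  0.0530590200,
       log_med_ITAC   =  0.0478298328,
       log_med_sIL_6R = -0.0881823906,
       log_med_RANTES =  0.0568835030)

# Standardize each marker to mean 0, SD 1, then take the weighted sum
z <- scale(markers[, names(b)])
ILS.ridge <- as.vector(z %*% b)

summary(ILS.ridge)
sd(ILS.ridge)  # small SD: one "unit" of ILS.ridge is a very large contrast
```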
    glm(formula = nPTDCategory_m12 ~ Age + factor(female) + factor(nGCS_Bestin24) +
        factor(Premorbid_depression) + factor(antidep12) + ILS.ridge,
        family = "binomial", data = data2)

    Deviance Residuals:
        Min       1Q   Median       3Q      Max
    -1.0708  -0.6266  -0.4577  -0.2850   2.6085

    Coefficients:
                                    Estimate Std. Error z value Pr(>|z|)
    (Intercept)                    4.5892763  2.6980108   1.701  0.08895 .
    Age                           -0.0008613  0.0170169  -0.051  0.95963
    factor(female)1                0.4465424  0.6081925   0.734  0.46282
    factor(nGCS_Bestin24)1        -0.0261555  0.6160321  -0.042  0.96613
    factor(Premorbid_depression)1 -0.7174396  0.8567616  -0.837  0.40238
    factor(antidep12)1             0.7393719  0.6429819   1.150  0.25018
    ILS.ridge                      6.0481991  2.3258686   2.600  0.00931 **

    > exp(6.0481991)
    [1] 423.3499
I'd like to know your thoughts about this problem. Can anyone tell if I'm doing something wrong?
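Because `exp()` of a logistic-regression coefficient is the odds ratio for a one-unit change in that predictor, 423.3499 only describes a realistic contrast if ILS.ridge actually spans about one unit. A minimal R sketch that rescales the same coefficient to an interquartile and a per-SD odds ratio, assuming hypothetical ILS.ridge values drawn with SD 0.13 (roughly the SD a composite of four standardized, uncorrelated markers with these coefficients would have) in place of the real `data2$ILS.ridge`:

```r
beta <- 6.0481991   # ILS.ridge coefficient from the glm output above

exp(beta)           # 423.3499: odds ratio per 1-unit change in ILS.ridge

# Hypothetical composite values; substitute data2$ILS.ridge
set.seed(7)
ILS.ridge <- rnorm(109, mean = 0, sd = 0.13)

# Odds ratio for moving from the 1st to the 3rd quartile of ILS.ridge
q <- quantile(ILS.ridge, c(0.25, 0.75))
exp(beta * diff(q))

# Odds ratio per 1-SD increase in ILS.ridge
exp(beta * sd(ILS.ridge))
```

With an SD near 0.13, the interquartile odds ratio comes out on the order of 3 rather than 423, even though the fitted coefficient is unchanged.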
`ridge.mod.bestlam` seems to be based on linear rather than logistic regression with `glmnet`. If that's not an error in copying, then that could contribute to your problem. I'm also curious what the distribution of `ILS.ridge` values was. As a continuous predictor, its reported coefficient in the logistic regression would be for a change of 1 full unit, so if `ILS.ridge` only varies over a range of, say, +/- 0.01, then this result might make sense. As @FrankHarrell put it: "make sure the odds ratio is computed over a valid range of X such as its quartiles." – EdM Sep 03 '19 at 19:51
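If the ridge step is meant to model the binary outcome, a logistic ridge fit would pass `family = "binomial"` to glmnet. A minimal sketch, assuming the glmnet package is installed and using a simulated (hypothetical) design matrix with the same column names as in the question; lambda is re-chosen by cross-validation with `cv.glmnet()`, since a lambda tuned for the Gaussian fit need not carry over to the binomial deviance.

```r
library(glmnet)  # assumes the glmnet package is installed

set.seed(1)
n <- 109
vars <- c("Age", "female", "Premorbid_depression", "antidep12", "nGCS_Bestin24",
          "log_med_IL_10", "log_med_ITAC", "log_med_sIL_6R", "log_med_RANTES")

# Hypothetical stand-ins for the real predictors and binary outcome
x <- matrix(rnorm(n * length(vars)), nrow = n, dimnames = list(NULL, vars))
y <- rbinom(n, size = 1, prob = 0.3)

# alpha = 0 selects the ridge penalty; family = "binomial" fits a logistic model
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0)

# Coefficients on the log-odds scale at the cross-validated lambda
coef(cv_fit, s = "lambda.min")
```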