I am running a Cox regression on a large dataset (100k observations) with 135 variables after dummy encoding.
Most of the coefficients are reasonable and their confidence intervals are fairly narrow. However, for one covariate the confidence interval is enormous (0.1 to 80); for reference, the next-largest upper bound is 6. I've checked the distribution of this covariate, but nothing looks different from the others (e.g., no huge outliers, no strange distribution).
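As a quick sanity check on how little information that interval carries: assuming it is the usual 95% Wald interval on the hazard-ratio scale, exp(β ± 1.96·SE), the bounds can be converted back into the coefficient and its standard error on the log scale. The function name below is hypothetical, and the result is only approximate because the bounds are rounded.

```python
import math

def implied_log_hr_and_se(lower, upper, z=1.959963984540054):
    """Back out the Cox coefficient (log hazard ratio) and its standard error
    from a symmetric Wald interval exp(beta +/- z * se) on the HR scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2        # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z)    # half-width divided by z
    return beta, se

beta, se = implied_log_hr_and_se(0.1, 80)
print(f"beta = {beta:.2f}, HR = {math.exp(beta):.2f}, SE(log HR) = {se:.2f}")
```

Under that assumption the interval corresponds to a point estimate of roughly HR ≈ 2.8 with a standard error of about 1.7 on the log-hazard scale, i.e. a coefficient the data barely pin down.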
I decided that this estimate was essentially useless and that it might be best to simply remove this variable (especially because I think it could also interfere with my results downstream: it violated the PH assumption, so I'd have to add a time interaction, which I think led to strange results because I'd be multiplying time by huge negative coefficients).
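To make the concern about the time interaction concrete: if the model contains both x and an interaction x·g(t) (as is done, for example, with the `tt` argument of `coxph` in R's survival package), the log hazard ratio for x at time t is β + γ·g(t). The sketch below uses made-up coefficients (β = 1.0, γ = −0.5, not from this analysis) and a hypothetical helper `hazard_ratio_at` to show how quickly the hazard ratio collapses when g(t) = t in raw time units, compared with the common alternative g(t) = log t.

```python
import math

def hazard_ratio_at(t, beta, gamma, transform=lambda t: t):
    """Hazard ratio for a one-unit increase in x at time t when the Cox model
    contains x and x * g(t), so that log HR(t) = beta + gamma * g(t)."""
    return math.exp(beta + gamma * transform(t))

# Hypothetical coefficients, chosen only for illustration.
beta, gamma = 1.0, -0.5

for t in (1, 10, 100):
    linear = hazard_ratio_at(t, beta, gamma)                      # g(t) = t
    logged = hazard_ratio_at(t, beta, gamma, transform=math.log)  # g(t) = log t
    print(f"t={t:>3}: HR with g(t)=t -> {linear:.3g}, with g(t)=log t -> {logged:.3g}")
```

Note that γ is expressed per unit of g(t), so its magnitude depends on the time scale (days vs. years, t vs. log t); what matters is whether β + γ·g(t) stays plausible over the observed follow-up range.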
However, when I redo the analysis without this covariate, another (related) covariate suddenly becomes highly significant (though its coefficient is similar). My questions are:
- What could be causing such huge CIs in only this single covariate?
- Is my decision to eliminate this covariate on the basis of this finding justified?
- What does it mean for my interpretation that, once the troublesome covariate is removed, another covariate becomes significant?
- Is my reasoning in the parentheses of the paragraph about removing the covariate (starting: "especially because I think...") correct?
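Two generic checks often explain a single inflated interval: near-collinearity with other covariates (which would also fit a related covariate's significance changing when this one is dropped) and, for a dummy-encoded level, very few events in that level, which looking at the covariate's marginal distribution will not reveal. The sketch below computes variance inflation factors with plain NumPy least squares and counts events per 0/1 column. It runs on simulated data; the function names, column layout, and event rate are hypothetical, not taken from the original analysis.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1 - (y - others @ coef).var() / y.var()
        vifs[j] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return vifs

def events_per_dummy(X_dummies, event):
    """Number of observed events among rows where each 0/1 column equals 1."""
    return (X_dummies[event.astype(bool)] == 1).sum(axis=0)

# Simulated, hypothetical data: x2 is strongly related to x1, x3 is unrelated,
# and `rare` is a dummy level that is 1 for only about 0.5% of rows.
rng = np.random.default_rng(0)
n = 5_000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)
x3 = rng.normal(size=n)
rare = (rng.random(n) < 0.005).astype(float)
X = np.column_stack([x1, x2, x3, rare])
event = rng.random(n) < 0.1  # hypothetical event indicator (~10% events)

print("VIFs:", np.round(variance_inflation_factors(X), 1))
print("events in rare level:", events_per_dummy(X[:, [3]], event))
```

On this simulated data x1 and x2 get VIFs of roughly 100 while the unrelated columns stay near 1, and the rare dummy has only a handful of events; either situation can produce a very wide interval for one coefficient. In practice one would pass the same design matrix and event indicator used in the Cox fit.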