There's already an excellent post on standardizing values used for regression here. In this post, I am using Cox Regression (Cox Proportional Hazard model) with the lifelines Python library. One way to measure the model's performance is the concordance index.
I noticed that if I use the data as it is, I get a concordance score of 0.75. But if I use MinMaxScaler, my concordance score is boosted up to 0.88. Although the magnitudes of the coefficients differ between the two data sets, the signs (positive or negative) of the coefficients are the same.
Any ideas why re-scaling values to [0, 1] would improve "performance" so much? Please note that my covariates are in the range [0, d], where d can be arbitrarily large (though always finite).
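For reference, here is a minimal sketch of the comparison I'm running (the dataset and column names are placeholders standing in for my actual data, and the `concordance_index_` attribute assumes a reasonably recent lifelines version):

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi
from sklearn.preprocessing import MinMaxScaler

# Stand-in dataset: 'week' is the duration column, 'arrest' the event column.
df = load_rossi()

# Fit on the raw, unscaled covariates.
cph_raw = CoxPHFitter()
cph_raw.fit(df, duration_col='week', event_col='arrest')
print('raw:   ', cph_raw.concordance_index_)

# Fit again after rescaling every covariate to [0, 1].
covariates = df.columns.difference(['week', 'arrest'])
df_scaled = df.copy()
df_scaled[covariates] = MinMaxScaler().fit_transform(df[covariates])
cph_scaled = CoxPHFitter()
cph_scaled.fit(df_scaled, duration_col='week', event_col='arrest')
print('scaled:', cph_scaled.concordance_index_)
```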
`_log_likelihood` is the partial log likelihood (I should change the name to reflect that). Fitting and prediction are sensitive to outliers/extreme values (because there are exponentials in the model), so scaling makes a lot of sense to help control this. Plus I really should make the search more appealing!
– Cam.Davidson.Pilon Jul 13 '17 at 00:34
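To illustrate the point about exponentials in the comment above: the Cox model's partial hazard is exp(β·x), so covariates left on a large raw scale can make that exponential enormous. A small sketch with made-up numbers (β and the covariate values here are hypothetical, not from the actual data):

```python
import numpy as np

# The Cox partial hazard is exp(beta . x); with covariates on their raw
# [0, d] scale the exponent can get huge (np.exp overflows past ~709).
beta = 0.05                              # hypothetical coefficient
x_raw = np.array([10.0, 500.0, 9000.0])  # hypothetical raw covariate values
x_scaled = x_raw / x_raw.max()           # roughly what MinMaxScaler does

print(np.exp(beta * x_raw))     # ~[1.6e+00, 7.2e+10, 2.6e+195]: extreme spread
print(np.exp(beta * x_scaled))  # all close to 1: numerically well behaved
```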