EDIT: It was a simple typo (+ instead of *), but the values still aren't quite right. I'm leaving the original in case others find it useful.
I'm fairly new to working with survival functions, so this may simply be a misunderstanding of how they work. I've been tasked with showing a manual calculation that derives the survival probability from the hazard function for a Cox model (so even though a built-in function exists, I can't just call it). My understanding is:
The survival function should be derivable from the hazard function via
S(t) = exp{−∫ h(x) dx}, with the integral taken from time 0 to time t
and the hazard function at any given time can be found using
h(t) = h0(t) * exp(b1*x1 + b2*x2 + ... + bp*xp), where each b is the coefficient for one covariate (the coef column below)
I've used the lifelines package in Python to fit a Cox regression model to my data and obtained the following coefficients:
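Combining the two formulas above is worth writing out, because the exponential term does not depend on time and can be pulled out of the integral. What remains inside is the cumulative baseline hazard, which I'll call H0(t) (my own notation, not something from the lifelines output):

```latex
S(t \mid x)
  = \exp\left\{ -\int_0^t h_0(u)\, e^{b_1 x_1 + b_2 x_2 + \dots + b_p x_p}\, du \right\}
  = \exp\left\{ -H_0(t)\, e^{b_1 x_1 + b_2 x_2 + \dots + b_p x_p} \right\},
\qquad
H_0(t) = \int_0^t h_0(u)\, du
```

So the hazard at a single time uses h0(t), while the survival probability uses the cumulative version H0(t).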
                                  coef   exp(coef)   ...
DM                                0.04   1.04
HF                                0.50   1.65
PVD                               0.29   1.34
DEMENTIA                          0.31
AGE_AT_MOD_START
MODALITY_NON-OPTIMAL START HD    -0.69   0.50
MODALITY_OPTIMAL START HD        -1.36   0.26
MODALITY_OPTIMAL START PD        -1.24   0.29
The fitted model also exposes the baseline hazard and the baseline cumulative hazard (baseline_hazard_ and baseline_cumulative_hazard_), which give h0(t) and its running total.
Lifelines has a built-in function for calculating survival (predict_survival_function), which I've been using to check my work, but my answers have been quite far off. I created an example patient and plugged it into the h(t) equation above.
baseline hazard
0.0 0.000000
1.0 0.000000
2.0 0.000000
3.0 0.000000
4.0 0.000101
... ...
1822.0 0.077646
1823.0 0.077646
1824.0 0.077646
1825.0 0.000000
1826.0 0.000000
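Since the baseline hazard is only given at discrete days, the integral in S(t) turns into a running sum. Here is a minimal sketch of that step using only the first five rows of the table above. I'm assuming each row is the per-day increment of the cumulative baseline hazard, which is how I read the lifelines output.

```python
import math
from itertools import accumulate

# First five rows of the baseline hazard table (day -> h0).
baseline_hazard = {0.0: 0.0, 1.0: 0.0, 2.0: 0.0, 3.0: 0.0, 4.0: 0.000101}

days = sorted(baseline_hazard)

# Running sum of the increments approximates H0(t) = integral of h0 from 0 to t.
cumulative = dict(zip(days, accumulate(baseline_hazard[d] for d in days)))

# Baseline survival, i.e. survival for a subject whose exp(...) term equals 1.
baseline_survival = {d: math.exp(-H) for d, H in cumulative.items()}

for d in days:
    print(d, cumulative[d], baseline_survival[d])
```

With no events before day 4 the cumulative hazard is 0 there, so the baseline survival stays at exactly 1 until the first event.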
DM HF PVD DEMENTIA AGE_AT_MOD_START MODALITY_NON-OPTIMAL START HD MODALITY_OPTIMAL START HD MODALITY_OPTIMAL START PD
0 1 0 1 0 85 1 0 0
h(4) = 0.000101+exp(0.04*0+0.50*1+0.29*0+0.31*1+0.04*85+....-1.24*0) = 20.90534 (uh oh)
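To make the arithmetic on that line easier to check, here is a small sketch that computes the exponent term and compares adding it to the baseline hazard with multiplying by it. The coefficients are the rounded values quoted above (0.04 for age is taken from the h(4) line), and the modality flags are my reading of the patient row, so the result will not reproduce the unrounded numbers exactly.

```python
import math

# Rounded coefficients as quoted in the post (assumption: 0.04 for age, from the h(4) line).
coef = {
    "DM": 0.04,
    "HF": 0.50,
    "PVD": 0.29,
    "DEMENTIA": 0.31,
    "AGE_AT_MOD_START": 0.04,
    "MODALITY_NON-OPTIMAL START HD": -0.69,
    "MODALITY_OPTIMAL START HD": -1.36,
    "MODALITY_OPTIMAL START PD": -1.24,
}

# Covariate values as plugged into the h(4) line; modality flags assumed to be 1, 0, 0.
patient = {
    "DM": 0,
    "HF": 1,
    "PVD": 0,
    "DEMENTIA": 1,
    "AGE_AT_MOD_START": 85,
    "MODALITY_NON-OPTIMAL START HD": 1,
    "MODALITY_OPTIMAL START HD": 0,
    "MODALITY_OPTIMAL START PD": 0,
}

linear_predictor = sum(coef[k] * patient[k] for k in coef)
partial_hazard = math.exp(linear_predictor)

h0_day4 = 0.000101
h_added = h0_day4 + partial_hazard       # what the h(4) line computes
h_multiplied = h0_day4 * partial_hazard  # what the hazard formula asks for

print(linear_predictor, partial_hazard, h_added, h_multiplied)
```

Adding gives a "hazard" in the tens, dominated entirely by the exponent term, while multiplying scales the tiny baseline hazard and stays small.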
I tried with the cumulative baseline hazard as well, since the baseline hazard is 0 on days where no deaths occurred, though my understanding is that the hazard formula uses the baseline hazard, not the cumulative one. I'm positive the problem is inside the e^(coefficients*covariates) part. Even a quick check at time 0 gives:
h(0) = 20.90524
S(0) = 8.3362e-10
This is clearly wrong, since S(0) should be close to 1.0. The glaring potential offender is that I plugged in age (85) directly since it was already numerical, but should all values be normalized or something? I've looked all over but haven't seen an example of anyone doing the calculation by hand; it's all just 'call a function' nowadays (which, fair, I'd do the same if I were allowed). Any insight would be appreciated.
For comparison, here are the built-in survival function results:
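On the normalization question: as far as I can tell from the lifelines documentation, the baseline hazard is reported for a subject at the training-set mean of every covariate, and predict_partial_hazard works with exp((x - mean(x)) * b) rather than exp(x * b). If that reading is right, the covariates need to be centered before exponentiating. The sketch below shows the difference for age alone. The training mean of 65 is a made-up number purely for illustration, and `survival` is a hypothetical helper, not a lifelines function.

```python
import math

def survival(cum_baseline_hazard, coef, x, x_mean):
    """S(t | x) = exp(-H0(t) * exp(sum_j b_j * (x_j - mean_j)))."""
    lp = sum(coef[k] * (x[k] - x_mean[k]) for k in coef)
    return math.exp(-cum_baseline_hazard * math.exp(lp))

coef = {"AGE_AT_MOD_START": 0.04}      # rounded coefficient from the h(4) line
x = {"AGE_AT_MOD_START": 85}           # example patient
x_mean = {"AGE_AT_MOD_START": 65}      # hypothetical training mean, not from my data
no_centering = {"AGE_AT_MOD_START": 0}

H0_day4 = 0.000101  # cumulative baseline hazard at day 4 (first event)

s_centered = survival(H0_day4, coef, x, x_mean)
s_raw = survival(H0_day4, coef, x, no_centering)
s_at_zero = survival(0.0, coef, x, x_mean)  # H0(0) = 0

print(s_centered, s_raw, s_at_zero)
```

Two things fall out of this: S(0) is exactly 1 whenever H0(0) is 0, regardless of the covariates, and skipping the centering inflates the exponent term, which pushes the survival estimate below the centered one.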
4.0 0.999724
6.0 0.999448
8.0 0.998897
9.0 0.998344
10.0 0.997792
... ...
1820.0 0.155034
1821.0 0.155034
1822.0 0.125425
1825.0 0.125425
1826.0 0.125425
h(4) = 0.000101+exp(0..., do you mean h(4) = 0.000101*exp(0? Note the * – Cam.Davidson.Pilon Jun 09 '20 at 22:05