I have a data set of individuals with a certain diagnosis, observed from the time of diagnosis until death or the end of the study. I want to calculate the standardised mortality ratio (SMR) for the whole group, and also compare subgroups (especially by sex and calendar year). My question is whether the methodology I describe below is sound, and I'd be happy if someone could point me to references (articles or books) where I can read about this specific methodology.
The data are standardised against census data. Here are the first six (of 384) rows:
year sex age_group observed_deaths expected_deaths
2006 0 15-19 0 0.01480
2006 0 20-24 0 0.05848
2006 0 25-29 3 0.04836
2006 0 30-34 1 0.03835
2006 0 35-39 0 0.06424
2006 0 40-44 2 0.11880
Expected deaths in each year/sex/age-group stratum are calculated from the census death rate in that stratum (census deaths per person-year) multiplied by the person-years of observation the cohort contributes to the stratum.
The basic method for the SMR is to divide the sum of observed deaths (O) by the sum of expected deaths (E). For the full data set, O/E is 8.68. To my understanding, the standard error is obtained by dividing the square root of O by E, which gives a 95% confidence interval of 8.19-9.18 with this method. The SMR for each sex is obtained by summing observed and expected deaths separately within each sex and repeating these calculations for each pair of O and E.
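A minimal sketch of that calculation, assuming the reference rate is census deaths divided by the census population (used as person-years at risk) in each stratum. All numbers, and the column names `person_years`, `census_deaths` and `census_population`, are hypothetical.

```r
# Hypothetical cohort person-years and observed deaths, one row per stratum
cohort <- data.frame(
  year = c(2006, 2006, 2006),
  sex = c(0, 0, 0),
  age_group = c("25-29", "30-34", "35-39"),
  person_years = c(120.5, 98.0, 150.2),
  observed_deaths = c(3, 1, 0)
)
# Hypothetical census deaths and population for the same strata
census <- data.frame(
  year = c(2006, 2006, 2006),
  sex = c(0, 0, 0),
  age_group = c("25-29", "30-34", "35-39"),
  census_deaths = c(40, 45, 70),
  census_population = c(100000, 110000, 105000)
)
# Reference death rate per person-year in each stratum (assumption: deaths / population)
census$rate <- census$census_deaths / census$census_population
# Attach the reference rate to each cohort stratum
d <- merge(cohort, census, by = c("year", "sex", "age_group"))
# Expected deaths = reference rate x cohort person-years
d$expected_deaths <- d$rate * d$person_years
d[, c("year", "sex", "age_group", "observed_deaths", "expected_deaths")]
```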
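The calculation above can be sketched as a small function. Besides the symmetric interval described (SMR ± 1.96 × sqrt(O)/E), it shows two common alternatives for comparison: a Wald interval on the log scale (SE of log SMR = 1/sqrt(O)) and exact Poisson limits, which agree with base R's `poisson.test()`. The function name `smr_ci` and the totals used are hypothetical.

```r
# Pooled SMR with 95% confidence intervals computed three ways (sketch)
smr_ci <- function(O, E, level = 0.95) {
  o <- sum(O)
  e <- sum(E)
  alpha <- 1 - level
  z <- qnorm(1 - alpha / 2)
  smr <- o / e
  se <- sqrt(o) / e  # Poisson SE of O/E, treating E as fixed
  list(
    smr = smr,
    se = se,
    # symmetric interval on the SMR scale (the method described above)
    wald = smr + c(-1, 1) * z * se,
    # symmetric on the log scale: SE of log(SMR) is 1/sqrt(O)
    log_wald = smr * exp(c(-1, 1) * z / sqrt(o)),
    # exact Poisson limits for O, divided by E
    exact = c(if (o == 0) 0 else qchisq(alpha / 2, 2 * o) / 2,
              qchisq(1 - alpha / 2, 2 * o + 2) / 2) / e
  )
}

# Hypothetical totals chosen to give an SMR of 8.68
res <- smr_ci(O = 1181, E = 1181 / 8.68)
round(unlist(res[c("wald", "log_wald", "exact")]), 2)
```

The log-scale interval is asymmetric around the SMR, which is one reason intervals from a Poisson regression can differ slightly from the symmetric one.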
So far so good, but I'd like to assess whether the SMR differs between the sexes and between calendar years. If my understanding is correct, this can be done with Poisson regression. I start by calculating the basic SMR without taking sex or year into account:
glm(observed_deaths ~ offset(log(expected_deaths)), family = poisson, data = data)
This gives the same SMR, 8.68 (the exponentiated intercept), but a slightly different confidence interval, 8.20-9.19. SMRs for each sex are easily obtained by fitting the same model within each sex:
glm(observed_deaths ~ offset(log(expected_deaths)), family = poisson, data = data, subset = sex == 0)
glm(observed_deaths ~ offset(log(expected_deaths)), family = poisson, data = data, subset = sex == 1)
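A quick check of these models on hypothetical toy data (not the real data set): the exponentiated intercept of the intercept-only Poisson model equals sum(O)/sum(E), and fitting within each sex reproduces the per-sex O/E. The interval from `confint.default()` is a Wald interval on the log scale, back-transformed, whose SE is 1/sqrt(O); `confint()` on a glm gives profile-likelihood intervals instead, which may differ slightly.

```r
# Hypothetical toy strata in the question's layout (not the real data)
toy <- data.frame(
  sex = rep(0:1, each = 4),
  age_group = rep(c("25-29", "30-34", "35-39", "40-44"), times = 2),
  observed_deaths = c(3, 1, 0, 2, 4, 2, 1, 5),
  expected_deaths = c(0.048, 0.038, 0.064, 0.119, 0.051, 0.042, 0.070, 0.130)
)

# Intercept-only Poisson model: exp(intercept) is the pooled SMR
fit0 <- glm(observed_deaths ~ offset(log(expected_deaths)),
            family = poisson, data = toy)
smr_glm <- exp(coef(fit0)[["(Intercept)"]])
smr_direct <- sum(toy$observed_deaths) / sum(toy$expected_deaths)

# Wald interval on the log scale, back-transformed to the SMR scale
ci_glm <- exp(confint.default(fit0)["(Intercept)", ])

# Per-sex SMRs via the subset argument, and directly from the sums
smr_sex <- sapply(0:1, function(s) {
  fit <- glm(observed_deaths ~ offset(log(expected_deaths)),
             family = poisson, data = toy, subset = sex == s)
  exp(coef(fit)[["(Intercept)"]])
})
smr_sex_direct <- with(toy, tapply(observed_deaths, sex, sum) /
                            tapply(expected_deaths, sex, sum))
```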
I haven't read about this, but it seems I could simply add sex as a covariate and get a statistical test for the difference in SMR between the sexes:
glm(observed_deaths ~ offset(log(expected_deaths)) + sex, family = poisson, data = data)
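A sketch of that comparison on the same hypothetical toy data: with sex as the only covariate, the exponentiated sex coefficient equals the ratio of the two per-sex SMRs, and a likelihood-ratio test against the intercept-only model tests equal SMRs. `factor(sex)` is used here for clarity; with sex coded 0/1 it gives the same fit as the numeric variable.

```r
# Hypothetical toy strata (not the real data)
toy <- data.frame(
  sex = rep(0:1, each = 4),
  age_group = rep(c("25-29", "30-34", "35-39", "40-44"), times = 2),
  observed_deaths = c(3, 1, 0, 2, 4, 2, 1, 5),
  expected_deaths = c(0.048, 0.038, 0.064, 0.119, 0.051, 0.042, 0.070, 0.130)
)

fit0 <- glm(observed_deaths ~ offset(log(expected_deaths)),
            family = poisson, data = toy)
fit_sex <- glm(observed_deaths ~ offset(log(expected_deaths)) + factor(sex),
               family = poisson, data = toy)

# exp(coefficient) is the ratio SMR(sex 1) / SMR(sex 0), with a Wald CI
smr_ratio <- exp(coef(fit_sex)[["factor(sex)1"]])
smr_ratio_ci <- exp(confint.default(fit_sex)["factor(sex)1", ])

# Likelihood-ratio test of equal SMRs in the two sexes
lrt <- anova(fit0, fit_sex, test = "Chisq")
p_sex <- lrt[["Pr(>Chi)"]][2]
```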
Or, to assess a linear effect of calendar year on the log SMR:
glm(observed_deaths ~ offset(log(expected_deaths)) + year, family = poisson, data = data)
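A sketch of the trend model on simulated, hypothetical data. Centring year (here at 2006, an arbitrary choice) makes exp(intercept) the SMR in the first year; exp(slope) is the multiplicative change in SMR per calendar year. Comparing against a model with a separate SMR per year gives a likelihood-ratio check of the linear-trend assumption, and the Pearson dispersion statistic is a rough check of the Poisson variance assumption (if it is well above 1, `family = quasipoisson` is one common alternative).

```r
# Simulated, hypothetical data (not the real data)
set.seed(1)
toy <- expand.grid(year = 2006:2010, sex = 0:1,
                   age_group = c("25-34", "35-44", "45-54"))
toy$expected_deaths <- runif(nrow(toy), 0.5, 2)
toy$observed_deaths <- rpois(nrow(toy), 8 * toy$expected_deaths)

# Log-linear trend; centring year makes exp(intercept) the SMR in 2006
fit_lin <- glm(observed_deaths ~ offset(log(expected_deaths)) + I(year - 2006),
               family = poisson, data = toy)
per_year <- exp(coef(fit_lin)[["I(year - 2006)"]])

# Separate SMR for every year; LRT checks departure from the linear trend
fit_fac <- glm(observed_deaths ~ offset(log(expected_deaths)) + factor(year),
               family = poisson, data = toy)
nonlin <- anova(fit_lin, fit_fac, test = "Chisq")

# Rough overdispersion check: should be near 1 under the Poisson model
dispersion <- sum(residuals(fit_lin, type = "pearson")^2) / df.residual(fit_lin)
```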
Is this methodology sound and valid? I understand that it assumes the SMR is the same in all strata defined by variables not included in the regression model, but surely that assumption is also implicit in the simple method (without Poisson regression)?
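Two facts that bear on this question, sketched on simulated, hypothetical data where the true SMR varies by age group: the pooled O/E is algebraically an E-weighted average of the stratum SMRs, so the simple method and the intercept-only Poisson model pool strata in the same way; and one way to probe the common-SMR assumption is a likelihood-ratio test against a model that adds the omitted variable (here age group).

```r
# Simulated, hypothetical data: true SMR differs by age group
set.seed(2)
toy <- expand.grid(sex = 0:1,
                   age_group = c("25-34", "35-44", "45-54", "55-64"))
toy$expected_deaths <- runif(nrow(toy), 0.5, 3)
true_smr <- c(12, 9, 7, 5)[as.integer(toy$age_group)]
toy$observed_deaths <- rpois(nrow(toy), true_smr * toy$expected_deaths)

# Pooled SMR equals the E-weighted mean of stratum SMRs
smr_i <- toy$observed_deaths / toy$expected_deaths
pooled <- sum(toy$observed_deaths) / sum(toy$expected_deaths)
weighted <- sum(toy$expected_deaths * smr_i) / sum(toy$expected_deaths)

# Does adding age group improve on the sex-only model? (heterogeneity check)
fit_sex <- glm(observed_deaths ~ offset(log(expected_deaths)) + factor(sex),
               family = poisson, data = toy)
fit_sex_age <- update(fit_sex, . ~ . + age_group)
het <- anova(fit_sex, fit_sex_age, test = "Chisq")
```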
Can anyone point me to some useful references where I can read more about the use of Poisson regression models when calculating SMRs?
Comment (kjetil b halvorsen, May 14 '20 at 11:11): observed_deaths ~ year + sex + offset(log(expected_deaths))
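A sketch of the model suggested in the comment, fitted to simulated, hypothetical data: sex and a log-linear year trend in one Poisson model, so each exponentiated coefficient is an SMR ratio adjusted for the other variable. Adding a `year:sex` interaction and comparing with a likelihood-ratio test is one way to check whether the trend differs between the sexes.

```r
# Simulated, hypothetical data (not the real data)
set.seed(3)
toy <- expand.grid(year = 2006:2010, sex = 0:1)
toy$expected_deaths <- runif(nrow(toy), 1, 4)
toy$observed_deaths <- rpois(nrow(toy), 8 * toy$expected_deaths)

# Model from the comment: sex and year trend, each adjusted for the other
fit_add <- glm(observed_deaths ~ year + sex + offset(log(expected_deaths)),
               family = poisson, data = toy)
smr_ratios <- exp(coef(fit_add)[c("year", "sex")])

# Does the year trend differ between the sexes? LRT for the interaction
fit_int <- update(fit_add, . ~ . + year:sex)
interaction_test <- anova(fit_add, fit_int, test = "Chisq")
```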