1

I am modelling differences between the risk of getting chronic kidney disease (CKD) between two groups, group A and B. I have very long follow-up time, approx. 30 years. The trouble is that CKD diagnosis rates have changed dramatically over these years, from being very rarely diagnosed to being quite commonly diagnosed. This change has also occurred at different rates in different age groups, meaning that some age groups received more intensive diagnosis earlier than other age groups. The relationship between age, calendar year, and baseline risk of diagnosis is therefore highly complex. Since my groups are slightly different in age and calendar time at recruitment, I need to adjust my analysis for this.

I am using a Cox proportional hazards model with extension for time-varying covariates. I have calendar year on the time axis and adjust for age as a spline function. The challenge is that the shape of the association between age and baseline hazard is not the same for all calendar years. Thus I started thinking about the interaction between age and calendar year. The final model therefore looks like this (to allow for different HRs above and below age 65 I have a separate interaction term for this):

$H(year_{in}, year_{out}) \sim group + group:age_{65} + spline(age_{in}):spline(year_{in}) + Z$

This model works splendidly as far as I can judge. It gives sensible estimates, it does not violate the proportional hazards assumption, and does not condition on survival to a certain age, as far as I can see. However, I have not found any sources (including here on CrossValidated) describing benefits or disadvantages of interactions between splines. So my question is:

Are there known problems with interactions between spline functions? I imagine this intuitively as having a surface in the three-dimensional age*year*baseline hazard space.

(Side note, initially I used age on the time axis. However, I noticed that the HR between group A and B was different at different ages, and thus ended up stratifying models on attained age. This, I realised, was not ideal, as the higher-age strata essentially became conditioned on survival until that aged. In the actual model I instead include both a group identifier and a group*age_group interaction, to essentially get two different HRs for the group at two different ages.)

EDIT: Clarified equation and removed second question (2. Is it a problem to use year in the function (although not as its own term, that wouldn't make sense), given that the model already has year on the time axis?) as it has been discussed here.

  • 1
    Hello, welcome to CV.SE, why not have a 2D spline? – usεr11852 Aug 14 '23 at 09:33
  • 2
    AFAIK, the only problem with interactions of splines is in interpreting the output. – Peter Flom Aug 14 '23 at 11:23
  • For some possibilities see https://stats.stackexchange.com/questions/245299/gams-test-of-simple-effects-following-a-significant-interaction/248696#248696, https://stats.stackexchange.com/questions/189384/predicting-mean-smooth-in-gam-with-smooth-by-random-factor-interaction/189437#189437, https://stats.stackexchange.com/questions/592932/how-to-apply-splines-to-logistic-glmm-if-predictors-are-scaled/594230#594230 – kjetil b halvorsen Aug 14 '23 at 14:02
  • @usεr11852 Thanks! I have not seen any examples of 2D splines applied in a regression setting. Do you have any suggestions of such examples? – Themistokles Aug 14 '23 at 16:14
  • @PeterFlom This sounds promising. Since I am only using them for covariate adjustment in this case, explicability is not a priority as long as I can be convinced that they are doing what I think they are doing. – Themistokles Aug 14 '23 at 16:15

0 Answers0