I ran an analysis using a Cox proportional hazards regression model. My variable of interest is exposure to pesticides (kg of pesticides applied within 500 m of the home). We do not expect a linear relationship with the outcome (death), so we categorized the exposure variable into "no exposure", "low exposure", "medium exposure" and "high exposure". We are interested in the hazard ratios (HR) of the "exposed" categories compared to the "no exposure" category. I therefore ran the model as follows (in R):

mod <- coxph(Surv(time,status) ~ exposure + strata(age) + [other covariates], data=mydata)

Naturally, I obtain an estimate for each of the categories and the respective confidence interval. I see there is a trend in the categories, as we had expected ( HR(high) > HR(medium) > HR(low) ).

         HR   95% CI
low      1.02 [0.99, 1.03]
medium   1.04 [1.02, 1.05]
high     1.08 [1.07, 1.10]

(these are made up results, as I am not allowed to share the true results yet)

Now my supervisor has asked me to calculate the p for trend of these results. That is, we want to assess the significance of the trend "high > medium > low" that we see. I think this means testing whether there is a monotonic trend that is statistically distinguishable from a horizontal line or a zig-zag shape. I have seen epidemiological papers report a "p-trend" or "p for trend", but I have no idea how to calculate it! Is there a function in R that calculates it? Or is it a "by hand" sort of calculation? Or do I have to run a model with the exposure variable as continuous, or a model where the categories are strata?

Thank you for any help that you can provide, I am quite confused on how to proceed!

1 Answer

We do not expect a linear relationship to the outcome (death) and we have therefore categorized our exposure variable into "no exposure", "low exposure", "medium exposure" and "high exposure".

That's a mistake. You can handle a suspected non-linear relationship with methods like regression splines, using up no more degrees of freedom than the 3 you used up for defining the categories. Such continuous modeling avoids the implicit assumption that there are sudden jumps in association with outcome at the categorization break points.

Pursuing p-value trends is likely to get you into trouble, as p-values represent combinations of estimated effect magnitudes, variances of those estimates, and numbers of cases. Directly modeling the non-linear association is the way to proceed.
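
As a rough sketch of what that could look like in R, assuming the raw exposure (in kg) is available as a continuous variable: the data below are simulated, and the names `simdata`, `exposure_kg`, `age_group` and `sex` are placeholders, not anything from your data. `ns()` from the splines package is just one choice of regression spline; other spline bases would serve the same purpose.

```r
library(survival)  # coxph(), Surv(), strata()
library(splines)   # ns(): natural cubic regression spline basis

# Simulated stand-in data (hypothetical variable names, not the real study)
set.seed(1)
n <- 500
exposure_kg <- rexp(n, rate = 0.1) * rbinom(n, 1, 0.8)  # about 20% unexposed (0 kg)
age_group   <- factor(sample(c("<50", "50-69", "70+"), n, replace = TRUE))
sex         <- factor(sample(c("F", "M"), n, replace = TRUE))

# True hazard rises non-linearly (on a log1p scale) with exposure
event_time <- rexp(n, rate = 0.05 * exp(0.4 * log1p(exposure_kg)))
cens_time  <- runif(n, 0, 40)                           # independent censoring
simdata <- data.frame(
  exposure_kg, age_group, sex,
  time   = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time)          # 1 = death, 0 = censored
)

# Reference model without exposure
mod_base <- coxph(Surv(time, status) ~ sex + strata(age_group), data = simdata)

# Exposure entered linearly (1 df)
mod_linear <- coxph(Surv(time, status) ~ exposure_kg + sex + strata(age_group),
                    data = simdata)

# Exposure as a natural spline with 3 df, the same number of degrees of
# freedom as the three non-reference exposure categories
mod_spline <- coxph(Surv(time, status) ~ ns(exposure_kg, df = 3) + sex +
                      strata(age_group), data = simdata)

# Likelihood-ratio tests: any association with exposure (3 df), and
# departure from linearity (2 df)
lrt_overall <- anova(mod_base, mod_spline)
lrt_nonlin  <- anova(mod_linear, mod_spline)
print(lrt_overall)
print(lrt_nonlin)
```

The first comparison asks whether exposure is associated with the hazard at all, without forcing jumps at arbitrary cut points; the second asks whether the spline adds anything beyond a straight line on the log-hazard scale.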

EdM