
I've been fitting the Weibull and lognormal distributions with the survreg() function of the survival package. Fitting the Weibull distribution required transforming the estimates to the standard parameterization (per R's dweibull()), as shown here: How to generate multiple forecast simulation paths for survival analysis?

I'm now moving on to the exponential distribution. [See https://stats.stackexchange.com/questions/616351/how-to-assign-reasonable-scale-parameters-to-randomly-generated-intercepts-for-t for an example of the exponential distribution.] Could someone please confirm whether the exponential distribution is being fit correctly in the R code posted at the bottom, as illustrated in the following image? If not, how do I fit the exponential correctly? I use the lung dataset only for ease of example, even though the exponential does not fit it well: Weibull provides the best fit.

[Image: fitted exponential survival curve (red) plotted against the Kaplan-Meier estimate (blue)]

Code:

library(survival)

time <- seq(0, 1000, by = 1)

fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

survival <- 1 - pexp(time, rate = 1 / fit$coef)

plot(time, survival, type = "l", xlab = "Time", ylab = "Survival Probability", col = "red", lwd = 3)
lines(survfit(Surv(time, status) ~ 1, data = lung), col = "blue")
legend("topright", legend = c("Fitted exponential", "Kaplan-Meier"), col = c("red", "blue"), lwd = c(3, 1), bty = "n")

  • The red curve is obviously wrong. Consider starting with predict.survreg. – whuber May 11 '23 at 14:30
  • See EdM's answer below, which completely resolves the wacky red line issue. – Village.Idyot May 11 '23 at 17:40
  • Thank you. I had determined that initially -- it's easy to guess that's the problem and to check it -- but as a general proposition you ought to look at what the predict method does when you're learning your way around any regression procedure in R. – whuber May 11 '23 at 19:39

1 Answer


You've gotten trapped by location-scale modeling again. The model you fit is:

$$\log(T)\sim \beta_0 + W, $$

where $\beta_0$ is your fit$coef (location) and $W$ represents a standard minimum extreme value distribution. The scale factor multiplying $W$ for a corresponding Weibull model is set exactly to 1 for an exponential model.

Thus $\beta_0$ represents a value in the log scale of time. For linear time, you need to exponentiate it to get the rate argument to supply to pexp().
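
As a sketch of why this works, assume the standard minimum extreme value distribution has survival function $P(W > w) = \exp(-e^{w})$ (this form is not stated above). Then, with the scale fixed at 1:

$$S(t) = P(T > t) = P(W > \log t - \beta_0) = \exp\left(-e^{\log t - \beta_0}\right) = \exp\left(-t\,e^{-\beta_0}\right),$$

which is the exponential survival function $\exp(-\lambda t)$ with rate $\lambda = e^{-\beta_0} = 1/\exp(\beta_0)$.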

1/exp(fit$coef)
# (Intercept) 
# 0.002370928 

Try that.
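
A minimal sketch of the corrected plot, reusing the question's code with only the rate changed. The names rate, t_grid, and surv_exp are introduced here for illustration and are not from the original code. The final check follows whuber's suggestion to look at predict(): it assumes that the median predicted by the fitted model should agree with qexp() at the same rate.

```r
library(survival)

# Intercept-only exponential model on the lung data, as in the question
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

# The intercept is on the log-time scale, so the rate is exp(-intercept)
rate <- unname(exp(-coef(fit)))

# Time grid for plotting (illustrative name, chosen to avoid clashing with lung$time)
t_grid <- seq(0, 1000, by = 1)

# Exponential survival function S(t) = exp(-rate * t)
surv_exp <- pexp(t_grid, rate = rate, lower.tail = FALSE)

# Kaplan-Meier estimate with the fitted exponential curve overlaid
plot(survfit(Surv(time, status) ~ 1, data = lung), col = "blue",
     xlab = "Time", ylab = "Survival Probability")
lines(t_grid, surv_exp, col = "red", lwd = 3)
legend("topright", legend = c("Fitted exponential", "Kaplan-Meier"),
       col = c("red", "blue"), lwd = c(3, 1), bty = "n")

# Sanity check: the model's predicted median should match the exponential median
med_model <- unname(predict(fit, type = "quantile", p = 0.5)[1])
med_exp <- qexp(0.5, rate = rate)
isTRUE(all.equal(med_model, med_exp))
```

If the two medians agree, the red curve is being drawn on the same time scale as the fitted model.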

EdM