2

I have a dataset consisting of time periods, at the end of each one the individual either develops a disease, or doesn't and is right-censored. I suspect that the rate of developing the disease is determined by a power function of another known variable, N, a an unknown power b, and an unknown multiplier a determined by a random effect across the different individuals id.

I found this page very helpful, but I would like to use a survival model as the response variable (lhs), and a mixed non-linear model as the input (rhs).

The data look something like this (where I've made the power parameter 0.4).

library(tidyverse)
library(survival)
library(lme4)

Nsamples = 100 Z = tibble(ev = sample(0:1, size = Nsamples, replace = T), #event outcome N = rpois(n = Nsamples, lambda = 100), #known input variable dur = N^0.4*rnorm(n = Nsamples, mean = 1, sd = 0.1), #length of follow-up id = rep(letters[1:2], each = Nsamples/2) #a random effect )

I use a deriv function to define the non-linear part:

power.f = deriv(~a * N^b, namevec=c('a', 'b'), function.arg=c('a', 'N','b'))

And a survival function to define the output:

surv_obj <- with(Z, Surv(time = dur, event = ev))

Putting these together, and using the lme4::nlmer function with format Output ~ Non-linear part ~ Random effect

nlmer(surv_obj ~ power.f(a, N, b) ~ (a|id), data = Z, start=c(a=1, b=0.5))

However, I get the following error

# Error in resp$ptr() : dimension mismatch

In principle this should work I think, but I am not sure if these functions are compatible with each other and the error message doesn't mean much to me. Please help if you know how to get this to work, or some other way to fit a survival rates model like this with a power law in the input variables.

Thank you!

EdM
  • 92,183
  • 10
  • 92
  • 267
  • When you speak of "a dataset consisting of time periods," in your actual data are the time periods the same for all individuals as in panel data? Or are there separate observation times for each individual, as I understand from your code? – EdM Apr 05 '23 at 14:41
  • Thanks, they are separate observation times for each individual. – user2493870 Apr 06 '23 at 08:46

1 Answers1

1

Unless a function is designed to handle censored observations, it won't handle censored observations as coded via the Surv() function in R. The nlmer() function isn't designed to handle censored observations.

It seems that this particular problem can be solved by a log transform of the hypothesized power function:

$$\log(aN^b)=\log a+b\log N .$$

That transformation sets the log of the known $N$ as the predictor variable, provides a form linear in the unknown coefficient $b$, and allows for random intercepts in the form of $\log a$. You then can use your choice of software for modeling random effects in survival models, summarized for example in the R Survival Task View.

I suspect that more complicated hypothesized non-linear functions would benefit from a Bayesian survival model, but that doesn't seem to be needed here.

EdM
  • 92,183
  • 10
  • 92
  • 267
  • have a request https://stats.stackexchange.com/questions/612178/interpreting-survival-analysis-plot-using-genes-as-predictor – PesKchan Apr 06 '23 at 19:12
  • Thank you, this is a good suggestion. However, I'm specifically interested in estimating a b such that the N^b term relates linearly to the hazard rate of the disease. I could achieve this if I could somehow also log-transform the survival object (not sure what that would look like), or alternatively transform the output of the model you suggest to get the value of b as it would be on a linear scale. Any idea if either of those are possible? – user2493870 Apr 13 '23 at 07:32