
I was hoping someone could help me with this problem in the Cox proportional hazards model.

I am given the following setup.

$T$ is a non-negative random variable with continuous distribution and hazard function $\lambda_T(t)$. $T$ has density $f_T(t) = \lambda_T(t) S(t)$, where $S(t) = P(T>t)$ is the survival function. Also, $F(t) = P(T \leq t)$ is the distribution function.

Suppose I have $n$ observations of $T$, with no censoring assumed. Can anyone tell me how to arrive at an $\textbf{efficient influence function}$ for $S(t_0)$, where $t_0$ is a fixed time point?

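Note that $\sqrt{n} ( \hat{S}(t_0) - S(t_0) ) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \phi(T_i) + o_p(1)$, where $\phi$ is the influence function and $o_p(1)$ is a remainder converging to zero in probability. This should lead to an efficient estimator $\hat{S}(t_0)$ of $S(t_0)$.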

nalen
  • is there a particular text book you are working from in regards to this? – phdmba7of12 Apr 14 '21 at 18:46
  • what does $O_p$ here represent? – phdmba7of12 Apr 14 '21 at 18:51
  • $O_p$ is a term converging to zero. – nalen Apr 14 '21 at 18:54
  • No not a particular book actually. – nalen Apr 14 '21 at 18:55
  • As this is a Cox regression model, are you asking about estimating the influence of removing one event time-point on the baseline survival function? Put another way, would the individual with the event at that time point still be included at prior times (effectively censored at what was really the event time)? – EdM Apr 14 '21 at 21:54
  • @EdM sorry I believe there was an error. $t_0$ is just a fixed time point, and is in both terms of $S$. – nalen Apr 15 '21 at 11:50
  • Are you asking this in the context of a Cox model with multiple predictors? In that case, the "observation" at $T_i$ isn't just an observation of a time value but also the multi-dimensional set of associated covariates both for the case having the event and all the other cases at risk at $T_i$. Or are you asking about a simpler situation where all the information is in the event times, like with a single Kaplan-Meier curve? – EdM Apr 15 '21 at 16:06
  • @EdM Yeah, the simpler situation, thank you! I put Cox model in there, but essentially it's just a proportional hazards model without censoring. I have arrived at the Kaplan-Meier estimate, but I don't know how to use efficiency to derive the Kaplan-Meier estimate. I.e., I need some likelihood argument or something, I think. I am to use the efficient influence function to derive the estimate, not the other way around. – nalen Apr 15 '21 at 16:16
  • @nalen even a "proportional hazards model without censoring" implies at least 1 covariate and 2 groups with different survival curves. So are you just asking about a single underlying continuous survival curve $S(t)$ estimated as $\hat S(t)$ from $n$ event times $T_i$? – EdM Apr 15 '21 at 16:35
  • @EdM Yes correct. I wasn't aware of this. – nalen Apr 15 '21 at 16:38
  • Do you think $\hat{S}(t_0)$ would be more appropriate than $\hat{S(t_0)}$? – The Pointer Apr 15 '21 at 18:13

3 Answers


Write $\mathbb{P}_n$ for the empirical expectation, $\mathbb{P}_n g = \frac{1}{n}\sum_{i=1}^n g(T_i)$. The estimator $\hat\psi = \mathbb{P}_n I(T>t_0)$ satisfies $$\sqrt{n}(\hat\psi - S(t_0)) = \sqrt{n}\,\mathbb{P}_n [I(T>t_0)-S(t_0)] + 0,$$ showing that $\hat\psi$ is asymptotically linear (here exactly linear, with zero remainder) with influence function $I(T>t_0)-S(t_0)$. Since the model is nonparametric, a regular asymptotically linear estimator has only one possible influence function, so this must be the efficient influence function.

Another, more direct, way to find the efficient influence function is through the canonical gradient. Let $\epsilon$ parametrize a one-dimensional parametric submodel with densities $f_\epsilon$, where $f_0 = f_T$, and let $R_\epsilon = \frac{\mathrm{d}}{\mathrm{d}\epsilon} \log f_\epsilon$ be its score. Then \begin{align*} \frac{\mathrm{d}}{\mathrm{d}\epsilon} \mathbb{E}_\epsilon[I(T>t_0)] \Big|_{\epsilon = 0} &= \frac{\mathrm{d}}{\mathrm{d}\epsilon} \int_{t_0}^\infty f_\epsilon(s)\,\mathrm{d}s \Big|_{\epsilon = 0} \\ &= \int_{t_0}^\infty f_\epsilon(s) R_\epsilon(s) \,\mathrm{d}s \Big|_{\epsilon = 0} \\ &= \mathbb{E}_\epsilon [I(T>t_0) R_\epsilon(T)] \Big|_{\epsilon = 0} \\ &= \mathbb{E}_\epsilon [\left\{ I(T>t_0) - S_\epsilon(t_0) \right\} R_\epsilon(T)] \Big|_{\epsilon = 0} \\ &= \mathbb{E}[\left\{ I(T>t_0) - S(t_0) \right\} R_0(T)], \end{align*} where the fourth equality uses that the score has mean zero, $\mathbb{E}_\epsilon[R_\epsilon(T)] = 0$. Since $I(T>t_0) - S(t_0)$ has mean zero and the model is nonparametric, so that the tangent space contains every mean-zero, square-integrable function of $T$, it must be the canonical gradient and hence the efficient influence function.
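As a rough numerical check of the asymptotic linearity claim, here is a minimal Python sketch. It assumes, purely for illustration, that $T \sim \mathrm{Exp}(1)$ and $t_0 = 1$ (neither is part of the question), and the function names are hypothetical. It compares $\sqrt{n}(\hat\psi - S(t_0))$ with $n^{-1/2}\sum_i \phi(T_i)$ for $\phi(T) = I(T>t_0) - S(t_0)$.

```python
import numpy as np


def empirical_survival(times, t0):
    """Empirical survival estimate: fraction of observations exceeding t0."""
    return float(np.mean(np.asarray(times) > t0))


def influence_values(times, t0, s_true):
    """Influence function I(T > t0) - S(t0) evaluated at each observation."""
    return (np.asarray(times) > t0).astype(float) - s_true


rng = np.random.default_rng(0)
n, t0 = 1000, 1.0
s_true = float(np.exp(-t0))  # true S(t0) under the assumed Exp(1) model
times = rng.exponential(scale=1.0, size=n)

# Left-hand side: scaled estimation error of the empirical survival estimate.
lhs = np.sqrt(n) * (empirical_survival(times, t0) - s_true)
# Right-hand side: scaled sum of influence-function values.
rhs = influence_values(times, t0, s_true).sum() / np.sqrt(n)

print(lhs, rhs)  # the two should agree up to floating-point error
```

The two numbers should coincide because the remainder term is exactly zero for this estimator; for most other estimators one would only expect agreement up to a term that vanishes as $n$ grows.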

Ben
  • How did you get the first line? Is this just an appeal to asymptotic properties of any empirical estimator that takes the form of the empirical expectation? – Casey May 10 '22 at 14:53
  • @Casey It's just rewriting the expression, since $\sqrt{n} \mathbb{P}_n S(t_0) = \sqrt{n} S(t_0) \mathbb{P}_n 1 = \sqrt{n} S(t_0)$. – Ben May 10 '22 at 15:03
  • Oh that's clever, is that usually how these arguments go? Or is there some remainder that gets pushed into the O() term? I'm trying to use this great example to gain more intuition behind the general "algorithm" for finding influence curves – Casey May 10 '22 at 15:15

I'm not much of an expert on influence functions; I'll start with the working definition provided in this answer by Michael Chernick: "The influence function for a parameter...essentially measures the difference between the parameter estimate when the data point is included compared with when it is left out."

In your case you want to know how removing particular event times from the observation set (maybe more precisely, making small changes in observed event times $T_i$) affects an estimate of survival at a particular time, $\hat S(t_0)$. In your situation with a non-parametric survival function estimate,* that might be the Kaplan-Meier estimate, or the survival function derived from the Nelson-Aalen estimate of cumulative hazard. So ask yourself the following questions:

If $T_i > t_0$, is $\hat S(t_0)$ affected if you omit observation $i$ or make a (small) change in its observed time?

If $T_i = t_0$ (an event perhaps of 0 probability in principle, but maybe of some practical interest), what happens to $\hat S(t_0)$ if you omit observation $i$ or make a (small) change in its observed time?

If $T_i < t_0$, what happens to $\hat S(t_0)$ if you omit observation $i$ or make a (small) change in its observation time?
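One way to explore these three questions numerically is a small leave-one-out experiment. The sketch below is only an illustration: the event times are made up, there is no censoring (so the Kaplan-Meier estimate reduces to the empirical fraction of times exceeding $t_0$), and the helper names are hypothetical.

```python
import numpy as np


def empirical_survival(times, t0):
    """Without censoring, the survival estimate is the fraction of times > t0."""
    return float(np.mean(np.asarray(times) > t0))


def leave_one_out_change(times, t0, i):
    """Change in the estimate of S(t0) when observation i is dropped."""
    times = np.asarray(times, dtype=float)
    full = empirical_survival(times, t0)
    reduced = empirical_survival(np.delete(times, i), t0)
    return reduced - full


times = np.array([0.5, 1.2, 2.0, 3.1, 4.7])  # made-up event times
t0 = 2.5

for i, t in enumerate(times):
    side = "after t0" if t > t0 else "before t0"
    print(f"T_i = {t} ({side}): change = {leave_one_out_change(times, t0, i):+.3f}")

# A small shift in an event time that does not cross t0 leaves the estimate alone.
shifted = times.copy()
shifted[3] += 0.1
print(empirical_survival(times, t0), empirical_survival(shifted, t0))
```

In this toy data set, dropping an observation with $T_i > t_0$ lowers the estimate and dropping one with $T_i < t_0$ raises it, in both cases by $-\{I(T_i > t_0) - \hat S(t_0)\}/(n-1)$, while moving an event time without crossing $t_0$ changes nothing.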

The Wikipedia entry shows the derivation of the Kaplan-Meier estimate based on maximum likelihood, which might help put the above into a more formal argument.

Although you ask in the context of no censoring, also consider what happens to $\hat S(t_0)$ if there are small changes in censoring times that aren't close to $t_0$.


*Although the question was originally posed in terms of a Cox regression, discussion in comments clarified that the question is about a non-parametric estimate of a single survival curve. A "semi-parametric" Cox regression makes no parametric assumptions about the baseline hazard, with parametric modeling of covariate effects on hazard. If the "influence function" is defined in terms of small changes in observed event times with unaltered covariate values, this type of argument can be extended to Cox models. In Cox models, however, the "influence" of interest is generally in how each of $n$ individual cases, with associated covariate values, affects estimates of each of the $p$ regression coefficients.

EdM

I'll use the Gateaux derivative approach to derive the efficient influence function of the survival function $S(t)$, assuming there is no censoring.

By definition, the estimand is $\Phi(P) = S(t_0) = 1 - F(t_0) = 1 - \int_0^{t_0} f(t) dt$.

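The efficient influence function (EIF) at a point $\tilde{t}$ is the Gateaux derivative of $\Phi$ along the path $P_s = s\,\delta_{\tilde{t}} + (1 - s) P$, which mixes $P$ with a point mass at $\tilde{t}$. Below, $\mathbb{1}_{\tilde{t}}(t)$ denotes the (Dirac) density of that point mass, so that $\int_0^{t_0} \mathbb{1}_{\tilde{t}}(t) dt = \mathbb{1}(\tilde{t} \le t_0)$: \begin{align} \phi(\tilde{t}) &= \frac{d \Phi(P_s)}{d s} \Big\vert_{s=0} \\ &= \frac{d}{d s} [1 - \int_0^{t_0} \{s \mathbb{1}_{\tilde{t}}(t) + (1 - s) f(t)\} dt] \Big\vert_{s=0} \\ &= -\int_0^{t_0} \mathbb{1}_{\tilde{t}}(t) dt + \int_0^{t_0} f(t) dt \\ &= [1 - \int_0^{t_0} \mathbb{1}_{\tilde{t}}(t) dt] - [1 - \int_0^{t_0} f(t) dt] \\ &= [1 - \mathbb{1}(\tilde{t} \le t_0)] - S(t_0) \\ &= \mathbb{1}(\tilde{t} > t_0) - S(t_0) \end{align}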
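The derivative above can be sanity-checked with a finite difference. The following Python sketch assumes $T \sim \mathrm{Exp}(1)$ and $t_0 = 1$ purely for illustration, and the function names are hypothetical. It evaluates $S(t_0)$ under the mixture of $P$ with a point mass at $\tilde{t}$ and differentiates numerically in $s$.

```python
import math


def survival_under_mixture(s, t_tilde, t0, s_true):
    """S(t0) under the mixture s * (point mass at t_tilde) + (1 - s) * P."""
    point_mass_survival = 1.0 if t_tilde > t0 else 0.0
    return s * point_mass_survival + (1.0 - s) * s_true


def gateaux_derivative(t_tilde, t0, s_true, h=1e-6):
    """Forward finite-difference approximation of d/ds S_s(t0) at s = 0."""
    perturbed = survival_under_mixture(h, t_tilde, t0, s_true)
    baseline = survival_under_mixture(0.0, t_tilde, t0, s_true)
    return (perturbed - baseline) / h


t0 = 1.0
s_true = math.exp(-t0)  # S(t0) under the assumed Exp(1) model

for t_tilde in (0.5, 2.0):
    closed_form = (1.0 if t_tilde > t0 else 0.0) - s_true
    print(t_tilde, gateaux_derivative(t_tilde, t0, s_true), closed_form)
```

Because the functional is linear in $s$ along this path, the finite difference should match the closed form $\mathbb{1}(\tilde{t} > t_0) - S(t_0)$ up to rounding.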

Renaming $\tilde{t}$ to $t$, the EIF is $\phi(t) = \mathbb{1}(t > t_0) - S(t_0)$. Further, for $T \sim P$ we can show that \begin{align} \mathrm{Var}[\phi(T)] &= \mathrm{Var}[\mathbb{1}(T > t_0) - S(t_0)] \\ &= \mathrm{E}[\{\mathbb{1}(T > t_0) - S(t_0)\}^2] - \{\mathrm{E}[\mathbb{1}(T > t_0) - S(t_0)]\}^2 \\ &= \mathrm{E}[\{\mathbb{1}(T > t_0) - S(t_0)\}^2] - 0 \\ &= \mathrm{E}[\mathbb{1}(T > t_0)^2] - 2\mathrm{E}[\mathbb{1}(T > t_0)]S(t_0) + S(t_0)^2 \\ &= S(t_0) - 2 S(t_0)^2 + S(t_0)^2 \\ &= S(t_0) [1 - S(t_0)] \end{align} so the efficiency bound for the asymptotic variance of $\sqrt{n}\{\hat{S}(t_0) - S(t_0)\}$ is $S(t_0)[1 - S(t_0)]$.
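A quick Monte Carlo check of this variance is sketched below. It again assumes $T \sim \mathrm{Exp}(1)$ and $t_0 = 1$ only for illustration; the sample size, number of replications, and variable names are arbitrary choices, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, t0 = 500, 2000, 1.0
s_true = float(np.exp(-t0))  # S(t0) under the assumed Exp(1) model

# Each row is one simulated data set of n uncensored event times.
samples = rng.exponential(scale=1.0, size=(reps, n))

# Empirical survival estimate at t0 for every simulated data set.
s_hat = (samples > t0).mean(axis=1)

# Monte Carlo variance of sqrt(n) * (S_hat - S) versus the EIF variance.
scaled_errors = np.sqrt(n) * (s_hat - s_true)
mc_variance = float(scaled_errors.var())
bound = s_true * (1.0 - s_true)

print(mc_variance, bound)
```

The simulated variance should land close to $S(t_0)[1 - S(t_0)]$, with the gap shrinking as the number of replications grows.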

The resulting EIF is exactly what you should expect without censoring. It is well known that the EIF for the CDF is $\mathbb{1}(t \le t_0) - F(t_0)$, which implies that the empirical CDF is an efficient estimator; since $S(t_0) = 1 - F(t_0)$, the EIF for $S(t_0)$ is simply its negative.

Reference for the Gateaux derivative approach: Hines, Oliver, et al. "Demystifying statistical learning based on efficient influence functions." The American Statistician 76.3 (2022): 292-304.

Li Ge