
This question involves both math and coding in R. Apologies if this should be on Stack Overflow, but I decided statisticians would ultimately provide better support.

I have a simple likelihood for inferring the parameters $R_0$ and $k$ from a vector of integers, where each integer is the total size of a simulated chain ("clusters", denoted $y$). The original likelihood was simply:

(1) $L(R_0,k \mid y)=\prod_{i=1}^{\infty}P(Y=i)^{y_i}$, where $y_i$ is the number of observed clusters of size $i$.

Some of the integers now represent values that are 'censored', i.e. the observation is only a lower bound, so the true cluster size is at least the observed value $y$. I would like to include this in the likelihood, so that the likelihood of $R_0$ and $k$, when $y_a$ clusters of size $y$ have been fully observed and $y_b$ clusters have been censored at $y$, is:

(2) $L(R_0,k \mid y_a,y_b)=\prod_y P(Y=y)^{y_a}\,P(Y \geq y)^{y_b}$

I would like assistance in extending this likelihood function in R, and in making sure the extension is mathematically and computationally correct.

Originally, my data consisted of a single vector of cluster sizes, which was run through a custom likelihood function (the original code is at the end, including code to generate sample data).

I am unsure how to extend this to include $P(Y \geq y)$. My data are now in a two-column matrix, with the first column holding the integer values and the second column indicating censoring status (1 = censored, 0 = not censored).

I have tried modifying the function to split the Y column into "full" values and "censored" values. I assume $P(Y \geq y)=1-\sum^{y-1}_{i=1}P(Y=i)$. One attempt was to create a separate calc_sizes for the full clusters and for the censored clusters (denoted b), where the censored clusters have one subtracted from the cluster size (i.e. calc_sizes.b <- unique(c(1, b-1))) to account for the $\sum^{y-1}_{i=1}$ part, but I am unsure whether this is correct, or where and how to include the $1-f(x)$ step; a rough sketch of what I am aiming for is included after the censoring code at the end.

This initially seemed like a simple task, but I am finding it very challenging. I have never attempted custom likelihoods before, so any advice would be much appreciated.

##### generate sample data 
##### data generating function
bp <- function(gens=20, init.size=1, offspring, ...){  
  Z <- list() #initiate the list
  Z[[1]] <- init.size # set the first position of the list as the 
                      # number of index cases
  i <- 1 
  while(sum(Z[[i]]) > 0 && i <= gens) { 
    Z[[i+1]] <- offspring(sum(Z[[i]]), ...) 
    i <- i+1 
  } 
  return(Z)
}
##### generate vector of 10000 with true R0=0.9 and k=0.25
set.seed(123)
Y <- unlist(lapply(replicate(n=10000, bp(offspring=rnbinom, mu=0.9, 
       size=0.25)), function(x) sum(unlist(x))))
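##### quick check: for a subcritical process the mean cluster size should be
##### roughly 1/(1 - R0) = 10 (slightly less here because chains are capped
##### at gens = 20)
mean(Y)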

##### Original function to calculate likelihoods (equation 1)

clust.likelihood <- function(Y, k, r0){
  ## log P(Y = x) for a cluster of size x (negative binomial offspring)
  nb.likelihood <- function(x){
    lgamma(k*x + (x-1)) - (lgamma(k*x) + lgamma(x+1)) +
      (x-1)*log(r0/k) - (k*x + (x-1))*log(1 + r0/k)
  }
  calc_sizes <- unique(c(1, Y))   # only evaluate the sizes that occur in the data
  likelihoods <- c()
  likelihoods[calc_sizes] <- sapply(calc_sizes, nb.likelihood)
  sexpl <- sum(exp(likelihoods), na.rm = TRUE)
  if (sexpl < 1) {
    ## log-probability mass not accounted for by the evaluated sizes
    maxl <- log(1 - sum(exp(likelihoods), na.rm = TRUE))
  } else {
    maxl <- -Inf
  }
  likelihoods <- c(likelihoods, maxl)
  cluster_likelihoods <- likelihoods[Y]   # look up the log-likelihood of each observed cluster
  result <- sum(cluster_likelihoods)
  return(result)
}
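##### quick sanity check of the function above: with this much data the
##### log-likelihood at the true parameters should be higher than at a
##### clearly wrong value of k
clust.likelihood(Y, k = 0.25, r0 = 0.9)
clust.likelihood(Y, k = 5, r0 = 0.9)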

##### function to estimate parameters

param.est <- function(y){
  ret <- c()
  ## moment estimate of R0 from the mean cluster size
  r0_hat <- 1 - (1/mean(y))
  if (r0_hat < 1e-9) { r0_hat <- 1e-9 }
  ## maximise the likelihood over k, holding r0 at its moment estimate
  t <- optim(par = c(0.01, 0.01), lower = c(1e-9, 1e-9),
             fn = function(x) { -clust.likelihood(y, x[1], r0_hat) },
             method = "L-BFGS-B")
  k_hat <- t$par[[1]]
  ret$k_hat <- k_hat
  ret$r0_hat <- r0_hat
  ret <- unlist(ret)
  return(ret)
}

##### estimate parameters

param.est(Y)

##### code to generate new data

randomly "censor" 10% of values

cens.status <- ifelse(runif(10000) > 0.9, 1, 0)
Y.new <- cbind(Y, cens.status)
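
Below is a rough sketch of how I imagine the censored likelihood (equation 2) might look, written against Y.new above; the function name clust.likelihood.cens is just a placeholder of mine, and I have not verified that the $P(Y \geq y)$ part is handled correctly, which is really the heart of my question.

clust.likelihood.cens <- function(Ymat, k, r0){
  ## same per-size log P(Y = x) as in clust.likelihood above
  nb.likelihood <- function(x){
    lgamma(k*x + (x-1)) - (lgamma(k*x) + lgamma(x+1)) +
      (x-1)*log(r0/k) - (k*x + (x-1))*log(1 + r0/k)
  }
  y    <- Ymat[, 1]
  cens <- Ymat[, 2]
  ## fully observed clusters contribute log P(Y = y)
  ll.obs <- sum(nb.likelihood(y[cens == 0]))
  ## censored clusters contribute log P(Y >= y) = log(1 - sum_{i < y} P(Y = i))
  ll.cens <- sum(vapply(y[cens == 1], function(yi){
    if (yi == 1) return(0)                       # P(Y >= 1) = 1
    log1p(-sum(exp(nb.likelihood(1:(yi - 1)))))
  }, numeric(1)))
  ll.obs + ll.cens
}
##### e.g. clust.likelihood.cens(Y.new, k = 0.25, r0 = 0.9)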

For reference, $P(Y=y)=\frac{\Gamma(ky+y-1)}{\Gamma(ky)\,\Gamma(y+1)}\,\frac{\left(\frac{R_0}{k}\right)^{y-1}}{\left(1+\frac{R_0}{k}\right)^{ky+y-1}}$
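
As a possible sanity check on this pmf (not something I am fully certain about): since the process is subcritical when $R_0 < 1$, I would expect the probabilities to sum to approximately 1, e.g.

pmf <- function(y, r0, k) {
  exp(lgamma(k*y + y - 1) - lgamma(k*y) - lgamma(y + 1) +
        (y - 1)*log(r0/k) - (k*y + y - 1)*log(1 + r0/k))
}
sum(pmf(1:1e5, r0 = 0.9, k = 0.25))  ##### should be close to 1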
