This question involves both math and coding in R. Apologies if this should be on Stack Overflow, but I decided statisticians would ultimately provide better support.
I have a simple likelihood that infers the parameters $R_0$ and $k$ from a vector of integers, each giving the total size of a transmission cluster (denoted $y$). The original likelihood was simply:
(1) $L(R_0,k\mid y)=\prod_{i=1}^{\infty}P(Y=i)^{n_i}$, where $n_i$ is the number of observed clusters of size $i$.
Some of the integers now represent censored values (i.e. observation stopped before the cluster ended), so the true size of those clusters is at least the recorded $y$. I would like to include this in the likelihood, so that the likelihood of $R_0$ and $k$, when $y_a$ clusters of size $y$ have been fully observed and $y_b$ clusters have been censored at size $y$, is:
(2) $L(R_0,k\mid y_a,y_b)=\prod_{y}P(Y=y)^{y_a}\,P(Y\geq y)^{y_b}$
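Equivalently, on the log scale (which is how I compute things in code), the target is:

$$\ell(R_0,k)=\sum_{y}\Big[\,y_a\log P(Y=y)+y_b\log P(Y\geq y)\,\Big]$$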
I would like help extending this likelihood function in R and ensuring it is mathematically and computationally correct.
Originally, my data consisted of a single vector of cluster sizes, which I ran through a custom function (the original code is at the end, including code to generate sample data).
I am unsure how to extend this to include $P(Y\geq y)$. My data are now in a two-column matrix, with the first column the integer values and the second column indicating censoring status (1 = censored, 0 = not censored).
I have tried modifying the function to split the $Y$ column into "full" values and "censored" values. I assume $P(Y\geq y)=1-\sum^{y-1}_{j=1}P(Y=j)$. One attempt was to create a separate `calc_sizes` for full clusters and for censored clusters (denoted b), where the censored clusters subtract one from the cluster size (i.e. `calc_sizes.b <- unique(c(1, b-1))`) to account for the $\sum^{y-1}_{j=1}$ upper limit, but I am unsure whether this is correct, or where and how the $1-F(y-1)$ step should enter.
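To make that assumption concrete, here is a minimal sketch of how the survival term could be computed on the log scale. `log_pmf` and `log_surv` are illustrative names I am introducing; `log_pmf` is the log of the cluster-size pmf given at the end of the post:

```r
## Sketch only: log P(Y >= y) = log(1 - sum_{j=1}^{y-1} P(Y = j)).
log_pmf <- function(x, k, r0) {
  lgamma(k*x + (x - 1)) - (lgamma(k*x) + lgamma(x + 1)) +
    (x - 1)*log(r0/k) - (k*x + (x - 1))*log(1 + r0/k)
}
log_surv <- function(y, k, r0) {
  if (y <= 1) return(0)  # clusters have size >= 1, so P(Y >= 1) = 1
  log1p(-sum(exp(log_pmf(1:(y - 1), k, r0))))  # log(1 - CDF(y - 1))
}
```

Working on the log scale with `log1p()` avoids losing precision when the CDF is close to 1; for very large censored sizes the `exp()`/`sum()` step could be replaced by a log-sum-exp if needed.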
This initially seemed like a simple task, but I am finding it very challenging. I have never attempted custom likelihoods before, so any advice would be much appreciated.
##### generate sample data
##### data generating function
bp <- function(gens = 20, init.size = 1, offspring, ...) {
  Z <- list()          # initiate the list
  Z[[1]] <- init.size  # set the first position of the list as the
                       # number of index cases
  i <- 1
  while (sum(Z[[i]]) > 0 && i <= gens) {
    Z[[i+1]] <- offspring(sum(Z[[i]]), ...)
    i <- i + 1
  }
  return(Z)
}
##### generate vector of 10000 with true R0=0.9 and k=0.25
set.seed(123)
Y <- unlist(lapply(replicate(n=10000, bp(offspring=rnbinom, mu=0.9,
size=0.25)), function(x) sum(unlist(x))))
##### Original function to calculate likelihoods (equation 1) ####
clust.likelihood <- function(Y, k, r0) {
  # log pmf of the cluster-size distribution (formula given at the end)
  nb.likelihood <- function(x) {
    lgamma(k*x + (x - 1)) - (lgamma(k*x) + lgamma(x + 1)) +
      (x - 1)*log(r0/k) - (k*x + (x - 1))*log(1 + r0/k)
  }
  calc_sizes <- unique(c(1, Y))
  likelihoods <- c()
  likelihoods[calc_sizes] <- sapply(calc_sizes, nb.likelihood)
  sexpl <- sum(exp(likelihoods), na.rm = TRUE)
  if (sexpl < 1) {
    maxl <- log(1 - sexpl)  # probability mass not at the observed sizes
  } else {
    maxl <- -Inf
  }
  likelihoods <- c(likelihoods, maxl)
  cluster_likelihoods <- likelihoods[Y]
  result <- sum(cluster_likelihoods)
  return(result)
}
##### function to estimate parameters ####
param.est <- function(y) {
  ret <- c()
  # moment estimate of R0: mean final cluster size = 1/(1 - R0)
  r0_hat <- 1 - (1/mean(y))
  if (r0_hat < 1e-9) r0_hat <- 1e-9
  # fix R0 at its moment estimate and maximise the likelihood over k only
  t <- optim(par = 0.01, lower = 1e-9,
             fn = function(x) -clust.likelihood(y, x[1], r0_hat),
             method = "L-BFGS-B")
  k_hat <- t$par[[1]]
  ret$k_hat <- k_hat; ret$r0_hat <- r0_hat
  ret <- unlist(ret)
  return(ret)
}
##### estimate parameters
param.est(Y)
##### code to generate new data
##### randomly "censor" 10% of values
cens.status <- ifelse(runif(10000)>0.9, 1, 0)
Y.new <- cbind(Y,cens.status)
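For concreteness, here is one sketch of how the extended likelihood for equation (2) might look, assuming the two-column matrix described above (column 1 = cluster size, column 2 = 1 if censored). The function name and internals are illustrative and untested against real data:

```r
## Sketch of equation (2): fully observed clusters contribute log P(Y = y),
## censored clusters contribute log P(Y >= y) = log(1 - CDF(y - 1)).
clust.likelihood.cens <- function(Ymat, k, r0) {
  nb.loglik <- function(x) {  # log pmf, same expression as in clust.likelihood
    lgamma(k*x + (x - 1)) - (lgamma(k*x) + lgamma(x + 1)) +
      (x - 1)*log(r0/k) - (k*x + (x - 1))*log(1 + r0/k)
  }
  log.surv <- function(y) {
    if (y <= 1) return(0)  # P(Y >= 1) = 1
    log1p(-sum(exp(nb.loglik(1:(y - 1)))))
  }
  y.full <- Ymat[Ymat[, 2] == 0, 1]  # fully observed cluster sizes
  y.cens <- Ymat[Ymat[, 2] == 1, 1]  # censored cluster sizes
  sum(nb.loglik(y.full)) + sum(vapply(y.cens, log.surv, numeric(1)))
}
```

Something like this could then be substituted for `clust.likelihood` inside `param.est()`, passing `Y.new` through, though note the moment estimate of $R_0$ from `mean(y)` is itself biased once some sizes are censored.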
For reference, $P(Y=y)=\frac{\Gamma(ky+y-1)}{\Gamma(ky)\,\Gamma(y+1)}\,\frac{\left(\frac{R_0}{k}\right)^{y-1}}{\left(1+\frac{R_0}{k}\right)^{ky+y-1}}$
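As an illustrative sanity check on this formula: for a cluster started by a single case, $P(Y=1)$ is just the probability that the index case has zero offspring, so the log pmf at $y=1$ should match `dnbinom(0, size = k, mu = R0, log = TRUE)`; and for subcritical $R_0$ extinction is certain, so the pmf should sum to 1:

```r
## Illustrative check of the cluster-size pmf (log scale), using the same
## expression as nb.likelihood in the code above.
log_pmf <- function(x, k, r0) {
  lgamma(k*x + (x - 1)) - (lgamma(k*x) + lgamma(x + 1)) +
    (x - 1)*log(r0/k) - (k*x + (x - 1))*log(1 + r0/k)
}
# P(Y = 1) = P(0 secondary cases from the index case)
all.equal(log_pmf(1, k = 0.25, r0 = 0.9),
          dnbinom(0, size = 0.25, mu = 0.9, log = TRUE))
# subcritical (R0 < 1): the pmf should sum to (numerically) 1
sum(exp(log_pmf(1:1e5, k = 0.25, r0 = 0.9)))
```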