I am trying to estimate a confidence interval using bootstrapping. As an R data.table, my data looks like this:
```r
library(data.table)
df <- data.table(compound = c(rep("ala", 5), rep("beta", 3), rep("phe", 8)),
                 obs      = c(rep(FALSE, 7), rep(TRUE, 9)))
```
The statistic I am interested in is the percentage of observations that are TRUE (9/16 * 100 ≈ 56% for my example data). In my confidence interval I would like to account for the fact that my compounds were selected at random from a large number of compounds. Intuitively, I would therefore have done something like this (in R):
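```r
# Resample compounds, not rows: pass one row per unique compound to boot()
compounds <- data.frame(var = unique(df$compound))

boot_res <- boot::boot(
  compounds,
  statistic = function(data, indices, stat_tab = df) {
    comp_samp <- data$var[indices]  # resampled compounds (repeats allowed)
    # stack all rows of every resampled compound, keeping duplicates
    fin_tab <- rbindlist(lapply(comp_samp, function(cmp) stat_tab[compound == cmp]))
    round(100 * mean(fin_tab$obs), 1)  # % TRUE in this bootstrap sample
  },
  R = 1000
)
boot::boot.ci(boot_res, index = 1, type = "basic")$basic
```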
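For comparison, the same compound-level (cluster) resampling can be sketched in a few lines of base R, without the boot package. This is only an illustration under stated assumptions: the seed, the 2000 replicates and the helper name `boot_once` are made up for the example, and it reports a percentile interval rather than the basic interval requested from `boot.ci()` above.

```r
# Sketch: cluster (compound-level) bootstrap in base R, percentile interval.
set.seed(42)  # illustrative seed; the interval varies slightly between seeds

df <- data.frame(
  compound = c(rep("ala", 5), rep("beta", 3), rep("phe", 8)),
  obs      = c(rep(FALSE, 7), rep(TRUE, 9))
)

# Collapse to one entry per compound: number of rows and number of TRUE values
by_comp <- split(df$obs, df$compound)
n_rows  <- vapply(by_comp, length, integer(1))
n_true  <- vapply(by_comp, sum, integer(1))
comps   <- names(n_rows)

# Hypothetical helper: one replicate = draw compounds with replacement,
# then pool all rows of the drawn compounds (duplicates counted again)
boot_once <- function() {
  pick <- sample(comps, length(comps), replace = TRUE)
  c(pct = 100 * sum(n_true[pick]) / sum(n_rows[pick]),
    n   = sum(n_rows[pick]))
}

reps <- replicate(2000, boot_once())  # 2 x 2000 matrix with rows "pct" and "n"

range(reps["n", ])                        # rows per resample differ
quantile(reps["pct", ], c(0.025, 0.975))  # 95% percentile interval
```

The pooled statistic is a ratio of sums over the drawn compounds, so each replicate can contain anywhere from 9 rows (beta drawn three times) to 24 rows (phe drawn three times).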
Is resampling compounds like this in boot() a valid thing to do? I am a bit confused because my compounds contribute different numbers of observations (rows in df), so when I resample by compound, the different bootstrap samples contain different numbers of observations. If this is not valid, why not, and is there a better way to estimate the CI in my scenario? Thank you
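With only three compounds, resampling them with replacement (as boot() does by default) can only ever produce a handful of distinct resamples. A small base-R sketch, assuming that resampling scheme and using illustrative object names only, enumerates them together with their probability, their number of rows and the resulting percentage of TRUE values:

```r
df <- data.frame(
  compound = c(rep("ala", 5), rep("beta", 3), rep("phe", 8)),
  obs      = c(rep(FALSE, 7), rep(TRUE, 9))
)
by_comp <- split(df$obs, df$compound)
n_rows  <- vapply(by_comp, length, integer(1))  # ala 5, beta 3, phe 8
n_true  <- vapply(by_comp, sum, integer(1))     # ala 0, beta 1, phe 8
comps   <- names(n_rows)

# Every ordered draw of 3 compounds with replacement (3^3 = 27, equally likely)
draws <- expand.grid(a = comps, b = comps, c = comps, stringsAsFactors = FALSE)
keys  <- apply(draws, 1, function(r) paste(sort(r), collapse = "+"))

# Collapse ordered draws into distinct resamples (multisets) with probabilities
uniq <- draws[!duplicated(keys), ]
resamples <- data.frame(
  compounds = unique(keys),
  prob      = as.vector(table(keys)[unique(keys)]) / length(keys),
  n_obs     = unname(apply(uniq, 1, function(r) sum(n_rows[r]))),
  pct_true  = unname(apply(uniq, 1, function(r) 100 * sum(n_true[r]) / sum(n_rows[r])))
)
resamples[order(resamples$pct_true), ]
```

Every resample has three compounds but between 9 and 24 rows, and the bootstrap distribution of the statistic has at most ten distinct values, which makes it easy to see how coarse any bootstrap interval based on three compounds will be.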
The summary() function applied to the model object reports the variance for each random effect. For a simple model like this, the coef() function reports coefficients for each compound, while ranef() reports the deviations around the overall intercept. It gets trickier with more complicated models; see this page for an example. – EdM Aug 06 '20 at 15:56

coef values. My reluctance to say for sure has to do with my ignorance. Bootstrapping to get a CI isn't as straightforward as it initially seems, and I'm not sure how appropriate such CI analysis is for random effects. – EdM Aug 06 '20 at 16:06