How to calculate a 95% Confidence Interval

Question

I have this data:

 structure(list(age = c(62.84998, 60.33899, 52.74698, 42.38498
 ), death = c(0, 1, 1, 1), sex = c("male", "female", "female", 
 "female"), hospdead = c(0, 1, 0, 0), slos = c(5, 4, 17, 3), d.time = c(2029, 
 4, 47, 133), dzgroup = c("Lung Cancer", "Cirrhosis", "Cirrhosis", 
 "Lung Cancer"), dzclass = c("Cancer", "COPD/CHF/Cirrhosis", "COPD/CHF/Cirrhosis", 
 "Cancer"), num.co = c(0, 2, 2, 2), edu = c(11, 12, 12, 11), income = c("$11-$25k", 
 "$11-$25k", "under $11k", "under $11k"), scoma = c(0, 44, 0, 
 0), charges = c(9715, 34496, 41094, 3075), totcst = c(NA_real_, 
 NA_real_, NA_real_, NA_real_), totmcst = c(NA_real_, NA_real_, 
 NA_real_, NA_real_), avtisst = c(7, 29, 13, 7), race = c("other", 
 "white", "white", "white"), sps = c(33.8984375, 52.6953125, 20.5, 
 20.0976562), aps = c(20, 74, 45, 19), surv2m = c(0.262939453, 
 0.0009999275, 0.790893555, 0.698974609), surv6m = c(0.0369949341, 
 0, 0.664916992, 0.411987305), hday = c(1, 3, 4, 1), diabetes = c(0, 
 0, 0, 0), dementia = c(0, 0, 0, 0), ca = c("metastatic", "no", 
 "no", "metastatic"), prg2m = c(0.5, 0, 0.75, 0.899999619), prg6m = c(0.25, 
 0, 0.5, 0.5), dnr = c("no dnr", NA, "no dnr", "no dnr"), dnrday = c(5, 
 NA, 17, 3), meanbp = c(97, 43, 70, 75), wblc = c(6, 17.0976562, 
 8.5, 9.09960938), hrt = c(69, 112, 88, 88), resp = c(22, 34, 
 28, 32), temp = c(36, 34.59375, 37.39844, 35), pafi = c(388, 
 98, 231.65625, NA), alb = c(1.7998047, NA, NA, NA), bili = c(0.19998169, 
 NA, 2.19970703, NA), crea = c(1.19995117, 5.5, 2, 0.79992676), 
     sod = c(141, 132, 134, 139), ph = c(7.459961, 7.25, 7.459961, 
     NA), glucose = c(NA_real_, NA_real_, NA_real_, NA_real_), 
     bun = c(NA_real_, NA_real_, NA_real_, NA_real_), urine = c(NA_real_, 
     NA_real_, NA_real_, NA_real_), adlp = c(7, NA, 1, 0), adls = c(7, 
     1, 0, 0), sfdm2 = c(NA, "<2 mo. follow-up", "<2 mo. follow-up", 
     "no(M2 and SIP pres)"), adlsc = c(7, 1, 0, 0)), row.names = c(NA, 
 4L), class = "data.frame")

I have also calculated the estimated population proportion of patients who had lung cancer as the primary disease group below.

SB_xlsx_mean = round(100 * mean(SB_xlsx$dzgroup == "Lung Cancer", na.rm = TRUE), 2)
SB_xlsx_mean
[1] 9.97

The population proportion with the main disease type of lung cancer was 0.0997 or 9.97%.

However, I now need to calculate the 95% CI of the population proportion of patients who had lung cancer as the primary disease group. I've gotten 95% CIs before with t-tests, but I don't think that is really applicable here and I'm not sure how else to start.

You could do bootstrapping (repeatedly sampling with replacement, taking the proportion in each sample, then calculating the 2.5th and 97.5th percentiles) - Bill O'Brien, Mar 23 '22 at 20:43
This Medium article, Five Confidence Intervals for Proportions That You Should Know About, describes five methods: Wald, Clopper-Pearson (also known as Exact), Wilson (also known as Score), Agresti-Coull, and Bayesian HPD (highest posterior density) intervals. See also Confidence interval for Bernoulli sampling. - dipetkov, Jan 03 '23 at 17:17
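
Two of the approaches mentioned in the comments can be sketched roughly as follows. This is only an illustration on the four-row sample from the question: the vector `dzgroup`, the seed, and the number of resamples (2000) are assumptions chosen for the example, and with so few rows either interval will be very wide.

```r
# Hypothetical stand-in for the dzgroup column of the four-row sample above
dzgroup <- c("Lung Cancer", "Cirrhosis", "Cirrhosis", "Lung Cancer")
is_lc <- dzgroup == "Lung Cancer"

# Percentile bootstrap: resample with replacement and recompute the proportion
set.seed(1)  # arbitrary seed, used only for reproducibility
boot_props <- replicate(2000, mean(sample(is_lc, replace = TRUE)))
boot_ci <- quantile(boot_props, c(0.025, 0.975))

# Wilson (score) interval: prop.test() without the continuity correction.
# suppressWarnings() hides the small-sample warning expected with n = 4.
wilson_ci <- suppressWarnings(
  prop.test(sum(is_lc), length(is_lc), correct = FALSE)$conf.int
)

boot_ci
wilson_ci
```

On the full data set, the same calls would presumably take `SB_xlsx$dzgroup` in place of the stand-in vector, after dropping any missing values.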

score 1 · Answer 1 · answered Mar 26 '22 at 16:07

Here is an example using binom.test(). With only 4 values, the confidence interval is very wide.

binom.test(sum(df$dzgroup=="Lung Cancer"), n=nrow(df), p=0.5 )
Exact binomial test


data:  sum(df$dzgroup == "Lung Cancer") and nrow(df)
number of successes = 2, number of trials = 4, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.06758599 0.93241401
sample estimates:
probability of success 
                   0.5

The test above assumes a null probability of "Lung Cancer" of 50%. If you have a better estimate, substitute a new value for 0.5 and the calculated p-value will adjust. The confidence interval itself does not depend on p.
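
If only the interval is needed, it can be pulled out of the test object directly. The snippet below is a minimal sketch using the counts from the output above (2 successes in 4 trials); `x`, `n`, and the alternative null value 0.1 are illustrative assumptions, not values from the real data.

```r
# Counts taken from the four-row example: 2 lung cancer cases out of 4 rows
x <- 2
n <- 4

# Extract the exact (Clopper-Pearson) interval from the test object
ci_default <- binom.test(x, n)$conf.int           # p defaults to 0.5
ci_other   <- binom.test(x, n, p = 0.1)$conf.int  # a different null value

ci_default
# The null value changes the p-value but leaves the interval unchanged
all.equal(as.numeric(ci_default), as.numeric(ci_other))
```

For the full data, the counts would presumably come from `sum(SB_xlsx$dzgroup == "Lung Cancer", na.rm = TRUE)` and the number of non-missing `dzgroup` values.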
