
I have this data:

 structure(list(age = c(62.84998, 60.33899, 52.74698, 42.38498
 ), death = c(0, 1, 1, 1), sex = c("male", "female", "female", 
 "female"), hospdead = c(0, 1, 0, 0), slos = c(5, 4, 17, 3), d.time = c(2029, 
 4, 47, 133), dzgroup = c("Lung Cancer", "Cirrhosis", "Cirrhosis", 
 "Lung Cancer"), dzclass = c("Cancer", "COPD/CHF/Cirrhosis", "COPD/CHF/Cirrhosis", 
 "Cancer"), num.co = c(0, 2, 2, 2), edu = c(11, 12, 12, 11), income = c("$11-$25k", 
 "$11-$25k", "under $11k", "under $11k"), scoma = c(0, 44, 0, 
 0), charges = c(9715, 34496, 41094, 3075), totcst = c(NA_real_, 
 NA_real_, NA_real_, NA_real_), totmcst = c(NA_real_, NA_real_, 
 NA_real_, NA_real_), avtisst = c(7, 29, 13, 7), race = c("other", 
 "white", "white", "white"), sps = c(33.8984375, 52.6953125, 20.5, 
 20.0976562), aps = c(20, 74, 45, 19), surv2m = c(0.262939453, 
 0.0009999275, 0.790893555, 0.698974609), surv6m = c(0.0369949341, 
 0, 0.664916992, 0.411987305), hday = c(1, 3, 4, 1), diabetes = c(0, 
 0, 0, 0), dementia = c(0, 0, 0, 0), ca = c("metastatic", "no", 
 "no", "metastatic"), prg2m = c(0.5, 0, 0.75, 0.899999619), prg6m = c(0.25, 
 0, 0.5, 0.5), dnr = c("no dnr", NA, "no dnr", "no dnr"), dnrday = c(5, 
 NA, 17, 3), meanbp = c(97, 43, 70, 75), wblc = c(6, 17.0976562, 
 8.5, 9.09960938), hrt = c(69, 112, 88, 88), resp = c(22, 34, 
 28, 32), temp = c(36, 34.59375, 37.39844, 35), pafi = c(388, 
 98, 231.65625, NA), alb = c(1.7998047, NA, NA, NA), bili = c(0.19998169, 
 NA, 2.19970703, NA), crea = c(1.19995117, 5.5, 2, 0.79992676), 
     sod = c(141, 132, 134, 139), ph = c(7.459961, 7.25, 7.459961, 
     NA), glucose = c(NA_real_, NA_real_, NA_real_, NA_real_), 
     bun = c(NA_real_, NA_real_, NA_real_, NA_real_), urine = c(NA_real_, 
     NA_real_, NA_real_, NA_real_), adlp = c(7, NA, 1, 0), adls = c(7, 
     1, 0, 0), sfdm2 = c(NA, "<2 mo. follow-up", "<2 mo. follow-up", 
     "no(M2 and SIP pres)"), adlsc = c(7, 1, 0, 0)), row.names = c(NA, 
 4L), class = "data.frame")

Below, I have also calculated the estimated population proportion of patients whose primary disease group was lung cancer:

SB_xlsx_mean = round(100 * mean(SB_xlsx$dzgroup == "Lung Cancer", na.rm = TRUE), 2)

SB_xlsx_mean

[1] 9.97

The estimated proportion of patients whose main disease group is lung cancer is 0.0997, or 9.97%.

However, I now need to calculate the 95% CI for the population proportion of patients whose primary disease group was lung cancer. I've computed 95% CIs before with t-tests, but I don't think that applies here, and I'm not sure how else to start.
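One common starting point for a proportion, offered here as an illustrative alternative and not as the method used in the answer, is the normal-approximation (Wald) interval: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n), with z = qnorm(0.975), about 1.96. It needs only the number of lung cancer cases and the number of non-missing rows. The helper name `wald_ci` and the 4-row `df` (holding only the `dzgroup` column from the data above) are hypothetical.

```r
# Hypothetical helper: Wald (normal-approximation) CI for a proportion.
# Assumes x successes out of n trials; returns bounds on the 0-1 scale.
wald_ci <- function(x, n, conf.level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf.level) / 2)   # about 1.96 for a 95% interval
  se <- sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

# Illustrative data: only the dzgroup column of the 4-row sample above.
df <- data.frame(dzgroup = c("Lung Cancer", "Cirrhosis", "Cirrhosis", "Lung Cancer"))

is_lung <- df$dzgroup == "Lung Cancer"
x <- sum(is_lung, na.rm = TRUE)   # number of lung cancer patients
n <- sum(!is.na(is_lung))         # non-missing rows, matching mean(..., na.rm = TRUE)

ci <- wald_ci(x, n)
round(100 * ci, 2)                # as percentages, like SB_xlsx_mean
```

With n = 4 the Wald interval is unreliable, and for small counts it can even extend below 0 or above 1. For the full data set the same call would be `wald_ci(sum(SB_xlsx$dzgroup == "Lung Cancer", na.rm = TRUE), sum(!is.na(SB_xlsx$dzgroup)))`.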

barnsm2

1 Answer


Here is an example using binom.test(). With only 4 observations, the confidence interval is very wide.

binom.test(sum(df$dzgroup == "Lung Cancer"), n = nrow(df), p = 0.5)
Exact binomial test

data:  sum(df$dzgroup == "Lung Cancer") and nrow(df)
number of successes = 2, number of trials = 4, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.06758599 0.93241401
sample estimates:
probability of success
                   0.5

The test above assumes a null probability of "Lung Cancer" of 50%. If you have a better hypothesized value, substitute it for 0.5 and the p-value will adjust; the 95% confidence interval itself does not depend on p.
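To report only the interval, in the same percentage form as `SB_xlsx_mean`, you can take the `conf.int` element of the object that `binom.test()` returns. The sketch below uses a hypothetical 4-row `df` holding only the `dzgroup` column from the question. It also checks that changing `p` changes the p-value but not the exact (Clopper-Pearson) interval, and shows `prop.test()`, whose interval comes from inverting the score test (Wilson, with continuity correction by default), as another option in base R.

```r
# Illustrative data: the dzgroup column of the 4-row sample in the question.
df <- data.frame(dzgroup = c("Lung Cancer", "Cirrhosis", "Cirrhosis", "Lung Cancer"))

x <- sum(df$dzgroup == "Lung Cancer")   # successes
n <- nrow(df)                            # trials

# Exact (Clopper-Pearson) interval from binom.test(); p only affects the p-value.
exact_50 <- binom.test(x, n, p = 0.5)
exact_10 <- binom.test(x, n, p = 0.1)
exact_ci <- exact_50$conf.int
round(100 * exact_ci, 2)                 # interval as percentages

# Score-based interval from prop.test(); with n = 4 R warns that the
# chi-squared approximation may be incorrect, so the warning is suppressed.
score_ci <- suppressWarnings(prop.test(x, n)$conf.int)
round(100 * score_ci, 2)
```

For the full data you would pass `sum(SB_xlsx$dzgroup == "Lung Cancer", na.rm = TRUE)` and the number of non-missing `dzgroup` values as `x` and `n`.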

Dave2e