
These days I am looking for a good estimate of the confidence interval for the difference in means and medians when I have a categorical variable with more than two levels, in the setting of the Kruskal–Wallis test. Here Dr. Frank Harrell @FrankHarrell said it is possible using the proportional odds (PO) model, so I went to his biostatistics book, where he introduces a general approach using the PO model. Before using that, I ran a quick test: I computed the confidence interval for the median difference with one two-level categorical variable and one numeric variable, and compared it with the result from the wilcox.test function, which handles the two-group special case of the Kruskal–Wallis test (wilcox.test gives a confidence interval but kruskal.test does not). I obtained a big difference, as you can see below. What mistake did I make, please? Thanks in advance.

rm(list = objects())

set.seed(1234)

# similar to the example on page 228, but for two levels
group = rep(c('A', 'B'), 100)
y = rnorm(200, 100, 15) + 10 * (group == 'B')
require(rms)
dd = datadist(group, y); options(datadist = 'dd')
f = orm(y ~ group)
k = contrast(f, list(group = 'A'), list(group = 'B'))
yquant = Quantile(f)
ymed = function(lp) yquant(0.5, lp = lp)
Predict(f, group, fun = ymed)

The output was like this:

  group      yhat     lower    upper
1     A  98.63239  95.24502 102.4621
2     B 107.70816 103.67949 110.8213

Response variable (y):

Limits are 0.95 confidence limits

Using the wilcox.test function in R:

wilcox.test(y ~ group, conf.int = TRUE, paired = FALSE, exact = TRUE, mu = 0, correct = FALSE)

The output was like this:

	Wilcoxon rank-sum exact test

data:  y by group
W = 3506, p-value = 0.0002345
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -12.407601  -3.964255
sample estimates:
difference in location 
             -8.159511 

Rani
  • wilcox.test uses a different (better) approach for CLs of differences: the Hodges–Lehmann estimator. This is for continuous Y (minimal ties) and is completely consistent with the Wilcoxon test. You'll have to run it in pairs, since the Kruskal–Wallis function doesn't do this. Which version of the rms package are you using? Also note that the rmsb package's blrm function, along with contrast and Quantile, can provide exact (to within simulation error) Bayesian uncertainty intervals for a series of differences in means or quantiles using the proportional odds model. – Frank Harrell Jul 29 '21 at 11:49
  • I use version 6.2-0. I want to learn this approach to compute the CI for the difference in means and the difference in medians, not only for the OR. I saw that, I think, in your book BBR. Could you please post more details with R code for clarity? – Rani Jul 29 '21 at 13:13
  • The next release implements the delta method for getting better confidence intervals for means and quantiles. But that doesn't help with differences in means or quantiles. For now you'd need to put everything in a bootstrap loop to get bootstrap CLs, or use the Bayesian rms package rmsb. – Frank Harrell Jul 29 '21 at 13:53
  • Aha, so no function computes the difference CI for now? I used your suggestion in the textbook to compute the CI for the difference of two means across two groups:

        diffs = numeric(20000)   # length must match the loop bound

        for (i in 1:20000) {
          diffs[i] = mean(sample(xsub1, replace = TRUE)) - mean(sample(xsub2, replace = TRUE))
        }

    Is this correct?

    – Rani Jul 29 '21 at 14:01
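To round out the bootstrap suggestion in the comments, here is a self-contained sketch of a percentile bootstrap CI for a difference in means. The data-generating lines are stand-ins, since xsub1 and xsub2 are not defined anywhere in the thread:

```r
set.seed(1234)
xsub1 <- rnorm(100, 100, 15)   # stand-in for the undefined xsub1
xsub2 <- rnorm(100, 110, 15)   # stand-in for the undefined xsub2

B <- 20000
diffs <- numeric(B)            # length matches the loop bound
for (i in 1:B) {
  diffs[i] <- mean(sample(xsub1, replace = TRUE)) -
              mean(sample(xsub2, replace = TRUE))
}
quantile(diffs, c(0.025, 0.975))   # percentile bootstrap 95% CI for the mean difference
```

The 2.5th and 97.5th percentiles of the bootstrap distribution of the mean difference form the percentile CI; this is the simplest bootstrap interval, and bias-corrected variants exist if more accuracy is needed.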

1 Answer


@Rani, as suggested by Prof. Harrell, consider using rmsb to derive a Bayesian 95% uncertainty interval for the between-group difference in median y. Below I've provided code to run the Bayesian Wilcoxon test on your example. Finally, a delightful treasure trove of information on the proportional odds model can be found here

library(rmsb)
library(tibble)   # for tibble()

d <- tibble(group = rep(c('A', 'B'), 100),
            y     = rnorm(200, 100, 15) + 10 * (group == 'B'))

mod_blrm <- blrm(y ~ group, keepsep = 'group',
                 priorsd = 1.5,   ## specify a weakly informative skeptical prior
                 data = d)

med_con <- rms::contrast(mod_blrm,
                         list(group = 'B'), list(group = 'A'),
                         fun = function(lp, ...) Quantile(mod_blrm)(lp = lp, ...))
med_con

Posterior Summaries for First X Settings

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          109.1            109.1          105.1          112.5

Posterior Summaries for Second X Settings

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          98.49            98.18           95.3          101.5

Posterior Summaries of First - Second

  Posterior Mean Posterior Median Lower 0.95 HPD Upper 0.95 HPD
1          10.58            10.71          5.218          15.54

Visualize:

plot(med_con, which='diff') + facet_grid(~"Group B vs Group A")

Yonghao
  • Many thanks, it's great, but I got an error: Error in blrm(y ~ group, keepsep = ("group"), priorsd = c(1.5), data = d) : could not find function "blrm", even though I installed the package and ran library(rmsb), library(ggplot2), library(tibble).

    Also, do you think I can apply this to three groups?

    – Rani Sep 15 '21 at 21:02
  • @Rani please ensure that all package dependencies of rmsb are successfully installed. PO regression generalizes the Wilcoxon and Kruskal-Wallis tests so you simply (i) replace the binary group with the 3-level group variable and (ii) expand the priorsd argument to specify the priors for the various group coefficients. – Yonghao Sep 16 '21 at 22:01
  • Yes, it works, but when I compare the result of your approach with the one obtained directly from the wilcox.test function: wilcox.test(y ~ group, conf.int = TRUE, paired = FALSE, exact = T, mu = 0, correct = F), I get a quite different CI (-12.407601, -3.964255), even though you used a non-informative prior. Is this normal? – Rani Sep 20 '21 at 10:36
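For readers wanting the frequentist counterpart of the three-group case discussed above: since kruskal.test reports no confidence intervals, one can run wilcox.test on each pair of levels to get Hodges–Lehmann estimates and CLs for the pairwise location shifts, as Prof. Harrell suggested early in the thread. A minimal sketch, with made-up group labels and effect sizes (not data from the thread):

```r
set.seed(1)
g <- rep(c('A', 'B', 'C'), each = 50)
y <- rnorm(150, 100, 15) + 10 * (g == 'B') + 20 * (g == 'C')

# Hodges-Lehmann estimate and 95% CI for each pairwise location shift
for (pair in combn(c('A', 'B', 'C'), 2, simplify = FALSE)) {
  keep <- g %in% pair
  w <- wilcox.test(y[keep] ~ factor(g[keep]), conf.int = TRUE)
  cat(sprintf('%s vs %s: HL estimate = %.2f, 95%% CI = [%.2f, %.2f]\n',
              pair[1], pair[2], w$estimate, w$conf.int[1], w$conf.int[2]))
}
```

Note that wilcox.test reports first level minus second level, so the sign convention (A − B here vs. B − A in the contrast above) must be kept in mind when comparing intervals; with several pairwise comparisons a multiplicity adjustment may also be warranted.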