[There seem to be a lot of similar questions here, so please point me in the right direction if this has already been answered, but I think this one is reasonably differentiated.]
There are many different implementations of post-hoc analyses following a Kruskal-Wallis test. I'm trying to understand how (why?) they differ, to get a sense of when one might be the right choice over another.
Working in R, consider this simulated dataset:
generate.sim.data <- function(seed) {
  set.seed(seed)
  # four groups of 20 draws each, common SD of 3; the last two groups share a mean
  sim1 <- rnorm(20, 4, 3)
  sim2 <- rnorm(20, 7, 3)
  sim3 <- rnorm(20, 1, 3)
  sim4 <- rnorm(20, 1, 3)
  simdata <- c(sim1, sim2, sim3, sim4)
  simgroup <- rep(c("sim1", "sim2", "sim3", "sim4"), each = 20)
  data.frame(simdata, simgroup)
}
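(For context, the omnibus Kruskal-Wallis test itself comes from base R's kruskal.test; a quick check that there is an overall group effect worth following up on looks like this:)

sim <- generate.sim.data(123)
kruskal.test(simdata ~ simgroup, data = sim)  # omnibus test before any post-hoc comparisons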
The functions kruskal in package agricolae, kruskalmc in package pgirmess, posthoc.kruskal.nemenyi.test in package PMCMR, and dunn.test in package dunn.test all report different test statistics from one another (for any input). For some inputs they also give different pairwise-comparison results:
library(agricolae)
library(pgirmess)
library(PMCMR)
library(dunn.test)
sim <- generate.sim.data(123)
kruskal(sim$simdata, sim$simgroup, console = TRUE)       # groups: a, b, c, c
kruskalmc(sim$simdata, sim$simgroup)                     # groups: a, a, b, b
posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)  # groups: a, a, b, b
dunn.test(sim$simdata, sim$simgroup)                     # groups: a, b, c, c
but they agree in some more clear-cut cases:
sim <- generate.sim.data(321)
kruskal(sim$simdata, sim$simgroup, console = TRUE)       # groups: a, a, b, b
kruskalmc(sim$simdata, sim$simgroup)                     # groups: a, a, b, b
posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)  # groups: a, a, b, b
dunn.test(sim$simdata, sim$simgroup)                     # groups: a, a, b, b
It seems that kruskalmc and posthoc.kruskal.nemenyi.test agree with each other no matter what, and kruskal and dunn.test also tend to agree, but that latter pair doesn't always match, e.g.:
sim <- generate.sim.data(4444)
kruskal(sim$simdata, sim$simgroup, console = TRUE)       # groups: a, b, c, c
kruskalmc(sim$simdata, sim$simgroup)                     # groups: ac, a, b, bc
posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)  # groups: ac, a, b, bc
dunn.test(sim$simdata, sim$simgroup)                     # groups: ac, b, c, c
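Extracting the p-values themselves, rather than relying on the letter summaries, shows how close these borderline calls are to 0.05. A rough sketch; the slot names here ($p.value on the PMCMR object, $comparisons and $P.adjusted from dunn.test) are what I believe these packages return, so check str() on the results if your versions differ:

sim <- generate.sim.data(4444)
res.nemenyi <- posthoc.kruskal.nemenyi.test(sim$simdata, sim$simgroup)
res.dunn <- dunn.test(sim$simdata, sim$simgroup)
res.nemenyi$p.value                             # pairwise p-value matrix (slot name assumed)
data.frame(comparison = res.dunn$comparisons,   # slot names assumed
           p.adjusted = res.dunn$P.adjusted)
# str(res.nemenyi); str(res.dunn)  # confirm the structure if these slots don't exist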
I realize I'm quibbling about differences in behavior driven by p-values very close to 0.05, but the tests also give different diagnoses for real data sets (e.g., observation~method from the corn data in agricolae, or occupation~eligibility from the homecare data in dunn.test; calls sketched at the end of the post). What are the underlying differences between these tests, and is there a reasonable criterion for choosing one over another?
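For reference, the real-data comparisons I mean look roughly like this (a sketch, assuming the corn and homecare example data sets mentioned above are loadable under those names from agricolae and dunn.test):

data(corn, package = "agricolae")
kruskal(corn$observation, corn$method, console = TRUE)
kruskalmc(corn$observation, corn$method)
data(homecare, package = "dunn.test")
dunn.test(homecare$occupation, homecare$eligibility)
posthoc.kruskal.nemenyi.test(homecare$occupation, homecare$eligibility)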