I am trying to look at the interactions between many categorical variables and collect the p-values. I found code here that appears to do something similar to what I want, but I can't get it to work with my data.
I first go through my data to list every possible combination of variables. But when I apply the function to each combination, it throws an error in the chisq.test portion of the code, so df_res is never created.
catOnly.test contains rows like this one (all columns are categorical except the last two):
V_NP_001 001 FU12Month V N 0 4 None 0.000 1.734
Any suggestions? Here is the code I am running:
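library(dplyr)   # %>% and select()
library(purrr)   # map2_df()
library(lsr)     # cramersV(); assumed to be the source package
library(ggplot2) # plotting below
f = function(x,y) {
tbl = as.data.frame(catOnly.test) %>% select(all_of(x), all_of(y)) %>% table()
chisq_pval = round(chisq.test(tbl)$p.value, 4)
cramV = round(cramersV(tbl), 4)
data.frame(x, y, chisq_pval, cramV) }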
# create unique combinations of column names
# sorting will help getting a better plot (upper triangular)
df_comb = data.frame(t(combn(sort(names(catOnly.test)), 2)), stringsAsFactors = FALSE) # m = 2 gives pairs; only X1 and X2 are used below
# apply function to each variable combination
df_res = map2_df(df_comb$X1, df_comb$X2, f)
# plot results
df_res %>%
ggplot(aes(x,y,fill=chisq_pval))+
geom_tile()+
geom_text(aes(x,y,label=cramV))+
scale_fill_gradient(low="red", high="yellow")+
theme_classic()
print(df_res)
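To make the intended computation concrete, here is a minimal base-R sketch of the pairwise step described above: build every pair of column names, run chisq.test on each pair's contingency table, and record the error message instead of stopping when a pair fails. The data frame `toy`, its column names, and the helper `test_pair` are made up for illustration and are not part of my real data or of the code I found.

```r
# Hypothetical toy data standing in for catOnly.test (not the real data)
set.seed(1)
toy <- data.frame(
  visit = sample(c("Baseline", "FU12Month"), 60, replace = TRUE),
  group = sample(c("V", "N"), 60, replace = TRUE),
  grade = sample(c("None", "Mild", "Severe"), 60, replace = TRUE),
  stringsAsFactors = FALSE
)

# All unordered pairs of column names (m = 2 gives pairs)
pairs <- t(combn(sort(names(toy)), 2))

# Test one pair; on error, return NA and keep the message rather than stopping
test_pair <- function(x, y) {
  tbl <- table(toy[[x]], toy[[y]])
  res <- tryCatch(suppressWarnings(chisq.test(tbl)), error = function(e) e)
  if (inherits(res, "error")) {
    return(data.frame(x = x, y = y, chisq_pval = NA_real_,
                      note = conditionMessage(res)))
  }
  data.frame(x = x, y = y, chisq_pval = round(res$p.value, 4), note = "")
}

# Apply to every pair and stack the one-row results
res <- do.call(rbind, unname(Map(test_pair, pairs[, 1], pairs[, 2])))
print(res)
```

Run against the real data, the `note` column would show which variable pairs make chisq.test fail, which might help narrow down whether the two non-categorical columns are involved.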