I am trying to simulate "correlated categorical data". For example, suppose there are 10 players (p1, p2, ..., p10). Each day, a random subset of these players meets (1 = present, 0 = missing), and together they try to solve a puzzle (1 = successful, 0 = unsuccessful).
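Here is a minimal sketch of the attendance part of this set-up. It assumes each player shows up independently with probability 0.5 on each day. The probability, the 100 days and the seed are illustrative choices, not part of the original question.

```r
set.seed(1)                  # arbitrary seed, for reproducibility only
n_days    <- 100             # assumed number of simulated days
n_players <- 10
players   <- paste0("p", 1:n_players)

# Each cell is an independent Bernoulli(0.5) draw: 1 = present, 0 = missing
attendance <- matrix(rbinom(n_days * n_players, size = 1, prob = 0.5),
                     nrow = n_days, dimnames = list(NULL, players))
attendance <- as.data.frame(attendance)

# How many players were present on each day
table(rowSums(attendance))
head(attendance)
```

With 10 players, a day with nobody present has probability 0.5^10 (about 0.001). Such rows could be dropped with `attendance[rowSums(attendance) > 0, ]` if needed.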
I would like this dataset to illustrate the following concepts:
- When certain players are present together (e.g. p1 and p5, or p1, p6 and p9), the team tends to be more successful at solving the puzzle. In other words, among the rows where they are all present, the `result` column has a higher proportion of 1s.
- When certain other players are present together (e.g. p9 and p3), the team tends to be less successful at solving the puzzle. In other words, among the rows where they are all present, the `result` column has a lower proportion of 1s.
- Certain players tend to be successful only when particular other players are present, and less successful when those players are absent (e.g. p2 is very successful when p3 is present, but not very successful when p3 is absent).
- Finally, some players are always successful, no matter who they play with.
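One common way to build these patterns in deliberately, rather than hoping they emerge, is to generate the outcome from a logistic model. In that model, each concept becomes a main effect or an interaction term. The sketch below assumes independent attendance, and every coefficient is chosen purely for illustration:

- positive p1:p5 and p1:p6:p9 terms;
- a negative p3:p9 term;
- a negative p2 main effect, offset by a positive p2:p3 term;
- a large p4 main effect, standing in for a player who is (almost) always successful.

```r
set.seed(42)                       # arbitrary seed
n_days  <- 2000                    # assumed; large so the patterns are easy to see
players <- paste0("p", 1:10)

# Attendance: independent Bernoulli(0.5) draws (1 = present, 0 = missing)
X <- as.data.frame(matrix(rbinom(n_days * 10, size = 1, prob = 0.5), ncol = 10,
                          dimnames = list(NULL, players)))

# Linear predictor on the logit scale; every coefficient is an illustrative choice
eta <- with(X,
  -0.5 +
     2.0 * p1 * p5 +               # p1 and p5 together: more successful
     2.0 * p1 * p6 * p9 +          # p1, p6 and p9 together: more successful
    -2.0 * p3 * p9 +               # p3 and p9 together: less successful
    -1.0 * p2 +                    # p2 without p3: less successful ...
     2.5 * p2 * p3 +               # ... p2 with p3: net +1.5, more successful
     4.0 * p4                      # p4: success is very likely whenever present
)

# Turn the linear predictor into probabilities and draw the binary outcome
X$result <- rbinom(n_days, size = 1, prob = plogis(eta))

# Quick checks: overall success rate and the configurations of interest
mean(X$result)
mean(X$result[X$p1 == 1 & X$p5 == 1])
mean(X$result[X$p3 == 1 & X$p9 == 1])
tapply(X$result[X$p2 == 1], X$p3[X$p2 == 1], mean)  # p2's rate with p3 absent (0) vs present (1)
mean(X$result[X$p4 == 1])
```

Because the outcome is drawn from `plogis(eta)`, "always successful" here means "very likely to succeed". Forcing `prob` to 1 on days when p4 is present would make it literal. As a sanity check, `glm(result ~ p1:p5 + p1:p6:p9 + p3:p9 + p2 + p2:p3 + p4, family = binomial, data = X)` should roughly recover the chosen coefficients.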
Using the R programming language, I tried to simulate multivariate normal data and then dichotomise it. Every element less than a threshold (9 in the code below) is replaced with 0, and every element greater than it is replaced with 1:
```r
# mvtnorm is not needed: mvrnorm() comes from the MASS package
n <- 11                                       # 10 player columns + 1 result column
A <- matrix(runif(n^2) * 2 - 1, ncol = n)     # random entries in (-1, 1)
s <- t(A) %*% A                               # random positive semi-definite covariance matrix
my_data <- MASS::mvrnorm(100, mu = rnorm(n, 10, 1), Sigma = s)
my_data <- data.frame(my_data)
colnames(my_data) <- c(paste0("p", 1:10), "result")

# Dichotomise: values below 9 become 0, values above 9 become 1
my_data[my_data < 9] <- 0
my_data[my_data > 9] <- 1
head(my_data)
```
```
  p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 result
1  1  1  1  0  1  1  1  0  0   0      0
2  0  1  1  1  1  0  1  1  1   1      1
3  1  1  1  0  1  0  1  1  1   1      1
4  1  1  1  0  1  1  1  1  1   1      1
5  0  1  1  1  1  0  1  1  0   0      0
6  1  0  1  0  1  1  1  0  1   1      1
```
But in the end, I am not sure whether I actually managed to generate categorical data with any correlation pattern at all.
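One way to check what the thresholded data actually contains is to look at two things: the phi correlations with `result`, and the success rate for every pair of players present together. The sketch below re-creates the same pipeline with 1000 rows and a fixed seed (both arbitrary). It draws the multivariate normal via a Cholesky factor instead of `MASS::mvrnorm`, so it only needs base R.

```r
set.seed(123)                              # arbitrary seed
n <- 11                                    # 10 player columns + 1 result column
A <- matrix(runif(n^2) * 2 - 1, ncol = n)
s <- t(A) %*% A                            # random (almost surely positive-definite) covariance
mu <- rnorm(n, 10, 1)

# Same distribution as MASS::mvrnorm(1000, mu, s): standard normals times the
# upper Cholesky factor of s, then shifted column-wise by mu
Z <- matrix(rnorm(1000 * n), ncol = n) %*% chol(s)
latent <- sweep(Z, 2, mu, "+")
colnames(latent) <- c(paste0("p", 1:10), "result")

# Dichotomise at 9, as in the question
my_data <- as.data.frame((latent > 9) * 1)

# 1. Phi correlation of each column with 'result'
round(cor(my_data)[, "result"], 2)

# 2. Number of days and success rate for every pair of players present together
player_pairs <- combn(paste0("p", 1:10), 2)
rates <- apply(player_pairs, 2, function(pr) {
  both <- my_data[[pr[1]]] == 1 & my_data[[pr[2]]] == 1
  c(n_days = sum(both), success = mean(my_data$result[both]))
})
colnames(rates) <- apply(player_pairs, 2, paste, collapse = "+")
round(t(rates), 2)
```

Some correlation will show up, but it is whatever the random matrix `s` happens to imply, not a pattern chosen in advance. Because `s` also correlates the player columns with each other, the players' attendances are correlated rather than independent. The pair table is where a concept like "p1 and p5 together do better" would have to appear.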
Can someone please tell me whether I have done this correctly? And if not, are there any standard methods for randomly simulating correlated categorical data?
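For reference, the thresholding idea itself is a standard technique: dichotomising a latent multivariate normal. To show a specific pattern, though, the latent correlation matrix has to be chosen rather than random. Below is a small base-R sketch. The 3 x 3 latent correlation matrix for p2, p3 and `result`, and the threshold at 0, are illustrative choices.

```r
set.seed(7)                    # arbitrary seed
n <- 5000                      # assumed sample size
k <- 3                         # latent variables for p2, p3 and result (illustrative)

# Chosen (not random) latent correlation matrix; it is positive definite
Sigma <- matrix(c(1.0, 0.3, 0.6,
                  0.3, 1.0, 0.6,
                  0.6, 0.6, 1.0), nrow = k,
                dimnames = list(c("p2", "p3", "result"), c("p2", "p3", "result")))

# Correlated standard normals: rows of Z %*% R have covariance t(R) %*% R = Sigma
Z <- matrix(rnorm(n * k), ncol = k) %*% chol(Sigma)
colnames(Z) <- colnames(Sigma)

# Dichotomise at 0, so each binary column is 1 with probability 0.5
B <- as.data.frame((Z > 0) * 1)

colMeans(B)        # each close to 0.5
round(cor(B), 2)   # phi coefficients: positive, but weaker than the latent values
```

For two standard normals with correlation ρ cut at 0, the resulting phi coefficient is (2/π)·arcsin(ρ), which is about 0.41 for ρ = 0.6. The binary correlations are therefore weaker than the latent ones. This approach controls pairwise associations only. A pattern like "p2 succeeds only when p3 is present" is an interaction, which is easier to express through an outcome model such as a logistic regression with interaction terms.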
Thanks!