I have what is probably a very naive question, but I've been unable to find a suitable explanation elsewhere. I am trying to calculate the likelihood of two errors occurring at the same position if they are randomly distributed amongst all positions available. The problem is difficult to explain in context, so I will simplify the scenario. Say I have two sequences of positions, each of which can take one of two identities across a number of events. I might represent this with two vectors:
X <- c(1,1,0,0,0,0,0,0,0,0,1)
Y <- c(1,1,0,0,0,0,0,0,0,0,0)
My assumption is that the 1s are errors and the 0s are not. What I want to calculate is the probability of the errors coinciding at corresponding positions in X and Y if the errors are distributed at random, and then how unlikely it is that I observe a certain number of coinciding errors (at the same position in X and Y) in real data. To do this I calculated the frequency of an error in each vector (in the example above, 3/11 in X and 2/11 in Y) and multiplied these together to get the expected chance of an overlap at any one position (3/11 * 2/11 = 6/121). I can then calculate the probability of seeing at least two overlapping 1s (as in the observed data) from the cumulative binomial: 1 - pbinom(1, 11, 6/121), the chance of two or more overlaps in 11 trials given an expectation of 6/121 per trial.
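To make the calculation concrete, here it is written out in R (the vectors, counts, and the binomial call are taken directly from the example above; the variable names are just for illustration):

```r
# Example vectors from the question: 1 = error, 0 = no error
X <- c(1,1,0,0,0,0,0,0,0,0,1)
Y <- c(1,1,0,0,0,0,0,0,0,0,0)

n <- length(X)                    # 11 positions
p <- mean(X) * mean(Y)            # (3/11) * (2/11) = 6/121, chance of overlap at one position
observed <- sum(X == 1 & Y == 1)  # 2 positions where both vectors have an error

# P(observed or more overlaps in n positions) under the binomial model
p_val <- 1 - pbinom(observed - 1, n, p)
p_val  # about 0.10 for this example
```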
However, does this fully capture the probability of observing this particular scenario (two or more matching errors) if the errors are randomly distributed, or am I missing some information in the fact that the 0s also frequently match up? In the scenario I am working with, having 0s match up is also good evidence that the positions of 0s and 1s are not random, but since 0s are usually far more frequent than 1s, I don't know whether I should be counting those matches.
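For what it's worth, one way to let the matching 0s contribute as well (offered only as an illustrative sketch, not necessarily the right test for your real problem) is to cross-tabulate the two vectors and apply Fisher's exact test. It conditions on the number of 1s in each vector and asks whether they co-occur at the same positions more often than chance would predict, while the 2x2 table also records how often the 0s agree:

```r
X <- c(1,1,0,0,0,0,0,0,0,0,1)
Y <- c(1,1,0,0,0,0,0,0,0,0,0)

# 2x2 table of positions: (error in X?) x (error in Y?)
# Cells count both-1, X-only, Y-only, and both-0 positions.
tab <- table(factor(X, levels = c(1, 0)), factor(Y, levels = c(1, 0)))
ft <- fisher.test(tab)
ft$p.value
```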
Sorry if this question is unclear, but any help would be much appreciated!