My dataset consists of 3 conditions, with a different number of samples in each condition (30 in condition 1, 80 in condition 2, 50 in condition 3). I have measured the presence or absence of a gene in each sample. I would like to know whether the presence rate of gene 1 is significantly different between conditions 1 and 2, and between conditions 1 and 3. I would then like to repeat the same comparisons for genes 2, 3, etc.
It seems a test based on the hypergeometric distribution is the best one to use, but I don't know how to apply its terms to my data.
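To make the question concrete, here is a minimal sketch of how the hypergeometric terms might map onto a single comparison (gene 1, condition 1 vs condition 2). The presence counts (12 of 30 and 50 of 80) are made up purely for illustration, and the function names are hypothetical. The mapping assumed here is: population size N = all samples in the two conditions, K = samples carrying the gene across both conditions, n = samples in condition 1, x = samples carrying the gene in condition 1. Summing the probabilities of all outcomes no more likely than the observed one gives a two-sided p-value, which is the approach usually called Fisher's exact test.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x): probability that exactly x of the n samples in condition 1
    carry the gene, given K carriers among N pooled samples."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


def two_sided_p(k1, n1, k2, n2):
    """Two-sided exact p-value for a 2x2 presence/absence table.

    k1, k2: samples with the gene present in each condition
    n1, n2: total samples in each condition
    """
    N = n1 + n2              # pooled number of samples
    K = k1 + k2              # pooled number of samples with the gene
    x_min = max(0, n1 - (N - K))
    x_max = min(n1, K)
    p_obs = hypergeom_pmf(k1, N, K, n1)
    total = 0.0
    for x in range(x_min, x_max + 1):
        p = hypergeom_pmf(x, N, K, n1)
        # keep outcomes that are as likely or less likely than the observed one
        # (small relative tolerance guards against floating-point ties)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Hypothetical counts, for illustration only (not real data)
present_c1, n_c1 = 12, 30    # condition 1
present_c2, n_c2 = 50, 80    # condition 2

p_value = two_sided_p(present_c1, n_c1, present_c2, n_c2)
print(f"gene 1, condition 1 vs 2: p = {p_value:.4g}")
```

Under these assumptions, the same call would be repeated with the condition 3 counts (n = 50) for the 1 vs 3 comparison, and again for each further gene.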
Any help would be appreciated.