
Biology student here with a not-so-strong grasp of statistics.

I have the following question: is the number of genes meeting some criterion in the wild type (relative to the number of genes not meeting it) significantly different from the corresponding figure in the mutant?

The contingency table is as follows:

                 Genes_Meeting_Criteria    Genes_not_meeting_criteria
Wild type                25                           450
Mutant                   67                          4000

As I understand it, such a table is an 'unconditional' contingency table where sample sizes are not fixed in advance.

Can someone help me figure out what sort of test of significance can be applied to answer this question?

  • Why does the number of total genes (row sums) differ so greatly between Wild Type and Mutant? I would expect the total number of genes to be the same regardless of genotype and to depend only on the species. It's important to get that clarified. Whether you treat this as completely conditioned, conditioned on row sums, or unconditioned you will get a "statistically significant" result on this table, but it's important to understand just what the statistical test would be evaluating. – EdM Feb 13 '20 at 15:54
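EdM's comment mentions the completely conditioned analysis, in which both the row and the column totals are treated as fixed. The standard test for that view is Fisher's exact test (`fisher.test` in R). As a minimal sketch, here is a stdlib-only Python version of the two-sided test applied to this table. The function name `fisher_exact_two_sided` is hypothetical, and the small tolerance for ties is chosen to mirror the one R's `fisher.test` applies.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Both margins are treated as fixed, so under the null hypothesis of no
    association the top-left count follows a hypergeometric distribution.
    """
    row1 = a + b          # first row total (here: wild-type genes)
    col1 = a + c          # first column total (genes meeting the criterion)
    n = a + b + c + d     # grand total
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Two-sided p-value: sum over all tables that are no more probable than the
    # observed one, with a small relative tolerance for floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    # Table from the question: rows = wild type, mutant
    p = fisher_exact_two_sided(25, 450, 67, 4000)
    print(f"Fisher's exact test, two-sided p-value = {p:.3g}")
```

Python's integers are arbitrary precision, so the binomial coefficients are computed exactly and only the final ratio is rounded to a float.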

1 Answer


If I understand correctly, you are asking whether the proportion of genes meeting the criterion is the same in the wild type and the mutant. In that case, a simple two-sample test for equality of proportions is enough. In R:

> prop.test(matrix(c(25, 450, 67, 4000), nrow = 2, byrow = TRUE))

    2-sample test for equality of proportions with continuity correction

data:  matrix(c(25, 450, 67, 4000), nrow = 2, byrow = TRUE)
X-squared = 26.227, df = 1, p-value = 3.035e-07
alternative hypothesis: two.sided
95 percent confidence interval:
 0.01452350 0.05779154
sample estimates:
    prop 1     prop 2 
0.05263158 0.01647406 

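In short, the difference is statistically significant: about 5.3% of wild-type genes meet the criterion versus about 1.6% of mutant genes (p ≈ 3e-7).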
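For readers who want to see what `prop.test` computes (or who do not use R), here is a minimal stdlib-only Python sketch that follows the two-group formulas `prop.test` applies with its default continuity correction: a Pearson statistic with Yates' correction on the 2x2 table, and a Wald interval for the difference widened by the same correction. The function name `two_prop_test` is hypothetical; its numbers should agree with the R output above up to printing precision.

```python
import math
from statistics import NormalDist


def two_prop_test(x1, n1, x2, n2, conf_level=0.95):
    """Two-sample test for equality of proportions with Yates' continuity
    correction, mirroring the two-group formulas of R's prop.test."""
    p1, p2 = x1 / n1, x2 / n2
    delta = p1 - p2
    inv_n = 1 / n1 + 1 / n2
    # Continuity correction, capped so it never exceeds the observed deviation
    yates = min(0.5, abs(delta) / inv_n)

    # Pearson statistic; expected counts come from the pooled proportion under H0
    pooled = (x1 + x2) / (n1 + n2)
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    chi2 = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))

    # Upper-tail probability of a chi-squared variable with 1 df: P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))

    # Wald interval for p1 - p2, widened by the continuity correction
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) + yates * inv_n
    ci = (max(-1.0, delta - width), min(1.0, delta + width))
    return chi2, p_value, ci, (p1, p2)


# Row totals: wild type 25 + 450 = 475, mutant 67 + 4000 = 4067
chi2, p_value, ci, (p1, p2) = two_prop_test(25, 475, 67, 4067)
print(f"X-squared = {chi2:.3f}, p-value = {p_value:.3e}")
print(f"95% CI for the difference: {ci[0]:.8f} {ci[1]:.8f}")
print(f"sample proportions: {p1:.8f} {p2:.8f}")
```

Note that the test needs the row totals (475 and 4067), not the "not meeting criteria" counts, as the sample sizes.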

user2974951