
I have read several posts on this site but am still unsure how to proceed. I have a set of genes and the proportion of mutated sites in each of them. Here is an example in R:

myStats <- data.frame(
  SampleName = paste0('GeneName', 1:9),   # gene identifiers
  PropOfMutations = c(                     # proportion of mutated sites per gene
    0.12210201, 0.13885505, 0.12861272,
    0.12845850, 0.14886364, 0.10860927,
    0.12933458, 0.08028169, 0.08295195))

I am using proportions because the genes differ in length, and every nucleotide in a gene can potentially mutate.
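A minimal sketch of that normalization, using made-up counts because the question does not give the raw numbers: each gene's proportion is its number of mutated sites divided by its length in nucleotides. The names `mutated` and `length_bp` are hypothetical.

```r
# Hypothetical counts for three genes (not from the question)
mutated   <- c(12, 30, 8)      # number of mutated sites per gene
length_bp <- c(100, 200, 100)  # gene length in nucleotides

# Proportion of mutated sites, comparable across genes of different length
prop <- mutated / length_bp
print(prop)  # 0.12 0.15 0.08
```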

I want to test whether any of the genes has a proportion of mutations higher than expected, compared with the whole dataset. Would a one-sample z-test for a proportion be appropriate here:

z <- (p - p0) / sqrt(p0 * (1 - p0) / n)

where p is the proportion for a given gene, n is the total number of genes, and p0 is the mean of all proportions? Please let me know if I should add any further details.
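A minimal R sketch of the statistic exactly as described in the question, taking p0 as the mean of the nine proportions and n as the number of genes. It mirrors the proposed formula and does not settle whether that choice of n is appropriate. The one-sided p-value from `pnorm(..., lower.tail = FALSE)` corresponds to the "higher than expected" alternative; the column names `z` and `p_upper` are illustrative.

```r
# Per-gene proportions of mutated sites, copied from the question
myStats <- data.frame(
  SampleName = paste0("GeneName", 1:9),
  PropOfMutations = c(0.12210201, 0.13885505, 0.12861272,
                      0.12845850, 0.14886364, 0.10860927,
                      0.12933458, 0.08028169, 0.08295195))

p  <- myStats$PropOfMutations
p0 <- mean(p)          # reference proportion: mean over all genes
n  <- nrow(myStats)    # n as defined in the question: number of genes

# z statistic for each gene; note the explicit * in p0 * (1 - p0)
myStats$z <- (p - p0) / sqrt(p0 * (1 - p0) / n)

# One-sided p-value for "proportion higher than expected"
myStats$p_upper <- pnorm(myStats$z, lower.tail = FALSE)

print(myStats)
```

Because every gene is compared with the same p0 and the same n, the z values here are just the deviations from the mean rescaled by a single constant.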

I have also checked the distribution of my data:

[Two figures showing the distribution of the data]

Max_IT
  • If we abstractly name each site with a color and fancifully use the word "jelly bean" for "gene," you will appreciate that your situation is exactly the one described at https://stats.stackexchange.com/questions/88065. – whuber Dec 21 '22 at 21:57
  • Thank you, I would definitely need to correct for multiple testing here :) – Max_IT Dec 22 '22 at 14:13
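Following up on the multiple-testing point, here is a sketch that adjusts the nine one-sided p-values from the question's z statistic with base R's `p.adjust`. Bonferroni and Benjamini-Hochberg (`"BH"`) are shown only as two common choices, not as a recommendation; the data frame `adj` and its column names are illustrative.

```r
# Proportions from the question and the z statistic from the question's formula
prop <- c(0.12210201, 0.13885505, 0.12861272,
          0.12845850, 0.14886364, 0.10860927,
          0.12933458, 0.08028169, 0.08295195)
p0 <- mean(prop)
n  <- length(prop)     # number of genes, as defined in the question
z  <- (prop - p0) / sqrt(p0 * (1 - p0) / n)
p_upper <- pnorm(z, lower.tail = FALSE)  # one-sided: higher than expected

# Adjust the nine one-sided p-values for multiple testing
adj <- data.frame(
  gene   = paste0("GeneName", seq_along(prop)),
  p_raw  = p_upper,
  p_bonf = p.adjust(p_upper, method = "bonferroni"),  # controls family-wise error
  p_bh   = p.adjust(p_upper, method = "BH"))          # controls false discovery rate
print(adj)
```

Bonferroni-adjusted values are always at least as large as the BH-adjusted ones, so BH is the less conservative of the two.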
