I'm currently trying to evaluate a model of metabolism that aims to predict whether deleting individual genes will cause a growth defect (there are ~850 genes in total). I know from experimental data which genes show slow growth, so I'm mainly judging the model on what percentage of genes it correctly predicts. There are only two possible predictions for each gene: normal growth or reduced growth.
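For concreteness, this is the score I'm computing (a minimal Python sketch; the two arrays are toy placeholders standing in for my real model predictions and experimental calls):

```python
import numpy as np

# Toy placeholder data: 0 = normal growth, 1 = growth defect.
# In practice these are the model's ~850 predictions and the
# corresponding experimental observations.
model_pred = np.array([1, 0, 0, 1, 0, 1])
experiment = np.array([1, 0, 1, 1, 0, 0])

# Fraction of genes whose growth phenotype is predicted correctly.
accuracy = np.mean(model_pred == experiment)
print(f"{accuracy:.1%} of genes correctly predicted")
```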
To try to demonstrate the model's effectiveness (or otherwise), I'd like to compare it against a null hypothesis of "genes predicted at random". However, I'm not sure what the best form of this null would be, or even whether it's a sensible question.
A couple of possibilities occurred to me (there's a simulation sketch of both after the list):
1. The same number of growth-defective genes is chosen as the model predicts, but they are assigned at random. (For example, if the model predicts that 10 particular genes cause a growth defect when deleted while the rest show normal growth, the null hypothesis is that this is indistinguishable from picking 10 genes at random.)
2. The number of genes causing defects is itself chosen at random, and those genes are then assigned at random as above.
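To make the two options concrete, here is a rough Monte Carlo version of what I have in mind (Python/NumPy; the `model_pred` and `experiment` arrays are random placeholders standing in for my real data):

```python
import numpy as np

rng = np.random.default_rng(0)
N_GENES = 850      # total genes in the model
N_SIMS = 10_000    # random prediction vectors per null

# Placeholder data: 0 = normal growth, 1 = growth defect.
model_pred = rng.integers(0, 2, N_GENES)
experiment = rng.integers(0, 2, N_GENES)

observed_acc = np.mean(model_pred == experiment)
k = model_pred.sum()  # number of genes the model calls defective

def random_accuracy(n_defective):
    """Accuracy of a random prediction calling n_defective genes defective."""
    pred = np.zeros(N_GENES, dtype=int)
    pred[rng.choice(N_GENES, size=n_defective, replace=False)] = 1
    return np.mean(pred == experiment)

# Null 1: keep the model's count k, assign the defective labels at random.
null1 = np.array([random_accuracy(k) for _ in range(N_SIMS)])

# Null 2: draw the count itself uniformly at random, then assign as above.
null2 = np.array([random_accuracy(rng.integers(0, N_GENES + 1))
                  for _ in range(N_SIMS)])

# One-sided Monte Carlo p-values: fraction of random predictions that
# match the experimental data at least as well as the model does.
p1 = (np.sum(null1 >= observed_acc) + 1) / (N_SIMS + 1)
p2 = (np.sum(null2 >= observed_acc) + 1) / (N_SIMS + 1)
print(f"accuracy={observed_acc:.3f}  p(null 1)={p1:.4f}  p(null 2)={p2:.4f}")
```

In both cases the p-value is just the fraction of random prediction vectors that match the experimental data at least as well as the model does; the only difference between the two nulls is how the random predictions are generated.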
The first seems to use too much information (the model's own count of defective genes) for a completely random prediction, but the second could show high significance even for poor predictions, so I'm not quite sure how to proceed...