I have a multiple comparison problem between some thousands of correlation distributions and given shapes.
I have computed pair-wise correlation for a given set of gene expresion data, and I have generated correlation matrices with correlation distribution for each analyzed gene (range -1 to 1).
I want to compare the similarity between each correlation distribution and each of the given shapes I have generated.
A visual example:
I have the correlation distribution for gene X, which visually resembles the histogram of shape4, compared with shape2 and shape3.
I want to perform a test to get the probability of x to be equivalent to shape4 distribution. Say, the result would tell me that x follows a distribution that is more similar to shape4 than to shape2 or shape3, expressed as a p-val or probability (or both).
The range of the distributions goes from -1 to 1 (correlations).
I have also counted the number of ocurrences in 8 25percentile bins, and these 8 value vectors for each distribution also give me a visual inspection of the similarity, but I need a test to have a pval and/or probability. The ultimate goal is to assing each gene distribution to a given shape, based on the similarity among them.
I have checked a lot about different meassures of distance between distributions and chisq.test, ks.test ans so, but I would like to get a reasoned answer about what would be the best approach to this problem. I have also considered using some Machine Learning classifier.
Can someone provide a good solution (R based if possible)?
