
I am looking for methods or metrics to compare two data matrices that originate from the same dataset projected onto two different feature spaces.

For background: these are DNA sequencing data projected onto two different gene catalogues. The catalogues likely overlap to a significant degree, but the exact correspondence between them is unknown, due to the lack of a single established standard in the field. Each matrix has about a hundred samples and about a thousand features (the number of features differs between the two matrices).

One approach that I have tried is clustering the samples and visually examining the dendrograms. This could be taken further by using one of the available metrics for comparing dendrograms. I am looking for alternative methods of quantitative comparison.
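To make the dendrogram comparison concrete, here is a minimal sketch of one possible metric: the correlation between the cophenetic distances of the two sample dendrograms. It assumes the rows of both matrices are the same samples in the same order; the function name `cophenetic_correlation`, the average-linkage and Euclidean choices, and the synthetic matrices are illustrative assumptions, not part of the actual data or pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet


def cophenetic_correlation(X_a, X_b, method="average", metric="euclidean"):
    """Pearson correlation between the cophenetic distances of two dendrograms.

    X_a and X_b are (samples x features) matrices whose rows are the same
    samples in the same order; the number of columns may differ.
    """
    # Cluster the samples separately in each feature space.
    Z_a = linkage(X_a, method=method, metric=metric)
    Z_b = linkage(X_b, method=method, metric=metric)
    # Condensed vectors of cophenetic distances (one entry per sample pair).
    d_a = cophenet(Z_a)
    d_b = cophenet(Z_b)
    return float(np.corrcoef(d_a, d_b)[0, 1])


# Synthetic stand-in data (an assumption): one shared latent structure
# projected onto two feature spaces of different size.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 20))
X_a = latent @ rng.normal(size=(20, 1000))
X_b = latent @ rng.normal(size=(20, 1200))

score = cophenetic_correlation(X_a, X_b)
print(f"cophenetic correlation: {score:.3f}")
```

A value close to 1 would suggest that the two projections group the samples in a similar hierarchy. The result depends on the linkage method and distance metric chosen, so those would need to suit the data.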

Roger V.
  • 1) Could you say more specifically what kind of comparison you're interested in? What kind of output should it produce, and/or how do you want to use it? 2) Do you have both projections for each data point? Or are some points projected one way and other points another? 3) What kind of features do the projections produce (e.g. real values, categorical, etc.)? Does the feature space have any special structure, distance metrics, etc.? – user20160 Mar 19 '21 at 15:27
  • @user20160 These are great questions! In fact, I probably need some help in answering some of them, since once the question is correctly formulated, the answer is usually obvious. I realize that the question might be a bit too open-ended for SE... – Roger V. Mar 22 '21 at 10:20