I'm working on a problem that requires me to rank 3 products based on user preferences (i.e., how much a user likes each one). Let's say the products are A, B, and C.
I have the ground truth: A > B > C, i.e., the ranking is [1, 2, 3].
I have a prediction algorithm which predicts the ranking, say [2, 3, 1].
Which metric should I use to evaluate the prediction algorithm?
mAP (Mean Average Precision) and Mean Reciprocal Rank are not a good fit, because those metrics come from information retrieval, where the ground truth is binary (relevant / not relevant). In my case all 3 products are relevant; what matters is how accurately we rank them relative to each other.
Kendall's Tau is one approach. However, Kendall's Tau gives a p-value of 0.33 even when evaluating the predicted ranking [1, 2, 3] against the ground truth [1, 2, 3], i.e., for perfect agreement. Is Kendall's Tau statistically meaningful (and accepted in the statistics community) when used this way?
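To illustrate where that 0.33 comes from, here is a minimal pure-Python sketch (my own toy implementation, not a library call) that computes tau-a and an exact two-sided p-value by enumerating all orderings of 3 items. With n = 3 there are only 3! = 6 permutations, and both [1, 2, 3] and [3, 2, 1] reach |tau| = 1, so the p-value can never drop below 2/6 ≈ 0.33 regardless of how good the prediction is:

```python
from itertools import combinations, permutations

def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (a[i] - a[j]) * (b[i] - b[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

truth = [1, 2, 3]
tau_obs = kendall_tau(truth, [1, 2, 3])  # perfect agreement -> tau = 1.0

# Exact two-sided p-value under the null of a random ranking:
# the fraction of all n! orderings whose |tau| is at least as
# extreme as the observed value.
taus = [kendall_tau(truth, list(p)) for p in permutations(truth)]
p_value = sum(abs(t) >= abs(tau_obs) for t in taus) / len(taus)

print(tau_obs, p_value)  # tau = 1.0, p = 2/6 ~ 0.33
```

So the large p-value is purely a small-sample artifact: with only 3 items the null distribution is too coarse for any result to be "significant" at conventional levels, even though tau itself still measures ranking quality correctly.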
Any other metric suggestions would be helpful.