I'm trying to compare different approaches to ranking predictions. I have the ground-truth distribution $P$ (discrete, a zeta distribution) and two or more predicted distributions ($Q, Q', Q'', Q'''$ in this case) that I'd like to rank, as in: which one is better?
As I understand it, I can use the KL divergence for this.
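For concreteness, this is roughly how I compute these numbers, using the discrete form $D_{kl}(P || Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$. It's a minimal sketch: the exponent, the truncation $K$, and the candidate $Q$ below are placeholders, not my real data.

```python
import numpy as np
from scipy.stats import zipf
from scipy.special import rel_entr

K = 1000                  # compare over the first K ranks (truncation, placeholder)
k = np.arange(1, K + 1)

p = zipf.pmf(k, a=2.0)    # ground truth P: zeta/Zipf with exponent 2 (placeholder)
p /= p.sum()              # renormalize after truncation

q = zipf.pmf(k, a=2.5)    # one candidate prediction Q (placeholder)
q /= q.sum()

# rel_entr(p, q) computes p * log(p / q) elementwise, so the sum is
# D_KL(P || Q) in nats
d_kl = rel_entr(p, q).sum()
print(f"D_KL(P || Q) = {d_kl:.4f} nats")
```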
What can I say then if one prediction has $D_{kl}(P || Q) = 3$ and another one has $D_{kl}(P || Q') = 6$?
What can I say if one prediction has $D_{kl}(P || Q'') = 0.3$ and another one has $D_{kl}(P || Q''') = 0.6$?
I can certainly say that prediction $Q$ is better than prediction $Q'$, because the difference seems big. But what can I say about $Q''$ versus $Q'''$, where the difference seems small (I know $Q''$ is still the better one)?
My question is: can I somehow say numerically that one prediction is better than another by "this amount"?
EDIT: $Q, Q', Q'', Q'''$ are all different in my question. I'm not confused about the asymmetry of $D_{kl}$, but about how to interpret results on different scales. I edited the numbers to better reflect my question:
- Both examples say that one distribution is "twice as close" to $P$ (the truth) as the other, but the scales are different. How do I interpret that?