Calculating how close a model distribution is to the data distribution

Asked Jul 22 '23 at 15:55

A lot of machine learning models aim to approximate probability distributions. Let’s say P is the distribution of the data and Q is the distribution learned by our model. How do you measure how close Q is to P?

Please include sources/references if possible.

This question is from Chip Huyen's ML interviews book.

asked Jul 22 '23 at 15:55

randomvariable

It may depend on whether you are dealing with univariate or multivariate distributions and whether the random variables are continuous or discrete. Wikipedia has a list of statistical distances – Henry Jul 22 '23 at 16:44
and there are answers on this site such as https://stats.stackexchange.com/questions/425040/how-to-measure-the-statistical-distance-between-two-frequency-distributions and https://stats.stackexchange.com/questions/4044/measuring-the-distance-between-two-multivariate-distributions and https://stats.stackexchange.com/questions/78405/measuring-distance-between-two-empirical-distributions – Henry Jul 22 '23 at 16:44
https://stats.stackexchange.com/questions/76350/goodness-of-fit-for-continuous-variables might help as well. – jbowman Jul 22 '23 at 16:44
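
To make the comments above concrete, here is a minimal sketch of two entries from the list of statistical distances, the Kullback-Leibler divergence and the total variation distance. It assumes P and Q are discrete and given as probability vectors over the same finite support, which the question does not state. The function names and the example values for `p` and `q` are hypothetical, chosen only for illustration. Continuous or multivariate cases would need different tools, as the linked threads discuss.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) in nats for discrete distributions on the same support.

    Assumption: p and q are sequences of probabilities that each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


def total_variation(p, q):
    """Total variation distance: half the L1 distance between p and q."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical example: P is the data distribution, Q the model's.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print("KL(P || Q):", kl_divergence(p, q))
print("TV(P, Q):  ", total_variation(p, q))
```

Note that the KL divergence is not symmetric, so KL(P || Q) and KL(Q || P) generally differ, while total variation is a true metric bounded between 0 and 1. Which one is appropriate depends on the setting.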

0 Answers