Suppose I have a model that estimates a discrete probability distribution over a set of classes/factors $c_1, c_2, c_3$.
What are the options for measuring performance, and when is each one appropriate, given a test set of feature vectors and their labels?
A few options that come to mind:
- Maximum likelihood: measure the probability of observing the class labels in the test set under the given model.
- Assume that, for example, a test vector with class label $c_1$ represents a collapsed probability distribution $(1, 0, 0)$. Then use a similarity measure (or a loss function) to compare these distributions with the predicted probabilities.
- Use a derived measure of expected utility $\mathbb{E}(U)$ based on the application. Suppose further that we are modeling a game where the classes are possible outcomes of playing and there is a utility $U(c_i)$ associated with each class. Assume we would play whenever $\mathbb{E}(U) > 0$ under the model, and that our pay-off would be the utility of the actual class label, or $0$ if we chose not to play. Then use the pay-off over the test set as a performance measure.
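
To make the three options concrete, here is a minimal Python sketch of how I imagine each one being computed. The function names, the toy predictions, and the utility values are all made up for illustration. For the second option I picked the Brier score (squared distance to the one-hot label) as the similarity measure, which is only one possible choice. For the third I sum the pay-off over the test set, which is also an assumption; an average would work equally well.

```python
import math


def log_likelihood(probs, labels):
    """Option 1: log-probability of the observed labels under the model."""
    return sum(math.log(p[y]) for p, y in zip(probs, labels))


def brier_score(probs, labels):
    """Option 2: mean squared distance between each predicted distribution
    and the collapsed (one-hot) distribution of the true label."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum(
            (p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p)
        )
    return total / len(labels)


def total_payoff(probs, labels, utility):
    """Option 3: play only when the model's expected utility is positive,
    then collect the utility of the class that actually occurred."""
    payoff = 0.0
    for p, y in zip(probs, labels):
        expected = sum(p_k * u_k for p_k, u_k in zip(p, utility))
        if expected > 0:
            payoff += utility[y]  # not playing contributes 0
    return payoff


# Hypothetical toy data: three test vectors, classes indexed 0, 1, 2.
probs = [(0.5, 0.25, 0.25), (0.25, 0.5, 0.25), (0.25, 0.25, 0.5)]
labels = [0, 1, 0]
utility = (2.0, -1.0, -2.0)  # assumed U(c_1), U(c_2), U(c_3)

print(log_likelihood(probs, labels))         # higher is better
print(brier_score(probs, labels))            # lower is better
print(total_payoff(probs, labels, utility))  # higher is better
```

Note that the first two measures depend only on the predicted probabilities, while the third also depends on the chosen utilities and on the decision rule, so it evaluates the model only through the decisions it leads to.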