
Setup
Suppose I have a data set with a categorical variable $Y$ (with possible values $j=1,\dots,J$) and another variable $X$. I wish to classify $Y$ based on the information in $X$.

For simplicity, suppose I also know the form of the model that generated the data, though not the parameter values; I will estimate them. For example, this could be a (multinomial) logistic regression.

I am facing a loss function $l(\hat y,y)$ where $\hat y$ is the predicted class and $y$ is the actual/true class. $l$ can be represented by a matrix $\mathbf{L}$ where rows correspond to actual classes and columns to predicted ones. Each off-diagonal cell $l_{ij}$ ($i\neq j$) of $\mathbf{L}$ contains the loss associated with the specific misclassification.

Steps taken
I fit a (multinomial) logistic regression to the data. Given the fitted coefficients and a new data point with a known $X$ value $x_0$ but an unknown $Y$ value $y_0$, I obtain the fitted class probabilities $\hat p_0$ (a vector). I wish to classify $y_0$, i.e. obtain $\hat y_0$ so as to minimize the expected loss.
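For concreteness, this step could be sketched as follows with scikit-learn (the toy data, variable names, and numbers are my own illustrative assumptions, not part of the question):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: a single feature X and a categorical Y with J = 3 classes
X = rng.normal(size=(300, 1))
Y = rng.integers(0, 3, size=300)

# Fit a multinomial logistic regression (scikit-learn's default
# multiclass handling with the lbfgs solver is multinomial)
model = LogisticRegression()
model.fit(X, Y)

# New data point with known X = x0 but unknown Y = y0:
# the fitted class probabilities p_hat_0 (a vector of length J)
x0 = np.array([[0.5]])
p_hat_0 = model.predict_proba(x0)[0]
```

`p_hat_0` is then a nonnegative vector summing to one, which plays the role of $\hat p_0$ above.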

Questions
What is the optimal classification rule in this setting?
Could you also recommend a textbook chapter on the topic?


My problem is somewhat similar to "Classification optimal decisions considering a loss function", but my setting is frequentist and the prior distribution of classes (the class prevalence) is not available to me.

Richard Hardy
  • I know this is pretty basic, but I am struggling to find good textbook material on the topic. The few machine learning textbooks I looked at seem to stop short of optimal decision making (tailoring the decision to the loss function, given the rest). – Richard Hardy Oct 02 '19 at 07:46

1 Answer


I have considered the following decision rule:

  1. Calculate the vector of "estimated expected losses" $\widehat{el}(\mathbf{L},\hat p_0):=\mathbf{L}^\top \hat p_0$, whose $j$-th element ($j=1,\dots,J$) is the estimated expected loss of the classification decision $\hat y = j$.
  2. Find the minimal element of the vector and select the corresponding predicted class.
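The two steps above amount to one matrix–vector product and an argmin. A minimal sketch, where the loss matrix $\mathbf{L}$ and the probabilities $\hat p_0$ are made-up illustrative values:

```python
import numpy as np

# Hypothetical loss matrix L for J = 3 classes:
# rows = actual class i, columns = predicted class j,
# zero loss on the diagonal (correct classification)
L = np.array([[0., 1., 4.],
              [2., 0., 1.],
              [1., 3., 0.]])

# Illustrative fitted class probabilities for the new point
p_hat_0 = np.array([0.2, 0.5, 0.3])

# Step 1: estimated expected loss of each decision j,
# (L^T p)_j = sum_i L[i, j] * p_hat_0[i]
el_hat = L.T @ p_hat_0

# Step 2: classify to the class minimizing estimated expected loss
y_hat_0 = np.argmin(el_hat)
```

With these numbers, `el_hat` is `[1.3, 1.1, 1.3]`, so the rule picks class 1 even though class 1 is also the most probable class here; with a different loss matrix the minimum-expected-loss class can differ from the most probable one.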

I wonder if this is optimal.

Richard Hardy
  • Since I do not know (multinomial) logistic regression well, I am not really sure what the fitted probabilities $\hat p_0$ are and whether they constitute a sensible estimate of true probabilities (hence the quotation marks for "estimated expected loss"). I have a gut feeling I might be ignoring the prevalence of each class (sort of the prior which I do not have) and perhaps implicitly assuming uniform class prevalence or something like that. – Richard Hardy Oct 02 '19 at 08:10