There is a lot going on here, but this point may be useful: log loss is a "proper loss" or "proper scoring rule".
Setup: suppose the classifier output $p$ is a probability distribution over $\mathcal{Y}$, the set of possible labels. That is, rather than guessing a single label, our model outputs a posterior distribution over them. Suppose the true Bayes-optimal distribution is $p'$. Then we want to use a loss $\ell(p,y)$ such that setting $p=p'$ minimizes the expected loss when $y$ is drawn from $p'$. This is the definition of "proper".
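In symbols, properness of $\ell$ means that for every $p'$,

$E_{y \sim p'}\, \ell(p', y) \le E_{y \sim p'}\, \ell(p, y) \quad \text{for all distributions } p.$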
Log loss is $\ell(p,y) = -\log p(y)$, and it is proper: the expected loss is
$E_{y \sim p'} \ell(p,y) = - \sum_y p'(y) \log p(y), $
and you can check that the minimizing choice is $p=p'$. (The expected loss expression is known as cross-entropy.)
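One way to check this (a standard argument I'm sketching here, via Gibbs' inequality) is to split the cross-entropy into an entropy term and a Kullback-Leibler term:

$- \sum_y p'(y) \log p(y) = -\sum_y p'(y) \log p'(y) + \sum_y p'(y) \log \frac{p'(y)}{p(y)} = H(p') + \mathrm{KL}(p' \,\|\, p).$

The first term does not depend on $p$, and $\mathrm{KL}(p' \,\|\, p) \ge 0$ with equality exactly when $p = p'$, so the expected loss is minimized at $p = p'$ (log loss is in fact strictly proper).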
It would not be proper to, for example, use the loss $\ell(p,y) = -p(y)$. In this case the expected loss is $-\sum_y p'(y)\, p(y)$, and you can check that it is minimized not by $p=p'$ but by the delta distribution on the mode of $p'$.
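Here is a quick numerical check of both claims (my own sketch, not from the original; the three-label $p'$ is arbitrary):

```python
import numpy as np

# Arbitrary "true" (Bayes-optimal) distribution p' over three labels.
p_true = np.array([0.5, 0.3, 0.2])

def expected_log_loss(p, p_true):
    """E_{y ~ p'} [-log p(y)], i.e. the cross-entropy; +inf if p puts zero mass on a possible y."""
    with np.errstate(divide="ignore"):
        return -np.sum(p_true * np.log(p))

def expected_neg_prob_loss(p, p_true):
    """E_{y ~ p'} [-p(y)], the improper loss discussed above."""
    return -np.sum(p_true * p)

candidates = {
    "p = p'":        p_true,
    "delta on mode": np.array([1.0, 0.0, 0.0]),
    "uniform":       np.full(3, 1 / 3),
}

for name, p in candidates.items():
    print(f"{name:>14}  log loss: {expected_log_loss(p, p_true):8.4f}"
          f"   -p(y) loss: {expected_neg_prob_loss(p, p_true):8.4f}")

# Log loss is smallest at p = p', whereas the -p(y) loss is smallest at the
# delta distribution on the mode of p' -- so -p(y) is not a proper loss.
```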
In fact, when there are at least three possible labels, log loss is, up to affine rescaling, the only proper loss of the form $\ell(p,y) = f(p(y))$, i.e. the only proper loss that depends only on the probability assigned to the observed label $y$ and not on the probabilities assigned to the rest (such losses are called "local").
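For contrast (an example I'm adding, not from the linked answer): the Brier score

$\ell(p,y) = \sum_{k \in \mathcal{Y}} \bigl(p(k) - \mathbf{1}[y = k]\bigr)^2$

is also proper, but with three or more labels it depends on the whole vector $p$ rather than only on $p(y)$, so it is not of the above form.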
Resource: https://stats.stackexchange.com/a/493949/70612