Questions tagged [cross-entropy]

A measure of the difference between two probability distributions for a given random variable or set of events.

In information theory, the cross-entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution $q$, rather than the true distribution $p$.

The cross-entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows: $$ H(p,q)=-\mathbb{E}_p(\log q) $$ where $\mathbb{E}_p(\cdot)$ is the expected value operator with respect to the distribution $p$.

Source: Wikipedia.
Excerpt source: Brownlee "A Gentle Introduction to Cross-Entropy for Machine Learning" (2019).
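A minimal numerical sketch of this definition (assuming NumPy; the function name is made up for illustration). Using the natural logarithm gives the result in nats; swapping in np.log2 gives bits, matching the "number of bits" reading above:

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_x p(x) * log q(x), in nats (use np.log2 for bits)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

# True distribution p and estimated distribution q over three events.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(cross_entropy(p, q))  # >= cross_entropy(p, p), i.e. >= the entropy of p
print(cross_entropy(p, p))  # the entropy H(p) itself
```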

262 questions
9
votes
4 answers

Normalized Cross Entropy

In this paper: http://quinonero.net/Publications/predicting-clicks-facebook.pdf, the authors introduce a metric called Normalized Cross Entropy (NCE): $$ \text{NE} = \frac{-\frac{1}{N} \sum_{i=1}^N(y_i\log(p_i) + (1-y_i)\log(1-p_i))}{-(p\log(p) +…
ved
  • 1,182
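A hedged sketch of how such a metric could be computed. Since the excerpt cuts the formula off, this assumes the denominator's $p$ is the empirical positive rate of the labels (the average CTR in the linked paper); the function name is made up:

```python
import numpy as np

def normalized_cross_entropy(y, p_hat, eps=1e-15):
    """Average log loss of the predictions, divided by the log loss of a
    baseline that always predicts the empirical positive rate p (assumed)."""
    y = np.asarray(y, dtype=float)
    p_hat = np.clip(np.asarray(p_hat, dtype=float), eps, 1 - eps)
    numerator = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    p = np.clip(y.mean(), eps, 1 - eps)  # empirical positive rate (assumption)
    denominator = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return numerator / denominator

y = np.array([1, 0, 0, 1, 0])
p_hat = np.array([0.8, 0.1, 0.3, 0.6, 0.2])
print(normalized_cross_entropy(y, p_hat))  # < 1 means better than the baseline predictor
```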
2
votes
1 answer

Meaning of non-{0,1} labels in binary cross entropy?

Binary cross entropy is normally used in situations where the "true" result or label is one of two values (hence "binary"), typically encoded as 0 and 1. However, the documentation for PyTorch's binary_cross_entropy function has the…
R.M.
  • 1,016
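For context, a small example (with made-up numbers) of what a non-{0,1} target means in practice: PyTorch's binary_cross_entropy accepts any target in [0, 1], and a value such as 0.7 acts as a soft, probabilistic label:

```python
import torch
import torch.nn.functional as F

# Targets may be anywhere in [0, 1]; a value like 0.7 is a soft label meaning
# "70% probability of the positive class", not a hard 0/1 label.
pred   = torch.tensor([0.9, 0.2, 0.7])
target = torch.tensor([1.0, 0.0, 0.7])  # last target is a soft label

loss = F.binary_cross_entropy(pred, target)
print(loss.item())

# Same quantity written out per element: -(t*log(p) + (1-t)*log(1-p)), averaged.
manual = -(target * pred.log() + (1 - target) * (1 - pred).log()).mean()
print(manual.item())
```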
1
vote
1 answer

2 classifiers: the better one has higher cross-entropy loss values during training

I have two classifiers (linear1 and linearGP). linearGP has better accuracy, but its CE loss values are higher than those of linear1. linearGP is trained with a different loss. The data set is balanced. The x axis represents samples during…
malocho
  • 316
  • 3
  • 10
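A small illustration (with made-up numbers) of how accuracy and cross-entropy can disagree: a model can be right more often yet pay a large cross-entropy penalty for a single very confident mistake:

```python
import numpy as np

def cross_entropy_loss(y, p, eps=1e-15):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 1, 1, 0, 0])

# Classifier A: 3/5 correct, mildly confident everywhere.
p_a = np.array([0.6, 0.6, 0.4, 0.4, 0.6])
# Classifier B: 4/5 correct, but extremely confident on its single mistake.
p_b = np.array([0.9, 0.9, 0.9, 0.1, 0.99])

for name, p in [("A", p_a), ("B", p_b)]:
    acc = np.mean((p > 0.5) == y)
    print(name, "accuracy:", acc, "CE loss:", round(cross_entropy_loss(y, p), 3))
# B wins on accuracy (0.8 vs 0.6) yet has the higher CE loss (~1.0 vs ~0.67).
```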
1
vote
2 answers

Binary cross-entropy: plugging in probability 0

There is an answer on the Kaggle question board here by Dr. Fuzzy: You can assess a total misclassification scenario by plugging zero probabilities into the log-loss function (here sklearn log-loss): LL Count Class 3.31035117 15294 …
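A sketch of the idea in that quote, using a hand-rolled log loss with explicit clipping rather than sklearn's implementation; the eps value is an assumption. Plugging an exact probability of 0 for the true class into an unclipped log loss would give an infinite value, which is why implementations clip:

```python
import numpy as np

def log_loss_clipped(y, p, eps=1e-15):
    """Binary log loss with probabilities clipped away from 0 and 1, mirroring
    the kind of clipping library implementations do to avoid log(0) = -inf."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 1, 0])

# "Total miss" scenario: probability 0 assigned to the true class every time.
p_total_miss = np.array([0.0, 0.0, 1.0])
print(log_loss_clipped(y, p_total_miss))  # huge but finite, roughly -log(eps) ≈ 34.5
```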
1
vote
0 answers

In multiclass classification with K classes, does cross-entropy loss need K outputs or K-1?

Hastie's "The Elements of Statistical Learning" textbook defines the probabilistic model of multiclass logistic regression with K classes as $\forall k \in \{1, \dots, K-1\} $ $$ \ln \frac{p(G=k \mid X=x)}{p(G=K \mid X=x)} = w_k^T x +…
CrabMan
  • 172
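A small sketch of why the two parameterizations describe the same model: softmax over K unconstrained logits is invariant to subtracting a constant, so pinning the K-th logit at 0 (i.e., keeping only K-1 free outputs) loses no expressiveness. Names and numbers are made up:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# K = 4 unconstrained logits ...
logits_K = np.array([2.0, -1.0, 0.5, 1.0])

# ... give exactly the same probabilities as K-1 free logits with the last
# (reference) class pinned to 0, because softmax is invariant to subtracting
# a constant from every logit.
logits_Km1 = np.append(logits_K[:-1] - logits_K[-1], 0.0)

print(softmax(logits_K))
print(softmax(logits_Km1))  # identical distribution
```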
0
votes
1 answer

Cross entropy loss: inconsistency in formula

I have a couple of problems trying to understand the exact formula for cross-entropy loss. Depending on the source, I see it written in different ways. Is the log() function $\log_2()$? Is the argument of the log $q$ or $1/q$? I am fairly certain it…
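A quick numerical check of the two notational choices the question asks about, using made-up distributions: $-\sum p\log q$ and $\sum p\log(1/q)$ are the same quantity, and the base of the logarithm only changes the unit (bits vs. nats):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])  # true distribution
q = np.array([0.4, 0.4, 0.2])  # estimated distribution

# Two common ways of writing the same quantity:
h_nats  = -np.sum(p * np.log(q))        # -sum p * log(q)
h_nats2 =  np.sum(p * np.log(1.0 / q))  #  sum p * log(1/q), identical

# The base of the log only changes the unit: log2 gives bits, ln gives nats.
h_bits = -np.sum(p * np.log2(q))
print(h_nats, h_nats2)             # equal
print(h_bits, h_nats / np.log(2))  # equal
```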