I was wondering how we can show that minimizing cross-entropy is equivalent to maximizing the log likelihood of the training data, assuming the data can be modeled by this distribution:
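The equivalence can also be checked numerically. Below is a minimal sketch assuming a Bernoulli model for the labels (as the comments suggest), with made-up labels `y` and predicted probabilities `p`; it compares the negative log likelihood of independent Bernoulli observations with the summed binary cross-entropy:

```python
import math

# Hypothetical data: binary labels y_i and predicted Bernoulli parameters p_i
y = [1, 0, 1, 1, 0]
p = [0.9, 0.2, 0.7, 0.6, 0.1]

# Negative log likelihood under independent Bernoulli(p_i):
#   -log prod_i p_i^{y_i} (1 - p_i)^{1 - y_i}
nll = -sum(math.log(pi if yi == 1 else 1 - pi) for yi, pi in zip(y, p))

# Binary cross-entropy summed over examples:
#   -sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]
bce = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
           for yi, pi in zip(y, p))

print(abs(nll - bce) < 1e-12)  # prints True: the sums agree term by term
```

Each term of the cross-entropy sum is exactly the log-probability of the corresponding observation, which is why the two quantities coincide, not just here but for any `y` and `p`.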
- Does this answer your question? the relationship between maximizing the likelihood and minimizing the cross-entropy – Arya McCarthy Apr 14 '21 at 00:14
- Also https://stats.stackexchange.com/questions/468818/machine-learning-negative-log-likelihood-vs-cross-entropy https://stats.stackexchange.com/questions/297749/how-meaningful-is-the-connection-between-mle-and-cross-entropy-in-deep-learning https://stats.stackexchange.com/questions/428937/mle-and-cross-entropy-for-conditional-probabilities – Arya McCarthy Apr 14 '21 at 00:15
- I have seen that post, but I am not sure if it still applies with the distribution provided for the training data above. Does it? – stgstu Apr 14 '21 at 00:22
- The top answer is precisely about the Bernoulli pmf you showed. – Arya McCarthy Apr 14 '21 at 00:24
- Please don't vandalize content on this website. If you wish to delete your question, you can do so by clicking the [delete] button below the question body. – Sycorax Apr 14 '21 at 00:29