The following quote is from a set of lecture notes:
> When fitting generalised linear models, the objective function is canonically the log-probability of $Y|X$ (essentially, the log-likelihood with data $Y|X$ and weight parameters $W$). Equivalently, we minimise $L(Y, \hat{Y}) := -\ln P(Y \mid \hat{Y}(X))$, where the conditional distribution $P$ is assumed and is parameterised by its mean $\hat{Y} = E[Y \mid X]$.
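Spelling out the "equivalently" step, under the added assumption (not stated in the quote) that the $n$ observations $(x_i, y_i)$ are conditionally independent given $W$, with $\hat{y}_i = E[y_i \mid x_i; W]$:

$$
\ln P(Y \mid X; W) = \sum_{i=1}^{n} \ln p(y_i \mid \hat{y}_i)
\quad\Longrightarrow\quad
\arg\max_W \ln P(Y \mid X; W) = \arg\min_W \sum_{i=1}^{n} L(y_i, \hat{y}_i),
\qquad L(y, \hat{y}) := -\ln p(y \mid \hat{y}).
$$

For a few common choices of $p$, this per-observation loss becomes:

$$
\begin{aligned}
\text{Gaussian, fixed } \sigma^2:&\quad L(y, \hat{y}) = \frac{(y - \hat{y})^2}{2\sigma^2} + \tfrac{1}{2}\ln(2\pi\sigma^2) \\
\text{Bernoulli}:&\quad L(y, \hat{y}) = -y \ln \hat{y} - (1 - y)\ln(1 - \hat{y}) \\
\text{Poisson}:&\quad L(y, \hat{y}) = \hat{y} - y \ln \hat{y} + \ln y!
\end{aligned}
$$

The additive constants do not depend on $W$, so in the Gaussian case minimising $L$ is the same as least squares, and in the Bernoulli case it is the usual cross-entropy.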
How should I convince myself that minimising $L(Y, \hat{Y}) := -\ln P(Y \mid \hat{Y}(X))$ is really equivalent to maximising the log-likelihood?
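One way to check this numerically is a minimal sketch, assuming a Gaussian response with fixed $\sigma = 1$, made-up toy data, and a hypothetical one-parameter identity-link model $\hat{y} = w x$ (the helper names `gaussian_loss`, `total_loss`, `predict` are illustrative only). Summing $L$ over the observations reproduces $-\ln$ of the joint density, and minimising it over a grid of $w$ selects the same $w$ as least squares.

```python
import math

def gaussian_loss(y, y_hat, sigma=1.0):
    # L(y, y_hat) = -ln p(y | y_hat) for a Normal with mean y_hat and fixed sd sigma
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - y_hat) ** 2 / (2 * sigma ** 2)

def total_loss(ys, y_hats, sigma=1.0):
    # Sum of per-observation losses = negative log-likelihood (assuming independence)
    return sum(gaussian_loss(y, m, sigma) for y, m in zip(ys, y_hats))

def neg_log_of_product(ys, y_hats, sigma=1.0):
    # -ln of the joint density, computed directly as a product of Normal densities
    joint = 1.0
    for y, m in zip(ys, y_hats):
        joint *= math.exp(-((y - m) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
    return -math.log(joint)

def sse(ys, y_hats):
    # Ordinary sum of squared errors
    return sum((y - m) ** 2 for y, m in zip(ys, y_hats))

# Made-up toy data for the hypothetical model y_hat = w * x (no intercept)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 1.9, 4.2, 5.8, 8.1]

def predict(w):
    return [w * x for x in xs]

grid = [i / 100 for i in range(401)]  # candidate values of w in [0, 4]
w_nll = min(grid, key=lambda w: total_loss(ys, predict(w)))
w_sse = min(grid, key=lambda w: sse(ys, predict(w)))

print(w_nll, w_sse)  # both print 2.0
print(math.isclose(total_loss(ys, predict(2.0)), neg_log_of_product(ys, predict(2.0))))  # True
```

Because `total_loss` differs from `sse / 2` only by a constant that does not depend on `w`, the two objectives share the same minimiser; swapping in a Bernoulli or Poisson `gaussian_loss` replacement would give the corresponding GLM objective in the same way.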