I am trying to get an intuitive understanding of the concept of calibration.
Definitions first. Consider a data distribution $P(X, Y)$ over features $X$ and binary labels $Y$, and a probabilistic classifier that returns a class prediction together with a confidence estimate: $ h(X) = (\hat{Y}, \hat{P}) $. The confidence estimates are calibrated when:
$$ P(\hat{Y} = Y \mid \hat{P} = p) = p $$
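To make the definition concrete for myself, I wrote a minimal sketch (Python with numpy; the synthetic setup and all names are my own, purely illustrative) that estimates both sides of this equation by binning confidences on data where the true conditional is a known logistic curve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic setup: 1-D feature, logistic true conditional,
# so p_true(x) = P(Y = 1 | X = x) is known exactly.
def p_true(x):
    return 1.0 / (1.0 + np.exp(-x))

n = 100_000
x = rng.normal(size=n)
y = rng.binomial(1, p_true(x))

# The classifier h(X) = (Y_hat, P_hat): here it uses the true conditional.
p1 = p_true(x)
y_hat = (p1 >= 0.5).astype(int)
conf = np.where(y_hat == 1, p1, 1.0 - p1)  # confidence of the predicted class

# Calibration check: within each confidence bin, the empirical accuracy
# P(Y_hat = Y | P_hat ~ p) should match the bin's mean confidence p.
bins = np.linspace(0.5, 1.0, 11)
idx = np.clip(np.digitize(conf, bins) - 1, 0, 9)
for b in range(10):
    mask = idx == b
    if mask.any():
        print(f"mean conf = {conf[mask].mean():.3f}, "
              f"accuracy = {np.mean(y_hat[mask] == y[mask]):.3f}")
```

(Binning is needed because $\hat{P} = p$ has probability zero for a continuous confidence; the per-bin comparison is the usual finite-sample stand-in for the conditional in the definition.)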
What happens if $h$ perfectly matches the true conditional $P(Y \mid X)$, i.e., $\hat{Y} = \arg\max_y P(Y = y \mid X)$ and $\hat{P} = \max_y P(Y = y \mid X)$? Do we then have perfect calibration? Conversely, is it possible to have perfect calibration without $h$ matching $P(Y \mid X)$?
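One numerical probe I considered for the second question (again a hypothetical sketch, same kind of synthetic setup as above): a classifier that ignores $X$ entirely, always predicts the majority class, and reports the marginal rate as its single confidence value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same hypothetical synthetic setup: logistic true conditional,
# mean shifted so the marginal P(Y = 1) is not exactly 0.5.
n = 200_000
x = rng.normal(loc=0.5, size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))

# A classifier that ignores X: always predict the majority class,
# with the marginal rate as its one and only confidence value.
q = y.mean()
y_hat = np.full(n, int(q >= 0.5))
conf = max(q, 1.0 - q)

# With a single confidence level, the calibration condition reduces to
# comparing overall accuracy against that one confidence value.
print(f"confidence = {conf:.3f}, accuracy = {np.mean(y_hat == y):.3f}")
```

Since this classifier clearly does not match $P(Y \mid X)$, comparing its two printed numbers seems like a direct way to test the second question empirically.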