I have a parameter $\theta$ and data $y = f(\theta) + \mathrm{noise}$. My goal is to find the best fit for $\theta$ and to assess the uncertainty in that best fit. I see two competing approaches for doing this:
- I can compute the MLE $\widehat \theta = \arg \max_\theta p(y \mid \theta)$ with some optimization algorithm and assess the reliability of the estimator via the observed Fisher information $I(\theta) = -\nabla^2_\theta \log p(y \mid \theta)$, evaluating its inverse $I^{-1}(\widehat \theta)$ at the MLE. This would be a good option, since in my case I can evaluate the Hessian analytically, and finding the MLE is easy with a Newton-type algorithm.
- I can generate a sample $\{\theta^{(i)}\}_{i=1,\ldots, N}$ from the posterior $p(\theta \mid y)$, e.g., with MCMC sampling (under a flat prior, so the posterior is proportional to the likelihood). I can then study the empirical covariance matrix of the sample to determine correlations, uncertainties, etc. (A minimal toy-model sketch of both approaches follows this list.)
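To make the comparison concrete, here is a minimal sketch of both approaches on a toy linear-Gaussian model (this is not my actual $f$ or data; the model, dimensions, and step sizes are all illustrative). It computes $I^{-1}(\widehat\theta)$ and the empirical covariance of a random-walk Metropolis chain side by side:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical toy model: y = A @ theta + noise, noise ~ N(0, sigma^2 I).
n, d = 50, 2
A = rng.normal(size=(n, d))
sigma = 0.5
theta_true = np.array([1.0, -2.0])
y = A @ theta_true + sigma * rng.normal(size=n)

def neg_log_lik(theta):
    r = y - A @ theta
    return 0.5 * (r @ r) / sigma**2

def grad(theta):
    return -(A.T @ (y - A @ theta)) / sigma**2

def hess(theta):
    # Observed Fisher information: minus the Hessian of the log-likelihood.
    return A.T @ A / sigma**2

# Approach 1: Newton-type optimization for the MLE, then invert I(theta_hat).
res = minimize(neg_log_lik, np.zeros(d), jac=grad, hess=hess, method="Newton-CG")
theta_mle = res.x
cov_fisher = np.linalg.inv(hess(theta_mle))

# Approach 2: random-walk Metropolis on the posterior (flat prior),
# then the empirical covariance of the chain.
def log_post(theta):
    return -neg_log_lik(theta)

theta, lp = theta_mle.copy(), log_post(theta_mle)
chain = []
for _ in range(50_000):
    prop = theta + 0.1 * rng.normal(size=d)
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta)
chain = np.asarray(chain[5_000:])             # discard burn-in

cov_mcmc = np.cov(chain, rowvar=False)
print("inverse Fisher information:\n", cov_fisher)
print("MCMC empirical covariance:\n", cov_mcmc)
print("elementwise ratio:\n", cov_mcmc / cov_fisher)
```

In this linear-Gaussian toy case the two matrices agree closely, since the posterior is exactly Gaussian; my actual model is where I see the pattern described below.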
For a specific dataset and an eight-dimensional parameter, I get the following results:
There seems to be a pattern here: the two matrices appear to be multiples of each other. I struggle to understand to what extent this is true, and how the two approaches above are connected and how they differ.
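For reference, my current understanding of why the two should be related (a heuristic sketch, assuming a flat prior and enough data for a quadratic expansion to be accurate): expanding the log-posterior to second order around the MLE, where the gradient vanishes, gives

$$\log p(\theta \mid y) \approx \log p(\widehat\theta \mid y) - \tfrac{1}{2}\,(\theta - \widehat\theta)^\top I(\widehat\theta)\,(\theta - \widehat\theta), \qquad \text{i.e.} \qquad p(\theta \mid y) \approx \mathcal{N}\!\left(\widehat\theta,\; I^{-1}(\widehat\theta)\right).$$

If this is the right way to connect the two approaches, I would have expected the two matrices to be approximately equal, not merely proportional.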

