
I have a model that transforms input data $X$ to output data $Y$ with some model parameters $p_1, \dots, p_n$. I simulate $n$ datasets from my model, and for each dataset I reconstruct the parameters via model inversion and maximum likelihood estimation.

My question is: how can I meaningfully quantify the precision of the reconstruction for each parameter?

Computing the standard deviation across the $n$ estimates of each parameter seems rather meaningless, since it depends on the possibly arbitrary scaling of that parameter. I wonder whether the standard deviation of the reconstructed estimates of parameter $p_i$ has to be normalized by a quantity that measures how much influence a unit change of $p_i$ has on $Y$.

Does this make sense? And if yes, is there an established procedure to achieve this normalisation?

Note that I am interested in assessing the precision of reconstructing one specific value of a parameter, i.e. it is not an option to compute the correlation across a range of (true) parameter values.

monade

1 Answer


Here are some thoughts about your question:

The classical way to assess the quality of maximum likelihood estimators is indeed to:

  • generate $n$ independent synthetic data sets of similar size from your model (parametrized with the ground-truth parameters $p_1,\dots,p_m$);
  • compute the maximum likelihood estimators $(\hat{p}^{\,i}_1,\dots,\hat{p}^{\,i}_m)_{1\leq i\leq n}$ for each of these data sets;
  • and finally compute the mean (to check for bias) and the standard deviation (to check for precision) of the differences between your estimators and the ground-truth values of the parameters (a minimal code sketch of this procedure is given just below).
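For illustration, here is a minimal sketch of this procedure. The model, parameter values, and noise level are invented for the example (a linear-Gaussian model, for which the maximum likelihood fit reduces to least squares); they stand in for whatever model you are actually inverting.

```python
# Toy Monte Carlo assessment of MLE precision -- NOT the asker's model.
# Assumed model (for illustration only): Y = p1*X + p2 + Gaussian noise,
# so the maximum likelihood estimate of (p1, p2) is ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)

p_true = np.array([2.0, -1.0])   # hypothetical ground-truth parameters (p1, p2)
sigma = 0.5                      # known noise standard deviation
X = np.linspace(0.0, 1.0, 100)   # fixed input data / experimental design
n_datasets = 1000                # number of synthetic data sets

def simulate(p):
    """Draw one synthetic data set Y from the assumed model."""
    return p[0] * X + p[1] + rng.normal(0.0, sigma, size=X.shape)

def mle(Y):
    """Maximum likelihood estimate of (p1, p2); least squares for this model."""
    A = np.column_stack([X, np.ones_like(X)])
    est, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return est

estimates = np.array([mle(simulate(p_true)) for _ in range(n_datasets)])

bias = estimates.mean(axis=0) - p_true   # mean of the differences: check for bias
sd = estimates.std(axis=0, ddof=1)       # standard deviation: check for precision
print("bias:", bias)
print("sd:  ", sd)
```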

You can see a nice example of the application of this method in Fig. 7 of the following paper, in which the authors use the Expectation-Maximization algorithm to infer the parameters of a synapse model: https://www.frontiersin.org/articles/10.3389/fnsyn.2019.00022/full

This procedure is useful to study how the precision of your estimator varies with the value of your ground-truth parameters, or with the size of your samples: as you mentioned, the result will be a function of the values of the parameters you used to generate your surrogate data.

But if you are looking for a way to quantify $\textit{a priori}$ (i.e. without running $n$ simulations) the expected accuracy of your estimator for a given model and parameters $p_1,\dots,p_m$, then what you are looking for is probably the Cramér-Rao bound (see the Wikipedia article on the subject).

The Cramér-Rao bound gives you a lower bound on the variance of an unbiased estimator (a modification of the inequality also exists for biased estimators). The variance of your estimator will always be at least as large as the inverse of the Fisher information, which is itself a function of the number of data points in your data sets and of the parameters of your model. The Fisher information quantifies the expected curvature of the likelihood as a function of the parameters (see the properties of the Fisher information): it measures precisely how much influence a unit change of $p_i$ has on the distribution of $Y$.
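To state the bound explicitly (under the usual regularity conditions, and for an unbiased estimator $\hat{p}$ of $p=(p_1,\dots,p_m)$):

$$\operatorname{Var}(\hat{p}_i)\;\geq\;\left[I(p)^{-1}\right]_{ii},\qquad I_{jk}(p)=-\,\mathbb{E}\!\left[\frac{\partial^2 \log L(p\,;Y)}{\partial p_j\,\partial p_k}\right],$$

where $I(p)$ is the Fisher information matrix of one data set and the expectation is taken over data $Y$ generated with parameters $p$.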

Hope this helps!

Camille Gontier
  • Thanks Camille Gontier, this helps! To clarify: I'm looking precisely for a way to quantify the precision of parameter reconstruction given $n$ simulations. Specifically, I'm wondering how one can compare the reconstruction precision of different parameters. In the 'classical way', when one e.g. replaces a parameter $p_i$ in the model with $10\,p_i$, the standard deviation will be ten times lower. To me this makes clear that one cannot compare the reconstruction precision of different parameters by means of their standard deviations, as their scalings might not be comparable. – monade Oct 11 '20 at 08:10
  • Thus, this seems to suggest that I have to normalize the standard deviations by some measure, possibly the Fisher information. What I'm looking for is the mathematically correct way to do such a normalisation (given that the above reasoning makes sense). – monade Oct 11 '20 at 08:12
  • I don't get why the standard deviation of the estimator would decrease if $p_i$ increases. If you want to estimate the mean $\mu$ of a normal distribution (with variance $\sigma^2$) from $n$ i.i.d. samples, the variance of your estimate $\hat{\mu}$ will be $\sigma^2 / n$, and does not scale with $\mu$. The relation between the variance of your estimator and the ground truth parameters is more complex than a mere scaling. – Camille Gontier Oct 11 '20 at 09:23
  • As an example, let's say I change a parameter that previously represented an angle in radians to instead represent it in degrees. This would correspond to a change in scaling, and it would change the standard deviation of the parameter estimates (in the example, by a factor of $180/\pi$). This is to exemplify that comparing the standard deviations of the estimates of different parameters is meaningless, as each standard deviation depends on the scaling of its parameter. We need some kind of normalisation to be able to compare the standard deviations. – monade Oct 11 '20 at 09:53
  • I.e., my point is not about a change in the mean (of a parameter), but about a change of scaling. – monade Oct 11 '20 at 10:01
  • Ok, thanks for the clarification. For this specific case, I would very naively suggest using the coefficient of variation (i.e. the standard deviation normalized by the mean) of your estimators, instead of the standard deviation, to compare them: https://en.wikipedia.org/wiki/Coefficient_of_variation – Camille Gontier Oct 11 '20 at 12:52
  • But I'm not aware of any other post-processing to be applied to your estimates, nor would I recommend further post-processing them. It is normal and acceptable that the accuracy of your estimates will depend on your model, on its parametrization, and on your experimental protocol $X$. This is not "noise" or an artifact that you should get rid of, but a natural property of your maximum likelihood estimators. – Camille Gontier Oct 11 '20 at 12:56
  • Thanks, Camille Gontier, your input is much appreciated! Re coefficient of variation: I think this will not work, as my true parameter value could be zero (and my estimates are unbiased, so their mean would then also be zero). Re post-processing: I think if one wants to compare the estimation precision of different parameters, one has no choice but to somehow post-process the estimates, for the reasons outlined. Otherwise one cannot make statements like "the estimation precision of parameter $p_1$ was greater than the estimation precision of parameter $p_2$". – monade Oct 11 '20 at 13:44
  • For instance, I thought about normalizing each parameter such that a unit change of the respective parameter corresponds to a unit change of the resulting log-likelihood (while holding all other parameters constant). I'm very unsure, though, whether this would be mathematically sound and would really lead to a fair comparison between parameters. – monade Oct 11 '20 at 13:48
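To make the scaling issue in this discussion concrete, here is a small numerical sketch (a toy Gaussian-mean model with known noise, invented for the example and not the model under discussion). It shows that the raw standard deviation of the estimates changes by the rescaling factor when a parameter is re-expressed on a different scale (radians vs. degrees), whereas the standard deviation multiplied by the square root of the Fisher information is invariant under such a rescaling, which is one candidate for the kind of normalisation considered above.

```python
# Toy sketch of the scaling issue -- assumed model (illustration only):
# N i.i.d. samples ~ Normal(p, sigma) with sigma known, so the MLE of p is the
# sample mean and the Fisher information of one data set is I(p) = N / sigma**2.
# We then re-express the parameter as q = c * p (radians -> degrees).
import numpy as np

rng = np.random.default_rng(1)

p_true = 0.7          # hypothetical ground-truth parameter (an angle in radians)
sigma = 0.3           # known noise standard deviation
N = 50                # samples per synthetic data set
n_datasets = 2000
c = 180.0 / np.pi     # rescaling factor (radians -> degrees)

# Monte Carlo estimates of p and of the rescaled parameter q = c * p
p_hat = np.array([rng.normal(p_true, sigma, N).mean() for _ in range(n_datasets)])
q_hat = c * p_hat

sd_p, sd_q = p_hat.std(ddof=1), q_hat.std(ddof=1)

# Fisher information of one data set under each parametrization
I_p = N / sigma**2
I_q = N / (c * sigma)**2   # the information transforms as I_q = I_p / c**2

print("raw SDs:       ", sd_p, sd_q)                                  # differ by the factor c
print("normalized SDs:", sd_p * np.sqrt(I_p), sd_q * np.sqrt(I_q))    # both close to 1
```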