
I have some data points and I fitted a function (2nd order polynomial here) to the data. The algorithm (scipy.optimize.curve_fit) gave me the optimal parameters and the covariance matrix of those parameters. In the documentation, it says that the square roots of the diagonal entries give the errors for the parameters.

Now that I have the function $f$ fitted to points with $x > 0$, I would like to find the value of $f(0)$ and its error. Computing $f(0)$ is easy, but how do I obtain an estimate for the error?

  • The answer at http://stats.stackexchange.com/questions/9131/obtaining-a-formula-for-prediction-limits-in-a-linear-model gives a formula. – whuber Jul 01 '14 at 18:27

1 Answer


I ended up using the bootstrap, which gave me a decent-looking error estimate. The procedure works slightly differently depending on whether the data points come with errors or not.

Points without errors

The data points are of the form $(x, y)$ and are randomly sampled from a population; for example, they could be age and body height. We would like to fit a model to them. Our fit algorithm takes the data points and a model function $y = f(x, \lambda)$, where $\lambda$ stands for all the model parameters. The optimization algorithm tunes $\lambda$ such that the model fits the points best.

Our data set is just one sample from the population. We could have gotten a different sample, but we don't have the means to obtain another one. This is where the bootstrap comes in: we resample from our sample $S$ by drawing with replacement. If our sample size is $n$, we draw $n$ elements with replacement to obtain a bootstrap sample $S^*$.

On each bootstrap sample we fit the model again and obtain another value of the model parameters, $\lambda^*$. We repeat this process $R$ times, where I usually choose $R$ to be around 500 to 1500. In the end we have a whole distribution of $\lambda^*$ values, and the standard deviation of that distribution gives an error estimate for $\lambda$.

When I want to extrapolate to $x = 0$, I use $y_0 = f(0, \lambda)$ to obtain the value $y_0$. To get the error estimate, I take every $\lambda^*$ that I have and compute $y_0^* = f(0, \lambda^*)$, which gives a distribution of $y_0^*$. Taking the standard deviation of that distribution again yields an error estimate.
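The procedure above can be sketched as follows. This is a minimal example with made-up data (the polynomial coefficients and noise level are arbitrary assumptions, not from the original post):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    """2nd order polynomial, as in the question."""
    return a * x**2 + b * x + c

rng = np.random.default_rng(0)

# Hypothetical data: noisy samples of a parabola on x > 0.
x = np.linspace(0.5, 5.0, 30)
y = 1.0 + 2.0 * x - 0.3 * x**2 + rng.normal(0.0, 0.5, x.size)

R = 1000  # number of bootstrap replicas
n = x.size
y0_star = np.empty(R)

for r in range(R):
    # Draw n indices with replacement -> bootstrap sample S*.
    idx = rng.integers(0, n, n)
    lam_star, _ = curve_fit(model, x[idx], y[idx])
    # Evaluate the refitted model at x = 0.
    y0_star[r] = model(0.0, *lam_star)

# Central value from the fit to the full sample,
# error from the spread of the bootstrap distribution.
y0 = model(0.0, *curve_fit(model, x, y)[0])
y0_err = y0_star.std(ddof=1)
print(y0, y0_err)
```

Instead of the standard deviation, one can also take percentiles of `y0_star` (e.g. 2.5% and 97.5%) to get an asymmetric interval.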

Points with errors

If you have points of the form $(x, y, \Delta y)$, you can use a parametric bootstrap to generate new samples. Using a normal distribution with mean $y$ and standard deviation $\Delta y$ for each data point, you generate a new $y^*$ from that implied distribution per data point. After that you proceed the same way as above.

Example

From my master's thesis I have this example, where a polynomial is fitted to three data points with error bars. Many fits were done, and at each point I obtained a distribution of model predictions. This way I could generate the band of uncertainty in the following graph:

[Figure: fitted polynomial through three data points with error bars, surrounded by a bootstrap uncertainty band]