I ended up using the bootstrap, which gave me a decent-looking error estimate. The procedure works a bit differently depending on whether the data points come with error estimates or not.
Points without errors
We are given data points of the form $(x, y)$ that are randomly sampled from a population; for example, these could be age and body height. We would like to fit a model to them. Our fit algorithm takes the data points and a model function $y = f(x, \lambda)$, where $\lambda$ stands for all the model parameters. The optimization algorithm tunes $\lambda$ such that the model fits the points best.
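As a minimal sketch of this setup (the linear model, the data values, and the variable names are made up for illustration, not part of the original problem), a fit with `scipy.optimize.curve_fit` could look like this:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Stand-in for f(x, lambda): a straight line with parameters a and b.
    return a * x + b

# Made-up (x, y) sample, e.g. age and body height.
x = np.array([20.0, 25.0, 30.0, 35.0, 40.0, 45.0])
y = np.array([172.0, 174.0, 175.0, 177.0, 176.0, 179.0])

# The optimizer tunes the parameters so the model fits the points best.
popt, _ = curve_fit(model, x, y)
print("lambda:", popt)
```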
Our data is just one sample from the population. We could have gotten a different sample, but we don't have the means to obtain another one. This is where the bootstrap comes in: we resample from our sample $S$ with replacement. When the sample size is $n$, we draw $n$ elements with replacement to obtain a bootstrap sample $S^*$.
On that bootstrap sample we also fit the model and obtain another value of the model parameters, $\lambda^*$. We repeat this process $R$ times, where I usually choose $R$ to be around 500 to 1500. In the end we have a whole distribution of $\lambda^*$ values. We can then take the standard deviation of the $\lambda^*$ distribution as an error estimate for $\lambda$.
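Continuing the sketch above, the resampling loop might look as follows; the value of `R`, the random seed, and the variable names are my own choices:

```python
rng = np.random.default_rng(0)

R = 1000               # number of bootstrap repetitions
n = len(x)
params_star = []

for _ in range(R):
    # Draw n indices with replacement -> bootstrap sample S*.
    idx = rng.integers(0, n, size=n)
    popt_star, _ = curve_fit(model, x[idx], y[idx])
    params_star.append(popt_star)

params_star = np.array(params_star)

# Standard deviation of the lambda* distribution as error estimate for lambda.
print("error estimate:", params_star.std(axis=0, ddof=1))
```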
When I want to extrapolate to $x = 0$, I use $y_0 = f(0, \lambda)$ to obtain the central value $y_0$. To get the error estimate, I take every $\lambda^*$ that I have and compute $y_0^* = f(0, \lambda^*)$, which gives a distribution of $y_0^*$ values. From these I can again take the standard deviation and get an error estimate.
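The extrapolation step, again building on the sketch above:

```python
# Central value from the fit to the original sample ...
y0 = model(0.0, *popt)

# ... and its error from the distribution of y0* = f(0, lambda*).
y0_star = np.array([model(0.0, *p) for p in params_star])
print("y(0) =", y0, "+/-", y0_star.std(ddof=1))
```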
Points with errors
If you have points of the form $(x, y, \Delta y)$, you can use the parametric bootstrap to generate new samples. For each data point, you draw a new $y^*$ from a normal distribution with mean $y$ and standard deviation $\Delta y$. After that you proceed in the same way as above.
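A sketch of the parametric variant, again with made-up data values and a linear stand-in for the model:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    # Stand-in model; use your own f(x, lambda) here.
    return a * x + b

rng = np.random.default_rng(0)

# Made-up points of the form (x, y, dy).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
dy = np.array([0.2, 0.3, 0.2, 0.4])

R = 1000
params_star = []

for _ in range(R):
    # Parametric bootstrap: draw a new y* per point from N(y, dy).
    y_star = rng.normal(y, dy)
    popt_star, _ = curve_fit(model, x, y_star, sigma=dy, absolute_sigma=True)
    params_star.append(popt_star)

params_star = np.array(params_star)
print("error estimate:", params_star.std(axis=0, ddof=1))
```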
Example
From my master's thesis I have this example where a polynomial is fitted to three data points with error bars. Many fits were done, and at each point along the curve I obtained a distribution of model predictions. This way I could generate the band of uncertainty in the following graph:

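A rough sketch of how such a band can be built from the bootstrap fits; the polynomial, the data values, and the plotting details are placeholders, not the actual thesis data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # Second-order polynomial as the fit model.
    return a * x**2 + b * x + c

rng = np.random.default_rng(0)

# Made-up data points with error bars.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.1, 2.4, 4.2])
dy = np.array([0.2, 0.2, 0.3])

# Evaluate every bootstrap fit on a fine grid to get a distribution
# of model predictions at each point along the curve.
grid = np.linspace(0.0, 3.5, 200)
predictions = []
for _ in range(1000):
    y_star = rng.normal(y, dy)
    popt_star, _ = curve_fit(model, x, y_star)
    predictions.append(model(grid, *popt_star))
predictions = np.array(predictions)

center = predictions.mean(axis=0)
spread = predictions.std(axis=0, ddof=1)

plt.errorbar(x, y, yerr=dy, fmt="o")
plt.fill_between(grid, center - spread, center + spread, alpha=0.3)
plt.plot(grid, center)
plt.show()
```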