
I'm using Python's scipy.optimize.curve_fit routine (which uses non-linear least squares) to fit an exponential function of the form:

f(x) = a * exp(b*x) + c

to a set of data. The result looks like this:

[figure: scatter of the data with the fitted exponential curve]

where the black triangles are the data set and the blue curve is the f(x) fitted by the routine. The output of the process includes the optimal value for each parameter a, b, c, along with a covariance matrix from which I can obtain the standard deviation of each, i.e. I get:

$a = p_a \pm e_a$

$b = p_b \pm e_b$

$c = p_c \pm e_c$

where $p_i$ are the optimal values and $e_i$ their standard deviations.
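For concreteness, here is a minimal sketch of the fitting step (with synthetic data standing in for my real set; the true parameter values here are made up for the example):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a * np.exp(b * x) + c

# Synthetic data standing in for my real data set
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = f(x, 2.0, 0.8, 1.0) + rng.normal(0, 0.5, x.size)

popt, pcov = curve_fit(f, x, y, p0=(1.0, 0.5, 0.0))
perr = np.sqrt(np.diag(pcov))  # standard deviation of each parameter
# popt holds (p_a, p_b, p_c); perr holds (e_a, e_b, e_c)
```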

What I want is to generate the N-$\sigma$ curves (where N is an integer) for the fit. As an example, here's an article where the authors show a similar exponential best fit (although a decaying one) along with its $1\sigma$ upper and lower curves (see caption):

[figure: exponential best fit from the cited article, shown with its $1\sigma$ upper and lower curves]

This site gives an example of how to do this, but I'm not sure the procedure there is entirely correct.

How can I correctly obtain such curves with the data I have available?

asked by Gabriel · edited by Glen_b

1 Answer


One big problem I see with ordinary least squares here is that the variance is clearly not constant but increases as $x$ does; I'd be inclined to consider a model where the spread is related to the mean (possibly proportional to it), suggesting either a generalized nonlinear model or perhaps a transformation (such as a log-transform) to stabilize the variance.

Without dealing with that issue, your variance-covariance and standard error estimates are biased.

Since you want to have estimates that are functions of such quantities, addressing this problem will be central to obtaining a usable answer.


In response to comments:

The standard errors of the parameter estimates and the residual standard deviation are built on the assumption of constant variance. This assumption is clearly violated. Consider if (by eye) I draw vertical bars intended to cover about 99.something percent of the local spread at each of several different x-values:

[figure: the data with vertical bars drawn at several $x$-values, their length increasing with $x$]

If calculations are based on that constant-variance assumption (as the usual standard errors are), then they will likely be of little value.
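One rough way to check this numerically, rather than by eye, is to see whether the absolute residuals grow with $x$. A sketch (with synthetic heteroskedastic data; this is an informal diagnostic, not a formal test):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr

def f(x, a, b, c):
    return a * np.exp(b * x) + c

# synthetic data with noise proportional to the mean
rng = np.random.default_rng(1)
x = np.linspace(0, 4, 80)
mu = f(x, 2.0, 0.8, 1.0)
y = mu + rng.normal(0, 0.1 * mu)

popt, _ = curve_fit(f, x, y, p0=(1.0, 0.5, 0.0))
resid = y - f(x, *popt)

# If spread were constant, |residuals| should be unrelated to x;
# a strong positive rank correlation suggests variance increasing with x.
rho, pval = spearmanr(x, np.abs(resid))
print(rho, pval)
```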

The authors of the paper your second plot comes from (a page reference would have been nice, by the way; I am not an astronomer, so I have little idea what's going on in the rest of it) appear to have used a fit where the spread is not constant on the y-scale of their plot, and the spread indeed seems to increase in a fashion consistent with their error bars, so it's not clear they necessarily have a problem.

The blog post you link to doesn't look like the plot from the paper; it appears to be using different assumptions. Things that look superficially similar may not be.

If the spread looks roughly constant on the log scale (and there are no actual 0's), then you could fit a nonlinear least squares model of the form $E(\log y) = \log(a e^{bx}+c)$, and then convert that to a fit on the untransformed scale - but beware that expectations on the two scales don't directly correspond. If you have near normality on the log scale you can still get an estimated expectation on the original scale. Failing that, you might be left with either making a particular distributional assumption or a Taylor approximation to the expectation.
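A sketch of that log-scale fit using curve_fit (with synthetic multiplicative-noise data; the back-transformation caveat above still applies, and the bias correction at the end assumes roughly log-normal errors):

```python
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b, c):
    # mean of log(y) modelled as log(a*exp(b*x) + c)
    return np.log(a * np.exp(b * x) + c)

# synthetic data with multiplicative noise: spread roughly constant on the log scale
rng = np.random.default_rng(2)
x = np.linspace(0, 4, 80)
y = (2.0 * np.exp(0.8 * x) + 1.0) * rng.lognormal(0, 0.1, x.size)

# bounds keep a, b, c positive so the argument of the log stays positive
popt, pcov = curve_fit(log_model, x, np.log(y),
                       p0=(1.0, 0.5, 0.5), bounds=(1e-6, np.inf))

# Naive back-transform exp(fit) is biased for E(y); under approximately
# log-normal errors, multiplying by exp(s^2/2) corrects the fitted mean.
resid = np.log(y) - log_model(x, *popt)
s2 = np.sum(resid**2) / (x.size - 3)
y_hat = np.exp(log_model(x, *popt)) * np.exp(s2 / 2)
```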


Several kinds of intervals

There are several main kinds of intervals people may wish to place on a plot. Here I discuss three:

$\;$ 1. a pointwise (confidence) interval (based on conditional standard error of the mean). This is an interval for where the curve (the conditional mean) may be, for each x-value. That is, specify a particular $x$, and take a vertical slice there. We can produce an estimate of the standard error for the mean at that $x$; a repeated set of such two standard-error intervals* (from different data) at a given $x$ would be expected to include the population mean at that $x$ about 95% of the time if certain assumptions were satisfied.

This seems to be something like what is being attempted in the blog post you mention in your final link, but crucially, looking at the Python code they seem to be ignoring the effect of the parameter covariances, which would make the result wrong.

$\;$ *(We would not say "$2\sigma$" though, since $\sigma$ would conventionally be a population quantity, and this is a sample estimate of something)

$\;$ 2. An overall interval for the mean. This would still be an interval for the mean but one based on global coverage probabilities. (I don't think you're after this.)

$\;$ 3. A (pointwise) prediction interval. This would be an interval with a specified probability of including a new observation at each given $x$. It includes both parameter uncertainty (due to both variances and covariances of parameter estimates) and the observation "noise" (based on a sample estimate of what I'd call $\sigma$). This is what at first glance seems to be going on in the second plot, but I am not 100% certain.

You should clarify what kind of interval you need, and what exactly you mean by "$\sigma$".


Since you aren't really familiar with the usual standard errors and intervals in linear models, let's cover those. (The interval calculations below are all pointwise, not global.)

Standard errors and intervals in linear models

A. Standard error of mean, confidence interval for mean

For a linear model, $y=X\beta+\varepsilon$, $\text{Var}(\varepsilon)=\sigma^2 I$, we have, for a matrix of constants $C$:

$\text{Var}(C\hat{\beta})=\sigma^2C(X'X)^{-1}C'$.

Hence, if $c$ is a row-vector of constants, $\text{se}(c\hat\beta)=\sigma\sqrt{c(X'X)^{-1}c'}$.

If you estimate $\sigma^2$ by $s^2=\frac{1}{n-p}\sum_i e_i^2$ (where $p$ is the number of predictors, including the intercept term), then $\frac{c\hat\beta-c\beta}{\text{se}(c\hat\beta)}$ is a pivotal quantity* with a $t$-distribution with $n-p$ d.f.

A $1-\alpha$ confidence interval for the mean (at $c=x^*$) is therefore $x^* \hat\beta \pm t_{1-\alpha/2}\, s \sqrt{x^* (X'X)^{-1} {x^*}'}$, or more succinctly, $\hat{y} \pm t_{1-\alpha/2}\, s \sqrt{x^* (X'X)^{-1} {x^*}'}$.

* basically a pivotal quantity is a function of a parameter or parameters and data whose distribution doesn't depend on the parameter(s)

B. Process variation

This is just the estimated standard deviation of the process, $\sigma$, usually estimated by $s$. If we knew the true (conditional) mean at $x^*$, you'd expect about 68% of the observations within $\sigma$ of it, and about 95% within $2\sigma$ of it. The fact that $s$ is estimated alters that somewhat (but I won't labor the point by giving all the details here unless you need them, since this is usually not what people are after).

C. Standard error of prediction, prediction interval for an observation

This incorporates both of the previous two components. Let $y^*$ be an observation not in our sample.

Then the variance of $y^{*}-\hat{y}^{*}$ is $\sigma^2+\sigma^2 x^*(X'X)^{-1} {x^*}' = \sigma^2(1+x^* (X'X)^{-1} {x^*}')$, and so the standard error is $\sigma\sqrt{1+x^* (X'X)^{-1} {x^*}'}$.

Extending (slightly) the notion of a pivotal quantity, in similar fashion to the confidence interval we can construct a $1-\alpha$ interval for $y^{*}$: $\hat{y}^* \pm t_{1-\alpha/2}\, s \sqrt{1+x^* (X'X)^{-1} {x^*}'}$.
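Numerically, the interval in A. and this prediction interval differ only by the extra 1 under the square root. A sketch for a simple straight-line fit with made-up data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 30
x = np.linspace(0, 10, n)
X = np.column_stack([np.ones(n), x])   # design matrix: intercept and slope
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat
p = X.shape[1]
s2 = e @ e / (n - p)                   # s^2 = (1/(n-p)) * sum of squared residuals
XtX_inv = np.linalg.inv(X.T @ X)
t = stats.t.ppf(0.975, n - p)          # t_{1-alpha/2} for alpha = 0.05

x_star = np.array([1.0, 5.0])          # intervals at x = 5
y_hat = x_star @ beta_hat
h = x_star @ XtX_inv @ x_star          # the x*(X'X)^{-1}x*' term

# A. confidence interval for the mean, and C. prediction interval for a new observation
ci = (y_hat - t * np.sqrt(s2 * h),       y_hat + t * np.sqrt(s2 * h))
pi = (y_hat - t * np.sqrt(s2 * (1 + h)), y_hat + t * np.sqrt(s2 * (1 + h)))
```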


Similar (if approximate) intervals can be fairly easily constructed in nonlinear models; for example, via a Taylor expansion. If you're able to identify which (if any) of the above calculations you need, we may be able to explore how to approach that.
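As a rough illustration of that Taylor-expansion (delta method) approach applied to an exponential fit like yours, using the full covariance matrix returned by curve_fit rather than just its diagonal (note this sketch still assumes constant variance, which, as discussed above, looks doubtful for your data):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy import stats

def f(x, a, b, c):
    return a * np.exp(b * x) + c

rng = np.random.default_rng(4)
x = np.linspace(0, 4, 50)
y = f(x, 2.0, 0.8, 1.0) + rng.normal(0, 0.5, x.size)

popt, pcov = curve_fit(f, x, y, p0=(1.0, 0.5, 0.0))
a, b, c = popt

xg = np.linspace(0, 4, 200)
# gradient of f with respect to (a, b, c) at each grid point
G = np.column_stack([np.exp(b * xg), a * xg * np.exp(b * xg), np.ones_like(xg)])
# delta-method variance of the fitted mean: g' Cov g for each point
# (pcov from curve_fit already includes the residual-variance scaling)
var_mean = np.einsum('ij,jk,ik->i', G, pcov, G)

n, p = x.size, 3
s = np.sqrt(np.sum((y - f(x, *popt))**2) / (n - p))
t = stats.t.ppf(0.975, n - p)

yg = f(xg, *popt)
# pointwise band for the mean, and pointwise prediction band for a new observation
ci_lo, ci_hi = yg - t * np.sqrt(var_mean),          yg + t * np.sqrt(var_mean)
pi_lo, pi_hi = yg - t * np.sqrt(var_mean + s**2),   yg + t * np.sqrt(var_mean + s**2)
```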

answered by Glen_b
  • I'm sorry but I don't understand what you are saying. I'm not a statistician so please bear with me: is what those authors published incorrect then? Is the simple technique in the link I presented at the end of my question not valid? Is this such a complicated process that I need to review even the automatic curve fit (which I thought was the easy part)? – Gabriel Jul 05 '14 at 01:31
  • I have extended my answer in response. – Glen_b Jul 05 '14 at 01:56
  • Gabriel - it's difficult for me to tell what's going on in several places in your description, in much of the paper and even to some extent in the blog post. Things I'd expect to be made explicit are either unstated or at least not obvious in the places I'd expect to look for them, so it's difficult to tell what corresponds to what across the three things. Can you clarify how the y-axis of the second plot relates to the y-axis of the first plot in your post? Posts should stand on their own, rather than require extensive off-site reading to understand, with the resulting chances of link-rot. – Glen_b Jul 05 '14 at 03:18
  • Thanks for extending the answer Glen (still don't understand much of it, but that's quite probably my fault) I thought the question was simple and obvious: I just want to know how to make an N-sigma curve starting from a curve fitted via a least square method. The y axes are completely unrelated. I pasted the 2nd image because it shows what I want to perform on the 1st one: an N-sigma curve based on the fitted curve. The final link I pasted as an example and is not needed to understand the question, if you'd like I can remove it. Same with the link to the article from where the 2nd image was taken. – Gabriel Jul 05 '14 at 03:30
  • 1
    It's not necessary to remove links (they clarify the post, and that's good), as long as the post can be clearly understood even when those links die. Your comment above clarifies several things, thanks. Please don't construe lack of understanding as somehow your fault. It's not. You're clearly making earnest attempts to understand, but communication is often difficult even between people that know the same things, and there's (i) a lot of things you don't know, but I don't know you don't know, and (ii) a lot of information needed (on both our parts) to understand what's going on here. – Glen_b Jul 05 '14 at 03:36
  • Could you perhaps point me to a link/book/article/anything where I can check the method by which the authors of the 2nd image produced their N-sigma curves? That would be answer enough for me, I just want to understand a bit more even if it's not exactly applicable to my problem. – Gabriel Jul 05 '14 at 03:37
  • I have no clear idea what they did, so that's going to be difficult, but the outcome looks reasonable, so I have no reason to suspect a problem with it. Please note that they may have explained it somewhere, but I didn't see it, and I am not going to read the whole of a paper I can't comprehend (there's a limit to how much free time I'll donate for fake internet points) just to see if they tucked it away somewhere. If you know where they say what they did, could you point me to where it is? – Glen_b Jul 05 '14 at 03:39
  • No worries, no rush. It's not like I'm paying you :) – Gabriel Jul 05 '14 at 03:47
  • Oh and to answer your question above: no, they don't explain it anywhere. They just say (paragraph below the image) that they conducted a "best-fit analytical function ... which is an exponential of the form f(x) = a*exp(b*x)+c" and then they give the a,b,c fitted values along with their uncertainties. This led me to believe this is such a common statistical procedure it doesn't even need an explanation. – Gabriel Jul 05 '14 at 03:56
  • I have added a few paragraphs about common types of interval in statistics. To proceed further I'd at the least need to know what $\sigma$ means in your question. – Glen_b Jul 05 '14 at 04:08
  • Thank you again for the details Glen. I personally believed that the $\sigma$ in my question meant a simple standard deviation and that it was an easy thing to obtain; I wasn't aware of the complications/ramifications of the process. Since my statistical background is non-existent I can't answer your question because I just don't know. Again I'm sorry. My end goal is to provide some sort of upper curve in my first image that somehow represents the uncertainty (sigma?) associated with the best fit curve obtained by the least-square function. – Gabriel Jul 05 '14 at 15:46
  • That's why I pasted the second image, because their "1-sigma" is the kind of curve I'd need but I have no idea how they came up with it. If the details I can give are not sufficient to answer the question, please let me know so I can accept your answer and look somewhere else for a solution (perhaps I'll send an email to the authors of the article and ask them what they did). – Gabriel Jul 05 '14 at 15:49
  • It looks like a prediction interval, but they may well be doing something else. Are you at all familiar with prediction intervals in linear regression? – Glen_b Jul 05 '14 at 15:59
  • No, I'm pretty (ie: almost completely) ignorant when it comes to statistical processes. I'm not afraid to learn though, so if you point me to some resources I can definitely try to see if I can figure out how to do it myself. – Gabriel Jul 05 '14 at 16:07
  • 1
    Glen: if I simply added the standard deviation to each parameter fitted (a,b,c) and plotted that curve along with the best fit one, would it mean something statistically speaking? – Gabriel Jul 05 '14 at 21:28
  • Thank you very much Glen, I'll await the details and see if I can (hopefully) learn something about this process. – Gabriel Jul 06 '14 at 03:09
  • Actually there was an error in my previous comment (I removed it). Some (potential) special cases aside, in general I don't think that the sums of parameter standard errors will be meaningful for nonlinear regression. I have modified my answer to explain three standard deviations or standard errors (ones that you might possibly be referring to as "sigma") and some associated intervals for the simple case of linear regression. If any of those strike you as the kind of thing you need, please let me know. I suspect that your second plot might be trying to do something like either B. or C. – Glen_b Jul 06 '14 at 11:03
  • 2
    While there is a limit to the "free time you'll donate for fake internet points", I thought this was a very insightful answer and you have shown exceptional patience, a genuine attempt at helping. I extended my SE account to stats.SE, just to upvote this (and the question too). Much like Gabriel, I am sometimes confronted with similar issues of providing some statistical measure of correctness to fittings to measurement data and wish I had more knowledge in that field. Thank you for the nomenclature and pointers - I'm off to Wikipedia to read-up on this. – Oliver W. Feb 14 '15 at 02:24