
Suppose I have data points $(x_i,y_i)$, say $N$ points. I know they are supposed to fit the curve $y = f(x)$. Are there techniques more advanced than linear regression for fitting the curve in such cases? I am asking this because I can get the data points $(f(x_i), y_i)$ from $(x_i,y_i)$, and those fit on a line, so I can just use linear regression.

I don't know much statistics or probability.

  • Linear regression is curve fitting. It's what you do when you expect or need the "curve" to be a straight line. I don't know if other techniques are "more advanced," but you can fit other curves (e.g., when you expect the function to be exponential, when you expect it to be a polynomial of degree N, ...) – Solomon Slow Sep 21 '22 at 15:55
  • It depends on what you might mean by "linear regression" -- statistics understands that term in several senses. Could you clarify your meaning? – whuber Sep 21 '22 at 17:57
  • @Neeladri One important thing to consider is the effect of transformation on the error. Consider, for example, the difference between $Y=e^{\alpha+\beta x}+\epsilon$ (such as with an ordinary nonlinear least squares regression) and $\log(Y) = \alpha+\beta x+\eta$. If you transform both sides of the first by taking logs, you do not get the second (and vice versa). This impacts suitable estimators, and indeed the possibility of transforming at all; in the first equation, negative values of $Y$ are possible, but in the second, they are not; sometimes you would not be able to transform at all. – Glen_b Sep 22 '22 at 01:34
  • Thank you @Dave, that answer indirectly made me realize I asked the wrong question. I was thinking about functions such as y = A·(a polynomial of x with all constants known), where only A is to be estimated. That can obviously be reduced to a linear regression, but if more constants were undetermined I would have to use polynomial regression. My question makes no sense. – Neeladri Reddy Sep 22 '22 at 04:35
  • @Glen_b Could you please elaborate? – Neeladri Reddy Sep 22 '22 at 04:37
  • @Glen_b That’s all important and I think often overlooked by fans of transformations, but I’m not sure I see the relevance. (I’m also rather tired right now.) // Neeladri, I think your question is legitimate, though I do wonder if I answered it in the linked post. – Dave Sep 22 '22 at 04:38
  • @Dave and Neeladri I have written the beginnings of an answer that I hope responds to the original question and explains how my comment comes into it (by answering an obvious followup question that typically arises next). – Glen_b Sep 22 '22 at 07:44

2 Answers


First, determine whether there is an expected shape for $y(x)$ based on the particular data you are evaluating. Then estimate the uncertainty of each of the $y$ values, e.g., as standard deviations based on multiple measurements, if possible. Then plot $y = f(x)$, including uncertainty intervals about the $y$ values. A valid curve fit should pass within the uncertainty intervals, minimize the least-squares measure, and agree with the expected shape, if the expected shape is known. Tools such as Mathematica offer curve-fitting routines, including Fit, which computes a least-squares fit to a chosen set of basis functions (for example, a polynomial best fit).
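A minimal sketch of that workflow in Python, assuming the $y$ uncertainties are available as standard deviations (the data here are made up, and numpy.polyfit stands in for a tool like Mathematica's Fit):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: x values, measured y values, and their standard deviations
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.1, 1.8, 3.2, 5.1, 7.4, 10.2, 13.6])
y_sd = np.array([0.2, 0.2, 0.3, 0.3, 0.4, 0.5, 0.5])

# Weighted least-squares polynomial fit; np.polyfit expects weights of 1/sd
degree = 2
coeffs = np.polyfit(x, y, degree, w=1.0 / y_sd)
fit = np.poly1d(coeffs)

# Plot the data with uncertainty intervals and overlay the fitted curve
xx = np.linspace(x.min(), x.max(), 200)
plt.errorbar(x, y, yerr=y_sd, fmt="o", label="data ± sd")
plt.plot(xx, fit(xx), label=f"degree-{degree} least-squares fit")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```

The weights 1/sd give the more precisely measured points more influence on the fit.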


Suppose I have data points $(x_i,y_i)$, say $N$ points. I know they are supposed to fit the curve $y = f(x)$. Are there techniques more advanced than linear regression for fitting the curve in such cases?

There are other techniques than linear regression, sure; indeed, several techniques can arise depending on the form of the function, on how the unknown parameters enter into $f$, and on what is assumed about the noise term.

  1. There must be unknown parameters, or there's literally nothing to 'fit'. In simple linear regression these are the coefficients that $x$ and $1$ are multiplied by; with some nonlinear curve $f$ there must still be unknown parameters somewhere.

  2. It's important to keep in mind that we don't expect the curve to go through the data values; typically we expect that the values are observed close to but not on the fitted curve; that they might have some distribution around the underlying curve, so that an observed $y_i$ at $x=x_i$ may be above or below the curve. This is sometimes thought of as noise but doesn't necessarily represent noise in the sense of instrument noise, for example; we can generalize it by treating the responses as being draws from some collection of distributions, one at each $x$ value. We might, for example, regard the fitted curve as being at the population mean of each conditional distribution.

Consequently, we would instead generally consider a curve that relates not the responses directly to some function of $x$ but instead something like $E(Y_i|x_i) = f(x_i,\underline{\theta})$; if we expect additive noise we might equivalently write $Y_i = f(x_i,\underline{\theta})+\epsilon_i$.

I am asking this because I can get the data points $(f(x_i), y_i)$ from $(x_i,y_i)$, and those fit on a line, so I can just use linear regression.

This will only work if the relationship

$E(Y_i|x_i) = f(x_i,\underline{\theta})$

is of the form

$E(Y_i|x_i) = \alpha+ \beta f(x_i)$

but few relationships can be written in this form.
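A small sketch of the case where this does work, assuming $f$ is completely known (here $f(x) = x^3$ is an arbitrary illustrative choice) so that the only unknowns are $\alpha$ and $\beta$; the data are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Suppose f is completely known, e.g. f(x) = x**3, and E(Y|x) = alpha + beta * f(x)
def f(x):
    return x ** 3

alpha_true, beta_true = 2.0, 0.5
x = np.linspace(0, 3, 50)
y = alpha_true + beta_true * f(x) + rng.normal(scale=0.5, size=x.size)

# Because the unknowns enter linearly, ordinary linear regression of y on f(x) suffices
res = stats.linregress(f(x), y)
print(f"alpha ≈ {res.intercept:.3f}, beta ≈ {res.slope:.3f}")
```

If, as in the comment thread above, the relationship is y = A·(a known polynomial of x) with no intercept, the same idea applies with a single coefficient to estimate.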

Let's consider a concrete example!

This is an equation that was historically used in a particular context, but let's treat it somewhat abstractly for now:

$E(Y_i|x_i) = f(x_i,\theta) = e^{\alpha+\beta x_i}$

(i.e. here $\theta=(\alpha,\beta)'$)

For the moment, I'll leave aside specifying the distribution of $Y_i$ apart from its conditional mean; we will consider some possibilities.

Note that I can't just say "let $x_i^* = f(x_i)$" and then try to fit a straight line to the $(x_i^*, Y_i)$ values: the unknown parameters $\alpha$ and $\beta$ are inside $f$, not outside it.

[Nevertheless, there are some nice cases where transformation $-$ possibly with reparameterization $-$ can work. ... note to self; link to an example of this.]


This is where we get to my original comment under the question.

Consider a situation where $f$ is a strictly monotonic transformation of a linear function of $(1,x)$, that is, $f=g(\alpha+\beta x)$ for some strictly monotonic $g$. The natural reaction is then to transform not $x$ but instead $Y$, by the inverse of $g$.

This seems to linearize the relationship as follows:

$Y = g(\alpha+\beta x)$ implies

$g^{-1}(Y) = \alpha+\beta x$

... and now this seems to be a candidate for linear regression.

Such a nonlinear relationship is a special case of what are sometimes called linearizable models (though some authors, like Bates and Watts, reserve that term for something else, and call this sort of model transformably linear).

More generally, transformably linear models may involve transforming both $Y$ and $x$ to attain a relationship that's linear in the parameters.

This is a highly attractive and commonly used idea.
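A sketch of the transform-then-regress idea for the exponential example above, using simulated data in which the noise is deliberately generated on the log scale so that the transformation is benign (the caveats below explain why that matters):

```python
import numpy as np

rng = np.random.default_rng(1)

# Exponential mean function with multiplicative (log-normal) noise,
# i.e. log(Y) = alpha + beta * x + eta -- the case where transforming works cleanly
alpha_true, beta_true = 0.5, 0.8
x = np.linspace(0, 4, 60)
y = np.exp(alpha_true + beta_true * x + rng.normal(scale=0.2, size=x.size))

# Transform Y by the inverse of g (here g = exp, so its inverse is log) and fit a line
beta_hat, alpha_hat = np.polyfit(x, np.log(y), 1)
print(f"alpha ≈ {alpha_hat:.3f}, beta ≈ {beta_hat:.3f}")
```

Whether this is appropriate depends entirely on how the error enters, which is the subject of the next few paragraphs.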

The problem with this approach is that it ignores the "points don't lie on the curve" issue; in effect, it ignores the impact of the "noise" in the transformation.

That is, the seemingly attractive 'solution' to the nonlinearity of $f$ as a function of $\theta$ $-$ transforming $Y$ rather than $x$ (or, more generally, transforming both) $-$ only looks so straightforward because we made the mistake of writing $Y = f(x,\theta)$ as if $Y$ were observed without error.

We need to consider the effect of transformation on the error.

Consider, for example, the difference between $Y=e^{\alpha+\beta x}+\epsilon$ (such as with an ordinary nonlinear least squares regression) and $\log(Y)=\alpha+\beta x+\eta$. If you transform both sides of the first by taking logs, you do not get the second (and vice versa).

The presence of error $-$ unless it enters in just the right way $-$ causes an issue.

This impacts suitable estimators. Indeed it may impact the possibility of transforming at all: in the first equation, negative values of $Y$ are possible, but in the second, they are not; if you had a negative response value caused by a large negative $\epsilon$ you would not be able to transform by taking logs at all.

Such a linearization approach (when one is available) does often have value in getting good approximate parameter values, which can then be used as starting points for estimating the coefficients more precisely with iterative methods.
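A sketch of that two-step use for the additive-error version of the exponential example, with scipy.optimize.curve_fit standing in for the iterative nonlinear least-squares step (again with simulated data):

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

# Additive-error model: Y = exp(alpha + beta * x) + eps, so taking logs does NOT
# turn the noise into a nice additive term (and fails outright wherever Y <= 0)
alpha_true, beta_true = 0.5, 0.8
x = np.linspace(0, 4, 60)
y = np.exp(alpha_true + beta_true * x) + rng.normal(scale=2.0, size=x.size)

def mean_fn(x, alpha, beta):
    return np.exp(alpha + beta * x)

# Step 1: rough starting values from the log-linear fit, using only positive responses
pos = y > 0
beta0, alpha0 = np.polyfit(x[pos], np.log(y[pos]), 1)

# Step 2: refine by iterative nonlinear least squares on the original scale
(alpha_hat, beta_hat), _ = curve_fit(mean_fn, x, y, p0=[alpha0, beta0])
print(f"start:   alpha ≈ {alpha0:.3f}, beta ≈ {beta0:.3f}")
print(f"refined: alpha ≈ {alpha_hat:.3f}, beta ≈ {beta_hat:.3f}")
```

With additive error some responses can be negative, which is why the rough log-linear step uses only the positive responses; that echoes the point above that the transformation may not even be possible for every observation.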

(I have to leave this answer for now; I hope to come back and briefly discuss different assumptions for the conditional distribution of $Y$ and how that might impact estimation methods.)

Glen_b