
I used to say that OLS is an estimation technique and should never be confused with the type of model on which it is applied. Thus a phrase like "I have an OLS model" would not make sense to me, strictly speaking. (I would usually be able to guess what people mean, though.) However, in a comment under this post Dave offers a point to the contrary (if I am interpreting it correctly):

I’m not actually sold on OLS referring to an estimation technique rather than a model. Yes, we can estimate the coefficients of a linear model many ways and can apply minimization of square loss to estimating the coefficients of a nonlinear regression. However, the “ordinary” in OLS suggests to me an interest in a linear model whose coefficients are estimated by minimizing square loss.

Wikipedia's article on OLS seems to contain a similar message:

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a *linear regression model* (with fixed level-one effects of a linear function of a set of explanatory variables) by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable (values of the variable being observed) in the input dataset and the output of the (linear) function of the independent variable.

(emphasis is mine)

So, strictly speaking, does the use of OLS imply that the model is linear in the parameters? In other words, does the term OLS refer to an estimation technique and to the linearity of the model at once?

I would like the answers to focus on the theoretical aspects of the issue. The fact that a lot of people misuse statistical terms such as OLS is of less concern to me.

Richard Hardy
  • Looking at the least-squares tag description, we only see reference to an estimation technique, not a model. However, the tag is LS, not OLS. On the other hand, a large part of the tag description actually applies to OLS but not necessarily to other versions of LS – without stating that explicitly. – Richard Hardy Dec 08 '22 at 14:51
  • Do you want to add a [tag:references] tag? I imagine an acceptable answer would refer to some commonly accepted gold standard reference. I wonder if there is an accepted explanation of what the "ordinary" stands for in OLS, and what "non-ordinary" least squares would be. – Stephan Kolassa Dec 08 '22 at 15:07
  • @StephanKolassa, the tag will not hurt; I have added it. Thanks. Regarding "non-ordinary", I suppose nonlinear LS is a good example. It could be used on a model that is not linear in parameters. – Richard Hardy Dec 08 '22 at 15:08
  • My two cents are that the answer to your question should still be no. Whenever you are given $(y,X)$ you can run OLS on that, whether the model you have in mind/that nature used to generate $(y,X)$ is linear in parameters or not (say, because nature used probit or logit). – Christoph Hanck Dec 08 '22 at 15:37
  • @ChristophHanck, would that not confuse the data generating process (DGP) with the model? Regardless of the DGP behind a sample of data, I may be fitting a model using OLS. Does that necessarily imply the model is linear in parameters? – Richard Hardy Dec 08 '22 at 15:41
  • OLS actually does not require the residuals to be normally distributed; it is also a good method when the residuals follow a different distribution. – Sextus Empiricus Dec 08 '22 at 15:53
  • @RichardHardy, maybe my comment gives rise to that risk, yes. So, focusing on the model: that is something I have in my mind (or in my paper etc.) of how something works. Even if that model is nonlinear in the parameters, I can still fit OLS to the dataset. The classical example would be an LPM fitted to a binary outcome, with predicted probabilities possibly outside the 0-1 range. The output/predictions of that OLS fitting exercise will then be something that is necessarily linear in the parameters, if that is what we are after. – Christoph Hanck Dec 08 '22 at 15:55
  • @ChristophHanck, that clarifies it, thank you. – Richard Hardy Dec 08 '22 at 15:56
  • @SextusEmpiricus, right. Borrowing from Christoph Hanck's comment, OLS only requires data $(y,X)$, and that is enough for us to run OLS. What is required for OLS to have certain desirable statistical properties (finite sample or asymptotic) to justify calling it a good method is another question. – Richard Hardy Dec 08 '22 at 16:00
  • Reading the Wikipedia introductory paragraphs now and a decade ago, I am not certain this kind of discussion is actually helping people. – Henry Dec 09 '22 at 00:17
  • @Henry, good point. I only meant the Wikipedia quote as an illustration, not as an authoritative reference. – Richard Hardy Dec 09 '22 at 07:43
  • Among the many mistakes in statistics, sloppily calling OLS a model seems rather harmless. Nevertheless, I like the question. – Michael M Dec 09 '22 at 22:58
  • @MichaelM, for me it seems harmless when used among experts. But it can be a stumbling block for a beginner (it was for me). – Richard Hardy Dec 10 '22 at 09:08
  • I'm hung up on what you mean by "the model": are you referring to the true model? Or are you referring to the specification of the OLS? OLS is necessarily linear in the parameters. Whether you believe it or not is another question. – AdamO Dec 19 '22 at 20:39
  • @AdamO, I do not mean the true model (I would call that the data generating process instead; I think it is a more apt term, and a "true model" sounds a bit like an oxymoron to me); I mean the model that we are using to model the data. – Richard Hardy Dec 20 '22 at 07:44

2 Answers


Ordinary least squares regression is a special case of least squares regression.

With least squares regression we try to find a fit $\hat{y}_i({\bf{x}}_i,\boldsymbol{\beta})$ to datapoints $y_i$ by minimising the sum of (weighted) squared residuals.

$$\text{given data ${\bf{x}}_i$ and $y_i$, and weights $w_i$, find $\boldsymbol{\beta}$ that minimises:} \quad L = \sum_{i = 1}^n w_i [y_i-\hat{y}_i({\bf{x}}_i,\boldsymbol{\beta})]^2$$

OLS is the special case in which the weights are equal ($w_i = 1$) and the model is a linear combination of known functions of the regressors:

$$\hat{y}_i({\bf{x}}_i,\boldsymbol{\beta}) = \beta_1 f_1({\bf x}_i) + \beta_2 f_2({\bf x}_i) + \dots +\beta_p f_p({\bf x}_i). $$

So OLS, by definition, uses a model that is linear in the parameters.


But not all methods that use linear models are OLS. Think, for instance, of GLMs, quantile regression, lasso/ridge regression, or Bayesian modelling, all of which can use a linear model but with a different cost function.
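To make this concrete, here is a minimal sketch in Python (the data and variable names are made up for illustration): for a model that is linear in the parameters, the OLS estimate has a closed-form least-squares solution, and it coincides with numerically minimising the unweighted squared loss $L$ above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Made-up data: y depends on x through 1, x and x^2, i.e. the model
# yhat = b1*1 + b2*x + b3*x^2 is linear in the parameters.
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

# Design matrix with columns f_1(x) = 1, f_2(x) = x, f_3(x) = x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# OLS: the closed-form least-squares solution.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same estimate by direct numerical minimisation of the loss
# L(beta) = sum_i w_i [y_i - yhat_i]^2 with equal weights w_i = 1.
loss = lambda beta: np.sum((y - X @ beta) ** 2)
beta_num = minimize(loss, x0=np.zeros(3)).x

print(beta_ols)  # both are roughly (1.0, 2.0, -0.5)
print(beta_num)
```

Swapping the squared loss for, say, an absolute-error loss on the same design matrix would give median (quantile) regression instead: same linear model, but no longer OLS.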

  • Thank you! I suppose that answers it. I will wait for any other views for a bit. Regarding "not all linear models are OLS", I have a hard time with the phrase. A model cannot be OLS, since OLS is an estimation technique (plus probably a model description). – Richard Hardy Dec 08 '22 at 15:07
  • Not to be the party pooper, but would a nonlinear model fitted to minimize the sum of squares also be "least squares", and if the observations were weighted, "weighted least squares", and thus if all the weights are equal... "ordinary least squares"? Or the other way around, if such a model is not "least squares", then what would it be? – Stephan Kolassa Dec 08 '22 at 15:12
  • @StephanKolassa "would a nonlinear model fitted to minimize the sum of squares also be 'least squares'" – yes, that would be least squares, would it not? It is sometimes written as non-linear least squares to be extra clear. – Sextus Empiricus Dec 08 '22 at 15:14
  • Btw., I agree with the criticism of the phrase "I ran an OLS model"; strictly speaking, you don't run a model. – Sextus Empiricus Dec 08 '22 at 15:20
  • Well, my main problem is with "OLS model", not with "ran a model": OLS is not a type of model and hence not a valid modifier for the noun "model". Perhaps I should have used another example that excludes "running a model" to make my point more clearly. – Richard Hardy Dec 08 '22 at 15:23
  • Just for my own sanity about definitions: OLS is simply an estimation method (minimising the squared distances between the prediction line and the raw values, hence the "squares"), whereas a model is something theoretical ("I think x influences y")? I think people on this site are a lot more laser-focused on these things than I am used to. – Shawn Hemelstrand Dec 08 '22 at 15:37
  • @ShawnHemelstrand, a model could state the distribution of $Y|X$ such as $Y|X\sim N(\beta_0+\beta_1X,\sigma_\varepsilon^2)$ or $y_i=\beta_0+\beta_1x_i+\varepsilon_i$ combined with $\varepsilon_i\sim N(0,\sigma_\varepsilon^2)$. And my question is whether the term OLS bears any information about the model. Also note that a model does not have to agree or even be close to the data generating process (the true probabilistic relationship between $Y$ and $X$). – Richard Hardy Dec 08 '22 at 15:43
  • Okay I think that makes sense. Thanks for clarifying. – Shawn Hemelstrand Dec 08 '22 at 15:47
  • @RichardHardy I also agree with that. The phrase is actually correct in the part "I ran an OLS (regression)"; the mistake is in calling OLS a model. Yet OLS is equivalent to MLE under a specific model, and possibly that is how OLS gets abused as a reference to a particular model. – Sextus Empiricus Dec 08 '22 at 15:47
  • @Ben you can make that distinction, but it doesn't change the fact that a working model is also a model. OLS works with a linear model (or working model, if you like); it contrasts with non-linear least squares. – Sextus Empiricus Dec 10 '22 at 23:51

By means of OLS one can estimate nonlinear relations provided they are purely additive or purely multiplicative (log-additive). For instance, quadratic relations like this: $$ y_t = a + bx_t + cx^2_t + e_t, $$ are perfectly fitted within the OLS framework (and, of course, so are their variants and extensions); but not ones like this: $$ y_t = a\log b^{2x_{1,t}}+ cx_{2,t}^{b}e_t. $$

The latter can arise in structural specifications, where the parameters $a$, $b$, $c$ are typically what you want to recover from the data, so that reduced-form models are simply ill designed to make good forecasts.
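As a sketch of the contrast (in Python, with made-up data; the exponential model below is my own stand-in for a specification that is nonlinear in its parameters), the quadratic relation becomes an ordinary least-squares problem once $1, x_t, x_t^2$ are collected into a design matrix, while the genuinely nonlinear case has to be handed to a nonlinear least-squares routine such as scipy's curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
t = rng.uniform(0.0, 3.0, size=200)

# Quadratic case: nonlinear in x_t, but linear in (a, b, c), so OLS applies.
y_quad = 1.0 + 0.8 * t - 0.3 * t**2 + rng.normal(scale=0.2, size=t.size)
X = np.column_stack([np.ones_like(t), t, t**2])   # columns: 1, x_t, x_t^2
abc_ols, *_ = np.linalg.lstsq(X, y_quad, rcond=None)

# Genuinely nonlinear case, e.g. y = a + b*exp(c*x) + e: no transformation
# makes this linear in (a, b, c), so nonlinear least squares is needed.
y_nl = 1.0 + 2.0 * np.exp(0.5 * t) + rng.normal(scale=0.2, size=t.size)
model = lambda x, a, b, c: a + b * np.exp(c * x)
abc_nls, _ = curve_fit(model, t, y_nl, p0=[0.5, 1.0, 0.1])

print(abc_ols)  # roughly (1.0, 0.8, -0.3)
print(abc_nls)  # roughly (1.0, 2.0, 0.5)
```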

  • 2
    The post is explicitly about linearity in the parameters, so I do not see how your answer helps here. – Michael M Dec 09 '22 at 22:54
  • 2
    Welcome to Cross Validated! This is very often a good point to make, but as @MichaelM says, using "linearity in the parameters" shows the question asker already understands OLS can be used to fit your 1st model (the response $y$ is a linear function of the parameters $a$, $b$, & $c$) but not your 2nd model (the response $y$ is a non-linear function of the parameters $a$, $b$, & $c$). (My favourite example, by the way, is that you can fit a sine wave of known period using OLS.) – Scortchi - Reinstate Monica Dec 09 '22 at 23:50
  • Thanks for the point. If you want me to delete the note, tell me. But obviously, if the structural model is $$ y_t = b^2 x_t $$ and you estimate $y_t = \beta x_t$, you can obtain two solutions or none for $b$ from the structural version. So the question always boils down to the relation between deep theory and data. – Samuel Dec 10 '22 at 23:37
  • Your second model is too ambiguous to serve as an example or counterexample. What is the argument of $\log$? Is the "$b$" in $x_{2,t}^b$ a power or an index? – whuber Dec 11 '22 at 00:11
  • I guess my initial point was a mix of a multiplicative and additive model, but there are innumerable alternatives. Thank you (corrected). – Samuel Dec 12 '22 at 01:09
  • It remains confusing, because the extremely similar model $y = a\log b^{(2x_1)}+c x_2 e = \beta x_1 + x_2 \varepsilon$ with $\beta = 2a\log b$ and $\varepsilon = ce$ is the simplest possible weighted least squares model. Thoughtful readers therefore might be wondering what point you are trying to make with this example. – whuber Dec 12 '22 at 14:50
  • If you see both models as extremely similar, embed the exponent of the first summand in a sinh and tell me. – Samuel Dec 14 '22 at 02:48