
Even though similar questions have been asked many times, I have not been able to understand the consequences of nonlinearity in the parameters in relation to the Gauss–Markov theorem.

In this question (linearity assumption in regression), the answer seems to suggest that the B's would be biased (not sure, this is just my take, and I suspect it is wrong) because, after applying a transformation that allows one to express the model as linear in parameter, the b's would have two possible expected values, namely -B or +B. But I'm not sure of this interpretation, and I would like a more general statement, not just an example.

Apart from that answer, I was trying to see it from the Gauss–Markov theorem perspective. My idea, so far, is that as long as I can express the B's as a linear combination of the y's and the b's are unbiased, then the parameters are BLUE. So I thought that nonlinearity in the parameters implies that it is not possible to express the parameters of interest as a linear combination of the dependent variable (the y's).

So the question is the following: does linearity in the parameters imply that it is possible to express the b's of interest as a linear combination of the y's? And if so, how can I prove this statement?

As you can see, I'm very confused, and any clarification on this aspect would be really appreciated!

Charge
  • A good first step to getting unconfused is to offer clear definitions of what you are asking about. Could you tell us what you mean by "linear in parameter" and "applying transformation"? I ask about these phrases in particular because the context suggests you might not be using them according to their standard meanings. If you would also clarify the distinction between a parameter and an estimate of a parameter in your question, that might help a lot, because you seem to confuse these two. – whuber Mar 22 '21 at 22:13

1 Answer


A. Linear models

Let's say that $\mathbf{y}=(y_1,\dots,y_n)$ is a vector of observable random variables, and that each $y_i$ is accompanied by a vector $\mathbf{u}_i=(u_{i1},\dots,u_{im})$ of explanatory variables. The random vector $\mathbf{y}$ is modeled by specifying functions of $\mathbf{u}$, say $\delta(\cdot)$, and by assuming that the random deviations $y_i-\delta(\mathbf{u}_i)$ have a joint distribution with certain specified characteristics.
Now let $x_{ij}=\delta_j(\mathbf{u}_i)$, e.g. $x_{i1}=1$ ($\delta_1(\cdot)$ is a constant function), $x_{i2}=u_{i1}$, $x_{i3}=u_{i2}$, $x_{i4}=u_{i1}u_{i2}$, $x_{i5}=u_{i3}^2$, $x_{i6}=\sqrt{u_{i4}}$ etc.
In a linear model we express $\delta(\mathbf{u})$ in the form $$\delta(\mathbf{u}_i)=\beta_1\delta_1(\mathbf{u}_i)+\beta_2\delta_2(\mathbf{u}_i)+\cdots+\beta_p\delta_p(\mathbf{u}_i)$$ or, stacking the $n$ observations, $\boldsymbol{\delta}=\mathbf{X}\boldsymbol{\beta}$, where $\mathbf{X}=\{x_{ij}\}$ is the $n\times p$ matrix with $x_{ij}=\delta_j(\mathbf{u}_i)$.
That is, in a linear model we have a linear combination of functions of explanatory variables (even nonlinear functions, of course) and $\boldsymbol{\beta}$ is just a vector of coefficients. See Harville, Linear Models and the Relevant Distributions and Matrix Algebra, CRC Press, 2018, pages 123-125.
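For concreteness, here is a small numpy sketch of such a design matrix (my own illustration; the data and coefficient values are made up): the columns of $\mathbf{X}$ are nonlinear functions of the $u$'s, yet the model remains linear in $\boldsymbol{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical explanatory variables u1, u2, u3, u4 (made up for illustration)
u = rng.uniform(0.5, 2.0, size=(n, 4))

# Columns of X are functions delta_j of the u's, as in the examples above:
# an intercept, u1, u2, the interaction u1*u2, u3^2, sqrt(u4)
X = np.column_stack([
    np.ones(n),          # x_i1 = 1
    u[:, 0],             # x_i2 = u_i1
    u[:, 1],             # x_i3 = u_i2
    u[:, 0] * u[:, 1],   # x_i4 = u_i1 * u_i2
    u[:, 2] ** 2,        # x_i5 = u_i3^2
    np.sqrt(u[:, 3]),    # x_i6 = sqrt(u_i4)
])

# The model is linear in beta even though the columns are nonlinear in the u's
beta = np.array([1.0, 2.0, -1.0, 0.5, 0.3, -0.7])
y = X @ beta + rng.normal(size=n)   # y = X beta + eps
```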

B. Least squares

Given the linear model $\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$, you can estimate $\boldsymbol{\beta}$ by ordinary least squares: $$\hat{\boldsymbol{\beta}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}\tag{1}$$ You may say that linearity in parameters, i.e. a linear model, implies that it is possible to express $\hat{\boldsymbol{\beta}}$ as a linear function of $\mathbf{y}$, but not that every linear unbiased function of $\mathbf{y}$ is BLUE (see below).
You may say that nonlinearity in the parameters implies that it is not possible to express their estimates as such a linear function of the dependent variables, because ordinary least squares requires a linear model.
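As a sketch of this linearity in $\mathbf{y}$ (again with a made-up design matrix, just for illustration): $\hat{\boldsymbol{\beta}}=\mathbf{Cy}$, where $\mathbf{C}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ is computed from $\mathbf{X}$ alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # made-up design matrix
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# OLS: beta_hat = (X'X)^{-1} X' y = C y, where C depends on X only, not on y
C = np.linalg.solve(X.T @ X, X.T)
beta_hat = C @ y

# Cross-check with numpy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))   # True: beta_hat is a linear function of y
```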

C. Gauss-Markov theorem

The Gauss-Markov theorem states that given a linear model, letting $\mathbf{y}-\delta(\mathbf{u})=\mathbf{y}-\mathbf{X}\boldsymbol{\beta}=\boldsymbol{\varepsilon}$, if $E(\boldsymbol{\varepsilon})=\mathbf{0}$, and $\text{var}(\boldsymbol{\varepsilon})=\sigma^2\mathbf{I}$, then the OLS estimator is the best linear unbiased estimator.
To keep it simple, we could say that a linear estimator has the form $\mathbf{Cy}$, where the matrix $\mathbf{C}$ is a function of $\mathbf{X}$. Hence a linear estimator is a linear function of the random vector $\mathbf{y}$. BTW, since a (linear) estimator is a (linear) function of a random vector, it is itself a random vector.
The theorem states that (1) is the best linear unbiased estimator, i.e. that (1) has no larger variance than any other linear unbiased function of $\mathbf{y}$. Other linear unbiased estimators (not parameters) are not BLUE. For example, if $\mathbf{C}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$ then $\hat{\boldsymbol{\beta}}=\mathbf{Cy}$ is BLUE; if $\tilde{\mathbf{C}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'+\mathbf{D}$ with $\mathbf{D}\ne\mathbf{0}$, then $\tilde{\boldsymbol{\beta}}=\tilde{\mathbf{C}}\mathbf{y}$ is not BLUE even if it is unbiased.${}^1$
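A small Monte Carlo sketch of this point (my own illustration): choose $\mathbf{D}=\mathbf{A}(\mathbf{I}-\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}')$ for an arbitrary nonzero $\mathbf{A}$, so that $\mathbf{DX}=\mathbf{0}$ and $\tilde{\boldsymbol{\beta}}$ is still unbiased, then compare the empirical variances of $\hat{\boldsymbol{\beta}}$ and $\tilde{\boldsymbol{\beta}}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # made-up design matrix
beta = np.array([1.0, 2.0, -0.5])

C = np.linalg.solve(X.T @ X, X.T)        # OLS: C = (X'X)^{-1} X'
H = X @ C                                # hat matrix X (X'X)^{-1} X'
A = rng.normal(scale=0.3, size=(p, n))   # arbitrary nonzero matrix
D = A @ (np.eye(n) - H)                  # D X = 0, so C + D is still unbiased

reps = 20000
ols = np.empty((reps, p))
alt = np.empty((reps, p))
for r in range(reps):
    y = X @ beta + rng.normal(size=n)    # E(eps) = 0, var(eps) = sigma^2 I
    ols[r] = C @ y                       # the OLS estimator
    alt[r] = (C + D) @ y                 # another linear unbiased estimator

print(ols.mean(axis=0), alt.mean(axis=0))   # both close to beta: unbiasedness
print(ols.var(axis=0) <= alt.var(axis=0))   # OLS variances are no larger: BLUE
```

Both estimators are unbiased, but the alternative one has (much) larger variance, which is exactly what "not BLUE" means here.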
When you think of linearity in parameters, the random vector $\mathbf{y}$ is modeled as a linear combination of the $x$'s, i.e. of functions of the explanatory variables, and the parameters are just the (unknown, to be estimated) coefficients of that linear combination.

Linearity in parameters and linear estimators are related but very different matters.


${}^1$. $\hat{\boldsymbol{\beta}}$ is unbiased because $E[\hat{\boldsymbol{\beta}}]=E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}]=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{y}]=(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\boldsymbol{\beta}=\boldsymbol{\beta}$, i.e. because $\mathbf{CX}=\mathbf{I}$. $\tilde{\boldsymbol{\beta}}$ is unbiased if $\tilde{\mathbf{C}}\mathbf{X}=\mathbf{I}$.
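A one-line sketch of why the unbiased $\tilde{\boldsymbol{\beta}}$ cannot beat $\hat{\boldsymbol{\beta}}$ (the standard Gauss–Markov argument), assuming $\mathbf{DX}=\mathbf{0}$ as required for unbiasedness and $\text{var}(\mathbf{y})=\sigma^2\mathbf{I}$: $$\text{var}(\tilde{\boldsymbol{\beta}})=\tilde{\mathbf{C}}\,\text{var}(\mathbf{y})\,\tilde{\mathbf{C}}'=\sigma^2(\mathbf{C}+\mathbf{D})(\mathbf{C}+\mathbf{D})'=\sigma^2(\mathbf{X}'\mathbf{X})^{-1}+\sigma^2\mathbf{DD}'=\text{var}(\hat{\boldsymbol{\beta}})+\sigma^2\mathbf{DD}',$$ because the cross terms vanish ($\mathbf{CD}'=(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{DX})'=\mathbf{0}$) and $\mathbf{DD}'$ is positive semidefinite.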

Sergio