
Let $Y_1,Y_2,Y_3$ and $Y_4$ be four random variables such that $E(Y_1)=\theta_1-\theta_3$, $E(Y_2)=\theta_1+\theta_2-\theta_3$, $E(Y_3)=\theta_1-\theta_3$, $E(Y_4)=\theta_1-\theta_2-\theta_3$, where $\theta_1,\theta_2,\theta_3$ are unknown parameters. Also assume that $Var(Y_i)=\sigma^2$, $i=1,2,3,4$. Then which one of the following is true?

A. $\theta_1,\theta_2,\theta_3$ are estimable.

B. $\theta_1+\theta_3$ is estimable.

C. $\theta_1-\theta_3$ is estimable and $\dfrac{1}{2}(Y_1+Y_3)$ is the best linear unbiased estimate of $\theta_1-\theta_3$.

D. $\theta_2$ is estimable.

The given answer is C, which looks strange to me (because I got D).

Why did I get D? Because $E(Y_2-Y_4)=2\theta_2$, so $\dfrac{1}{2}(Y_2-Y_4)$ is a linear unbiased estimator of $\theta_2$.

Why don't I see how C could be the answer? I can see that $\dfrac{Y_1+Y_2+Y_3+Y_4}{4}$ is also an unbiased estimator of $\theta_1-\theta_3$, and its variance is less than that of $\dfrac{Y_1+Y_3}{2}$, so the latter cannot be "best".

Please tell me where I am going wrong.

Also posted here: https://math.stackexchange.com/questions/2568894/a-problem-on-estimability-of-parameters

Elias

2 Answers


This answer stresses the verification of estimability; the minimum-variance property is only a secondary consideration.

To begin with, summarize the information in the matrix form of a linear model as follows: \begin{align} Y := \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & -1 & -1 \\ \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}:= X\beta + \varepsilon, \tag{1} \end{align} where $E(\varepsilon) = 0, \text{Var}(\varepsilon) = \sigma^2 I$ (to discuss estimability, the sphericity assumption is not necessary; but to discuss the Gauss-Markov property, we do need to assume the sphericity of $\varepsilon$).

If the design matrix $X$ is of full rank, then the original parameter $\beta$ admits a unique least-squares estimate $\hat{\beta} = (X'X)^{-1}X'Y$. Consequently, any parameter $\phi$ defined as a linear function $\phi(\beta) = p'\beta$ of $\beta$ is estimable in the sense that it can be unambiguously estimated from the data via the least-squares estimate $\hat{\beta}$ as $\hat{\phi} = p'\hat{\beta}$.
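In the present problem, however, $X$ is not of full column rank, as a quick numerical check confirms (a minimal numpy sketch; the matrix below simply transcribes the design matrix in $(1)$):

```python
import numpy as np

# Design matrix from (1); columns correspond to theta_1, theta_2, theta_3
X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
])

# The third column is the negative of the first, so the rank is 2 < k = 3
print(np.linalg.matrix_rank(X))  # 2
```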

The subtlety arises when $X$ is not of full rank. For a thorough discussion, let us first fix some notation and terminology (I follow the convention of The Coordinate-free Approach to Linear Models, Section 4.8; some of the terms sound unnecessarily technical). In addition, the discussion applies to the general linear model $Y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$.

  1. A regression manifold is the collection of mean vectors as $\beta$ varies over $\mathbb{R}^k$: $$M = \{X\beta: \beta \in \mathbb{R}^k\}.$$
  2. A parametric functional $\phi = \phi(\beta)$ is a linear functional of $\beta$, $$\phi(\beta) = p'\beta = p_1\beta_1 + \cdots + p_k\beta_k.$$

As mentioned above, when $\text{rank}(X) < k$, not every parametric functional $\phi(\beta)$ is estimable. But wait, what exactly does the term estimable mean? It seems difficult to give a clear definition without a little linear algebra. One definition, which I think is the most intuitive, is as follows (from the same aforementioned reference):

Definition 1. A parametric functional $\phi(\beta)$ is estimable if it is uniquely determined by $X\beta$ in the sense that $\phi(\beta_1) = \phi(\beta_2)$ whenever $\beta_1,\beta_2 \in \mathbb{R}^k$ satisfy $X\beta_1 = X\beta_2$.

Interpretation. The above definition stipulates that the mapping from the regression manifold $M$ to the parameter space of $\phi$ must be one-to-one, which is guaranteed when $\text{rank}(X) = k$ (i.e., when $X$ itself is one-to-one). When $\text{rank}(X) < k$, we know that there exist $\beta_1 \neq \beta_2$ such that $X\beta_1 = X\beta_2$. The definition of estimability above in effect rules out those structurally deficient parametric functionals that can take different values even for $\beta$'s producing the same mean vector in $M$, which naturally make no sense. On the other hand, an estimable parametric functional $\phi(\cdot)$ does allow the case $\phi(\beta_1) = \phi(\beta_2)$ with $\beta_1 \neq \beta_2$, as long as the condition $X\beta_1 = X\beta_2$ is fulfilled.

There are other equivalent conditions to check the estimability of a parametric functional given in the same reference, Proposition 8.4.

After such a verbose background introduction, let's come back to your question.

A. $\beta$ itself is non-estimable because $\text{rank}(X) = 2 < 3$ (the third column of $X$ is the negative of the first), which entails the existence of $\beta_1 \neq \beta_2$ with $X\beta_1 = X\beta_2$. Although the above definition is given for scalar functionals, it is easily generalized to vector-valued functionals.

B. $\phi_1(\beta) = \theta_1 + \theta_3 = (1, 0, 1)'\beta$ is non-estimable. To wit, consider $\beta_1 = (0, 1, 0)'$ and $\beta_2 = (1, 1, 1)'$, which gives $X\beta_1 = X\beta_2$ but $\phi_1(\beta_1) = 0 + 0 = 0 \neq \phi_1(\beta_2) = 1 + 1 = 2$.

C. $\phi_2(\beta) = \theta_1 - \theta_3 = (1, 0, -1)'\beta$ is estimable: since the first row of $X$ is $(1, 0, -1)$, the first component of $X\beta$ is exactly $\theta_1 - \theta_3$, so $X\beta_1 = X\beta_2$ immediately implies $\theta_1^{(1)} - \theta_3^{(1)} = \theta_1^{(2)} - \theta_3^{(2)}$, i.e., $\phi_2(\beta_1) = \phi_2(\beta_2)$.

D. $\phi_3(\beta) = \theta_2 = (0, 1, 0)'\beta$ is also estimable: subtracting the first component of $X\beta$ from the second leaves exactly $\theta_2$, so $X\beta_1 = X\beta_2$ again implies $\phi_3(\beta_1) = \phi_3(\beta_2)$.
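These case-by-case checks can also be automated. A standard equivalent criterion (presumably among the conditions in Proposition 8.4, which is not quoted here) is that $p'\beta$ is estimable if and only if $p'$ lies in the row space of $X$; a minimal numpy sketch applying it to the quantities appearing in the four options:

```python
import numpy as np

X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
])

def is_estimable(p, X):
    """p'beta is estimable iff appending p as a row does not increase rank(X)."""
    return np.linalg.matrix_rank(np.vstack([X, p])) == np.linalg.matrix_rank(X)

candidates = {
    "theta_1":           [1, 0,  0],   # needed for option A
    "theta_2":           [0, 1,  0],   # option D
    "theta_3":           [0, 0,  1],   # needed for option A
    "theta_1 + theta_3": [1, 0,  1],   # option B
    "theta_1 - theta_3": [1, 0, -1],   # option C, first part
}

for name, p in candidates.items():
    print(name, is_estimable(np.array(p), X))
# Only theta_2 and theta_1 - theta_3 come out as estimable.
```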

After estimability is verified, a theorem (Proposition 8.16, same reference) gives the Gauss-Markov property of $\phi(\hat{\beta})$. Based on that theorem, the second part of option C is incorrect: the best linear unbiased estimate of $\theta_1 - \theta_3$ is $\bar{Y} = (Y_1 + Y_2 + Y_3 + Y_4)/4$, by the theorem below.

Theorem. Let $\phi(\beta) = p'\beta$ be an estimable parametric functional, then its best linear unbiased estimate (aka, Gauss-Markov estimate) is $\phi(\hat{\beta})$ for any solution $\hat{\beta}$ to the normal equations $X'X\hat{\beta} = X'Y$.

The proof goes as follows:

Proof. Straightforward calculation shows that the normal equations are \begin{equation} \begin{bmatrix} 4 & 0 & -4 \\ 0 & 2 & 0 \\ -4 & 0 & 4 \end{bmatrix} \hat{\beta} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix} Y, \end{equation} which, after simplification, reduce to \begin{equation} \begin{bmatrix} \phi(\hat{\beta}) \\ \hat{\theta}_2 \\ -\phi(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ (Y_2 - Y_4)/2 \\ -\bar{Y} \end{bmatrix}, \end{equation} i.e., $\phi(\hat{\beta}) = \bar{Y}$.
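To see this concretely, here is a minimal simulation sketch (the true parameter values and the noise are arbitrary choices of mine): solving the normal equations via a pseudoinverse picks one particular $\hat\beta$, and $p'\hat\beta$ coincides with $\bar Y$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([
    [1,  0, -1],
    [1,  1, -1],
    [1,  0, -1],
    [1, -1, -1],
], dtype=float)
p = np.array([1.0, 0.0, -1.0])            # phi(beta) = theta_1 - theta_3

beta_true = np.array([2.0, -1.0, 0.5])    # arbitrary values for the simulation
Y = X @ beta_true + rng.normal(size=4)

# One solution of the normal equations X'X beta = X'Y (the minimum-norm one)
beta_hat = np.linalg.pinv(X) @ Y

print(p @ beta_hat, Y.mean())             # the two numbers agree
print(beta_hat[1], (Y[1] - Y[3]) / 2)     # and theta_2_hat = (Y_2 - Y_4)/2
```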

Therefore, option D is the only correct answer.


Addendum: The connection between estimability and identifiability

When I was at school, a professor briefly mentioned that the estimability of a parametric functional $\phi$ corresponds to the model identifiability. I took this claim for granted at the time. However, the equivalence needs to be spelled out more explicitly.

According to A.C. Davison's monograph Statistical Models p.144,

Definition 2. A parametric model in which each parameter $\theta$ generates a different distribution is called identifiable.

For the linear model $(1)$, regardless of the sphericity condition $\text{Var}(\varepsilon) = \sigma^2 I$, it can be reformulated as \begin{equation} E[Y] = X\beta, \quad \beta \in \mathbb{R}^k. \tag{2} \end{equation}

It is such a simple model that we have only specified the first-moment structure of the response vector $Y$. When $\text{rank}(X) = k$, model $(2)$ is identifiable since $\beta_1 \neq \beta_2$ implies $X\beta_1 \neq X\beta_2$ (the word "distribution" in the original definition naturally reduces to "mean" under model $(2)$).

Now suppose that $\text{rank}(X) < k$ and that we are given a parametric functional $\phi(\beta) = p'\beta$: how do we reconcile Definition 1 and Definition 2?

Well, by manipulating notations and words, we can show (the "proof" is rather trivial) that the estimability of $\phi(\beta)$ is equivalent to the identifiability of model $(2)$ when it is parametrized by the parameter $\phi = \phi(\beta) = p'\beta$ (the design matrix $X$ changes accordingly). To prove this, suppose $\phi(\beta)$ is estimable, so that $X\beta_1 = X\beta_2$ implies $p'\beta_1 = p'\beta_2$, i.e., $\phi(\beta_1) = \phi(\beta_2)$; hence model $(2)$ is identifiable when indexed by $\phi$. Conversely, suppose model $(2)$, indexed by $\phi$, is identifiable, so that $X\beta_1 = X\beta_2$ implies $\phi(\beta_1) = \phi(\beta_2)$; this is exactly the estimability of $\phi(\beta)$.

Intuitively, when $X$ is rank-deficient, the model parametrized by $\beta$ is parameter-redundant (it has too many parameters), so a non-redundant, lower-dimensional reparametrization (which may consist of a collection of linear functionals) is possible. When is such a new representation possible? The key is estimability.

To illustrate the above statements, let's reconsider your example. We have verified that the parametric functionals $\phi_2(\beta) = \theta_1 - \theta_3$ and $\phi_3(\beta) = \theta_2$ are estimable. Therefore, we can rewrite model $(1)$ in terms of the reparametrized parameter $(\phi_2, \phi_3)'$ as follows: \begin{equation} E[Y] = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix} = \tilde{X}\gamma. \end{equation}

Clearly, since $\tilde{X}$ is of full column rank, the model with the new parameter $\gamma$ is identifiable.
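As a small sanity check (a minimal numpy sketch), $\tilde{X}$ indeed has full column rank, and its columns span the same space as the columns of $X$, so the regression manifold $M$ is unchanged:

```python
import numpy as np

X = np.array([[1, 0, -1], [1, 1, -1], [1, 0, -1], [1, -1, -1]])
X_tilde = np.array([[1, 0], [1, 1], [1, 0], [1, -1]])

print(np.linalg.matrix_rank(X_tilde))                  # 2: full column rank
# Appending the columns of X does not increase the rank: same column space,
# hence the same regression manifold M.
print(np.linalg.matrix_rank(np.hstack([X, X_tilde])))  # still 2
```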

Zhanxiong
  • If you need a proof for the second part of option C, I will supplement my answer. – Zhanxiong Dec 16 '17 at 16:50
  • thanks for such a detailed answer! Now, about the second part of C: I know that "best" relates to minimum variance. So why is $\dfrac{1}{4}(Y_1+Y_2+Y_3+Y_4)$ not "best"? – Stat_prob_001 Dec 16 '17 at 17:01
  • Oh, I don't know why I thought that was the estimator in C. Actually $(Y_1 + Y_2 + Y_3 + Y_4)/4$ is the best estimator. I will edit my answer. – Zhanxiong Dec 16 '17 at 17:11

Apply the definitions.

I will provide details to demonstrate how you can use elementary techniques: you don't need to know any special theorems about estimation, nor will it be necessary to assume anything about the (marginal) distributions of the $Y_i$. We will need to supply one missing assumption about the moments of their joint distribution.

Definitions

All linear estimates are of the form $$t_\lambda(Y) = \sum_{i=1}^4 \lambda_i Y_i$$ for constants $\lambda = (\lambda_i)$.

An estimator of $\theta_1-\theta_3$ is unbiased if and only if its expectation is $\theta_1-\theta_3$. By linearity of expectation,

$$\eqalign{ \theta_1 - \theta_3 &= E[t_\lambda(Y)] = \sum_{i=1}^4 \lambda_i E[Y_i]\\ & = \lambda_1(\theta_1-\theta_3) + \lambda_2(\theta_1+\theta_2-\theta_3) + \lambda_3(\theta_1-\theta_3) + \lambda_4(\theta_1-\theta_2-\theta_3) \\ &=(\lambda_1+\lambda_2+\lambda_3+\lambda_4)(\theta_1-\theta_3) + (\lambda_2-\lambda_4)\theta_2. }$$

Comparing coefficients of the unknown quantities $\theta_i$ reveals $$\lambda_2-\lambda_4=0\text{ and }\lambda_1+\lambda_2+\lambda_3+\lambda_4=1.\tag{1}$$
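The coefficient comparison can be double-checked symbolically; here is a minimal sympy sketch (the symbol names are my own choice):

```python
import sympy as sp

# lambda_1..lambda_4 and theta_1..theta_3
l1, l2, l3, l4, t1, t2, t3 = sp.symbols('lambda1:5 theta1:4')

# E[Y_i] as given in the question
E_Y = [t1 - t3, t1 + t2 - t3, t1 - t3, t1 - t2 - t3]
E_t = sp.expand(l1*E_Y[0] + l2*E_Y[1] + l3*E_Y[2] + l4*E_Y[3])

# Unbiasedness for theta_1 - theta_3: every coefficient of the difference vanishes
diff = sp.expand(E_t - (t1 - t3))
for t in (t1, t2, t3):
    print(t, diff.coeff(t))
# Setting the printed coefficients to zero reproduces the constraints in (1).
```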

In the context of linear unbiased estimation, "best" always means with least variance. The variance of $t_\lambda$ is

$$\operatorname{Var}(t_\lambda) = \sum_{i=1}^4 \lambda_i^2 \operatorname{Var}(Y_i) + \sum_{i\ne j}^4 \lambda_i\lambda_j \operatorname{Cov}(Y_i,Y_j).$$

The only way to make progress is to add an assumption about the covariances: most likely, the question intended to stipulate they are all zero. (This does not imply the $Y_i$ are independent. Furthermore, the problem can be solved by making any assumption that stipulates those covariances up to a common multiplicative constant. The solution depends on the covariance structure.)

Since $\operatorname{Var}(Y_i)=\sigma^2,$ we obtain

$$\operatorname{Var}(t_\lambda) =\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2).\tag{2}$$

The problem therefore is to minimize $(2)$ subject to constraints $(1)$.

Solution

The constraints $(1)$ permit us to express all the $\lambda_i$ in terms of just two linear combinations of them. Let $u=\lambda_1-\lambda_3$ and $v=\lambda_1+\lambda_3$ (which are linearly independent). These determine $\lambda_1=(u+v)/2$ and $\lambda_3=(v-u)/2$, while the constraints force $\lambda_2=\lambda_4=(1-v)/2$. All we have to do is minimize $(2)$, which can be written

$$\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2) = \frac{\sigma^2}{4}\left(2u^2 + (2v-1)^2 + 1\right).$$

No constraints apply to $(u,v)$. Assume $\sigma^2 \ne 0$ (so that the variables aren't just constants). Since $u^2$ and $(2v-1)^2$ are smallest only when $u=2v-1=0$, it is now obvious that the unique solution is

$$\lambda = (\lambda_1,\lambda_2,\lambda_3,\lambda_4) = (1/4,1/4,1/4,1/4).$$
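Equivalently, minimizing $(2)$ subject to $(1)$ asks for the minimum-norm $\lambda$ satisfying the two linear constraints, which `numpy.linalg.lstsq` returns directly for a consistent underdetermined system; a minimal sketch confirming the solution above:

```python
import numpy as np

# Constraints (1): lambda_2 - lambda_4 = 0 and lambda_1 + lambda_2 + lambda_3 + lambda_4 = 1
A = np.array([[0.0, 1.0, 0.0, -1.0],
              [1.0, 1.0, 1.0,  1.0]])
b = np.array([0.0, 1.0])

# For an underdetermined consistent system, lstsq returns the minimum-norm
# solution, which is exactly the lambda minimizing (2).
lam, *_ = np.linalg.lstsq(A, b, rcond=None)
print(lam)   # [0.25 0.25 0.25 0.25]
```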

Option (C) is false because it does not give the best unbiased linear estimator. Option (D), although it doesn't give full information, nevertheless is correct, because

$$\theta_2 = E[t_{(0,1/2,0,-1/2)}(Y)]$$

is the expectation of a linear estimator.

It is easy to see that neither (A) nor (B) can be correct, because the space of expectations of linear estimators is generated by $\{\theta_2, \theta_1-\theta_3\}$ and none of $\theta_1,\theta_3,$ or $\theta_1+\theta_3$ are in that space.

Consequently (D) is the unique correct answer.

whuber