
I am interested in a question related to: What are the consequences of "copying" a data set for OLS?

Now, instead of duplicating both X and Y, suppose we duplicate X but append zeros to Y. What is the effect on beta and the t-score?

Using the same method as in the answer to the linked question, it's not hard to show that the new coefficient is half of the old one. However, it's unclear to me how to think about the change in the t-score. Intuitively, this new regression sounds equivalent to a regularization that shrinks the coefficient, so the t-score should decrease. But is there a mathematical way to show this?

DA_PA

1 Answer


The ordinary least squares solution to $X\beta =y$ is

$$\hat\beta = (X^\prime X)^{-}X^\prime y$$

(where $^{-}$ denotes a generalized inverse) and, in terms of the estimated error variance $\hat\sigma^2,$ its variance-covariance matrix is

$$V = \hat\sigma^2 (X^\prime X)^{-}.$$

The rules of matrix multiplication show that when $X$ is duplicated to $X_2 = (1,1)^\prime \otimes X$ and $y$ is extended to $y_2 = (1,0)^\prime \otimes y$, then $X_2^\prime X_2 = 2X^\prime X$ and $X_2^\prime y_2 = X^\prime y,$ whence the new OLS solution to $X_2\beta = y_2$ is

$$\hat\beta_2 = (2X^\prime X)^{-} X_2^\prime y_2 = \frac{1}{2}\hat\beta.$$

That is, all the parameter estimates are halved.
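As a quick numerical sanity check (not part of the argument itself), here is a small numpy sketch with arbitrary synthetic data showing the halving directly:

```python
# Duplicating the design matrix X and appending zeros to y halves every OLS
# coefficient.  The data below are arbitrary and purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]

X2 = np.vstack([X, X])                     # duplicate the rows of X
y2 = np.concatenate([y, np.zeros(n)])      # append zeros to y
beta2 = np.linalg.lstsq(X2, y2, rcond=None)[0]

print(beta2 / beta)                        # every ratio equals 0.5
```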

The OLS estimate of the error variance can be expressed in terms of the predicted values $\hat y = X\hat\beta$ as

$$\hat\sigma^2 = \frac{1}{n-k}||y - \hat y||^2$$

where $X$ has $n$ rows and rank $k.$ In the "doubled" version notice that

$$\hat y_2 = X_2\hat\beta_2 = \frac{1}{2}X_2\hat\beta$$

is the "doubled" version of $\hat y.$ Whence, because $y-\hat y$ is orthogonal to $\hat y$ and the rank of $X_2$ is the rank of $X,$

$$\begin{aligned} \hat\sigma_2^2 &= \frac{1}{2n - k}||y_2 - \hat y_2||^2 \\ &= \frac{1}{2n-k} \left(||y - \frac{1}{2}\hat y||^2 + ||\mathbf 0 - \frac{1}{2}\hat y||^2\right) \\ &= \frac{1}{2n-k}\left(||y-\hat y||^2 + \frac{1}{2}||\hat y||^2\right)\\ &= \frac{1}{2n-k}\left((n-k)\hat\sigma^2 + \frac{1}{2}||\hat y||^2\right). \end{aligned}\tag{*}$$

The new variance-covariance matrix therefore is

$$V_2 = \hat\sigma_2^2(X_2^\prime X_2)^{-} = \frac{\hat\sigma_2^2}{2}(X^\prime X)^{-} = \frac{\hat\sigma_2^2}{2\hat\sigma^2} V.$$
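Readers who want to confirm $(*)$ numerically can use a short sketch along the following lines (the data are made up; none of the variable names come from the derivation above):

```python
# Numerical check of (*): sigma_2^2 = ((n-k)*sigma^2 + 0.5*||yhat||^2) / (2n - k).
# Synthetic data, for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n, k = 25, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
sigma2 = np.sum((y - yhat) ** 2) / (n - k)               # original error variance estimate

X2 = np.vstack([X, X])
y2 = np.concatenate([y, np.zeros(n)])
beta2 = np.linalg.lstsq(X2, y2, rcond=None)[0]
sigma2_2 = np.sum((y2 - X2 @ beta2) ** 2) / (2 * n - k)  # "doubled" error variance estimate

rhs = ((n - k) * sigma2 + 0.5 * np.sum(yhat ** 2)) / (2 * n - k)
print(np.isclose(sigma2_2, rhs))                         # True
```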

The new $t$ statistic for a linear hypothesis $\mathbf c^\prime \beta = a$ is

$$t_2(\mathbf c, a) = \frac{\mathbf c^\prime \hat\beta_2 - a}{\sqrt{\mathbf c^\prime V_2 \mathbf c}}= \frac{\sqrt{2}\hat\sigma}{2\hat\sigma_2}\,\frac{\mathbf c^\prime \hat\beta - 2 a}{\sqrt{\mathbf c^\prime V \mathbf c}} = \frac{\sqrt{2}\hat\sigma}{2\hat\sigma_2}t(\mathbf c, 2a).$$

This tells us the t-statistic for this hypothesis is a multiple of the t-statistic for the hypothesis $\mathbf c^\prime \beta = 2a.$ According to $(*),$ that multiple is predictable but a little complicated. Consider these extremes:

  • When $||\hat y||^2$ is tiny (compared to $(n-k)\hat\sigma^2$), then $\hat\sigma_2^2$ is close to $(n-k)/(2n-k)$ times $\hat\sigma^2.$ This is a little less than $1/2.$ The multiple of the $t$ statistic is then slightly greater than $1:$ the $t$ statistic can increase when the predicted values are all small.

  • Otherwise, $||\hat y||^2$ will contribute enough to $\hat\sigma_2^2$ to cause the multiple to be less than $1.$

When $a=0,$ that tells the entire story. When $a\ne 0,$ the numerator of the $t$ statistic also changes from $\mathbf c^\prime \hat\beta - a$ to $\mathbf c^\prime \hat \beta - 2a.$ This could equal anything, depending on the value of $a.$
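To check the multiplier $\sqrt{2}\hat\sigma/(2\hat\sigma_2)$ directly, the following sketch compares $t_2(\mathbf c, a)$ with the scaled $t(\mathbf c, 2a)$; the data, the contrast $\mathbf c$, and the hypothesized value $a$ are all made up for illustration:

```python
# Check that t_2(c, a) = (sqrt(2)*sigma / (2*sigma_2)) * t(c, 2a), as derived above.
import numpy as np

rng = np.random.default_rng(2)
n = 15
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 0.3 * X[:, 1] + rng.normal(size=n)

def ols(X, y):
    """Return coefficients, error variance estimate, and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    sigma2 = np.sum((y - X @ beta) ** 2) / (X.shape[0] - X.shape[1])
    return beta, sigma2, XtX_inv

beta, sigma2, XtX_inv = ols(X, y)
X2 = np.vstack([X, X])
y2 = np.concatenate([y, np.zeros(n)])
beta2, sigma2_2, XtX2_inv = ols(X2, y2)

c = np.array([0.0, 1.0])   # test the slope
a = 0.1                    # arbitrary hypothesized value

t  = (c @ beta  - 2 * a) / np.sqrt(sigma2   * c @ XtX_inv  @ c)   # t(c, 2a)
t2 = (c @ beta2 -     a) / np.sqrt(sigma2_2 * c @ XtX2_inv @ c)   # t_2(c, a)

print(np.isclose(t2, np.sqrt(2) * np.sqrt(sigma2) / (2 * np.sqrt(sigma2_2)) * t))  # True
```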

This is enough analysis to produce illustrative examples. Perhaps the most delicate is the first case, where the $t$ statistic increases. The best chance of creating such an example is when $n-k$ is tiny compared to $2n-k$ and the predicted vector $\hat y$ is tiny. Consider, then, these data:

$$x = (1, 2, \ldots, n)^\prime;\quad y = (-n/2, 1, 1, \ldots, 1, 1/n - n/2)^\prime.$$

[Figure: the original responses (gray) and the appended zero responses (red) plotted against $x$]

When $n=3,$ for instance, $\hat\beta = (-8/9, 1/6)$ and the corresponding $t$ statistics (to compare each coefficient of $\hat\beta$ to $0$) are $-0.305$ and $0.124.$ However, $\hat\beta_2 = (-4/9, 1/12)$ and the $t$ statistics are $-0.405$ and $0.164,$ respectively: they increased in magnitude.
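A minimal numpy sketch reproduces these numbers for $n = 3$ (the code is illustrative and not part of the derivation):

```python
# Reproduce the n = 3 example: x = (1, 2, 3), y = (-3/2, 1, 1/3 - 3/2).
import numpy as np

n = 3
x = np.arange(1, n + 1, dtype=float)
y = np.concatenate([[-n / 2], np.ones(n - 2), [1 / n - n / 2]])
X = np.column_stack([np.ones(n), x])

def fit(X, y):
    """Return OLS coefficients and their t statistics against zero."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.sum((y - X @ beta) ** 2) / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

print(fit(X, y))                                   # (-8/9, 1/6) with t = (-0.305, 0.124)
print(fit(np.vstack([X, X]),
          np.concatenate([y, np.zeros(n)])))       # (-4/9, 1/12) with t = (-0.405, 0.164)
```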

This is apparent in the plot: adding the second set of zero values for the response $y$ produces a typical conditional spread of about $1$ among the $y$ values, which is smaller than the spread one would estimate from the original (gray) values alone ($1.905$).

Otherwise, when the new zero values of $y$ are far from what would originally be predicted, they increase the estimated error variance and thereby decrease the magnitudes of any $t$ statistics for comparing the coefficients to zero.

[Figure: nearly collinear original responses (gray) with the appended zero responses (red) plotted against $x$]

In this example the original gray points nearly line up, producing a tiny estimated error variance and thereby creating a large $t$ statistic for the slope. Upon introducing the (red) zero values, the error estimate grows from $0.1$ to over $1,$ thereby decreasing the t-statistic for the slope by over an order of magnitude.
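The figures above come from the example's own data; the following sketch merely illustrates the same phenomenon with hypothetical near-collinear data of my own choosing, so its exact numbers will differ:

```python
# Hypothetical near-collinear data: appending zero responses inflates the
# error variance estimate and shrinks the slope's t statistic dramatically.
import numpy as np

rng = np.random.default_rng(3)
n = 10
x = np.arange(1, n + 1, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(scale=0.05, size=n)   # points that nearly line up
X = np.column_stack([np.ones(n), x])

def slope_t(X, y):
    """Return (error SD estimate, t statistic of the slope against zero)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.sum((y - X @ beta) ** 2) / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return np.sqrt(sigma2), beta[1] / se

print(slope_t(X, y))                                 # tiny error SD, huge slope t
print(slope_t(np.vstack([X, X]),
              np.concatenate([y, np.zeros(n)])))     # much larger error SD, much smaller t
```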

whuber