The dummy variable trap (including a dummy variable for every category together with a constant term in the regression, which guarantees perfect multicollinearity) is most commonly resolved by dropping one of the dummies.
However, I was also told that an equivalent alternative approach is to add a constraint that the coefficients of ALL of the dummy variables sum to zero, $\sum_{i \in \text{Category}} \beta_i = 0$.
I am having trouble proving this claim, namely that the predictions (in the OLS sense, $X\beta$) from the two approaches are equivalent.
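A quick numerical check on made-up toy data suggests the claim is true; I just cannot prove it. Here the sum-to-zero model is fitted by substituting $\beta_{\text{Group 3}} = -\beta_{\text{Group 1}} - \beta_{\text{Group 2}}$ into the design (i.e. effects coding):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 12 observations, 4 in each of 3 groups.
g = np.repeat([0, 1, 2], 4)
D = np.eye(3)[g]                                  # one-hot dummy columns
y = np.array([1.0, 2.0, 3.0])[g] + rng.normal(size=12)

# Approach 1: drop Group 3's dummy, keep the intercept.
X_drop = np.column_stack([np.ones(12), D[:, 0], D[:, 1]])
b_drop, *_ = np.linalg.lstsq(X_drop, y, rcond=None)

# Approach 2: keep all groups but impose sum-to-zero by substituting
# beta_3 = -(beta_1 + beta_2), so column i becomes (d_i - d_3).
X_sum = np.column_stack([np.ones(12), D[:, 0] - D[:, 2], D[:, 1] - D[:, 2]])
b_sum, *_ = np.linalg.lstsq(X_sum, y, rcond=None)

print(np.allclose(X_drop @ b_drop, X_sum @ b_sum))  # True: same fitted values
```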
Here is my attempt:
We know that the coefficients of restricted least squares (RLS) under the constraint $R\beta = r$ are $\beta^{RLS} = \beta^{OLS} - (X^TX)^{-1}R^T[R(X^TX)^{-1}R^T]^{-1}(R\beta^{OLS} - r)$.
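As a sanity check, this closed form agrees with solving the first-order (KKT) conditions of the constrained problem directly. The sketch below uses a made-up full-rank design, since the formula requires $(X^TX)^{-1}$ to exist:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 4
X = rng.normal(size=(n, p))            # full-rank design, so (X^T X)^{-1} exists
y = rng.normal(size=n)
R = np.array([[0.0, 1.0, 1.0, 1.0]])   # a single restriction R beta = r
r = np.zeros(1)

XtX_inv = np.linalg.inv(X.T @ X)
b_ols = XtX_inv @ X.T @ y

# Restricted least squares via the closed-form formula above
M = R @ XtX_inv @ R.T
b_rls = b_ols - XtX_inv @ R.T @ np.linalg.solve(M, R @ b_ols - r)

# Cross-check: KKT conditions of min ||y - Xb||^2 s.t. Rb = r
K = np.block([[X.T @ X, R.T], [R, np.zeros((1, 1))]])
b_kkt = np.linalg.solve(K, np.concatenate([X.T @ y, r]))[:p]

print(np.allclose(b_rls, b_kkt))       # True
print(R @ b_rls)                       # ~0, i.e. the restriction holds
```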
We can formulate the "drop one out" solution (for a categorical variable with 3 groups) as follows:
$$\min_{\beta} \|Y-X\beta\|^2 \quad \text{s.t.} \quad \beta_{\text{Group 3}} = 0,$$
where $\beta = [\beta_{\text{intercept}}, \beta_{\text{Group 1}}, \beta_{\text{Group 2}}, \beta_{\text{Group 3}}]^T$.
We can express the constraint $\beta_{\text{Group 3}} = 0$ as
$R = [0, 0, 0, 1]$ and $r = 0$
The alternative proposal for handling the dummy variable trap, that the coefficients of the dummy variables sum to zero, can be expressed as:
$R = [0, 1, 1, 1]$ and $r = 0$
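On made-up dummy-trap data, these two restrictions do produce identical predictions. Note that $X^TX$ is singular when the intercept and all three dummies are included, so the sketch below (with `restricted_fit` being a helper I wrote for this check) solves each restricted problem through its KKT system instead of the closed-form $\beta^{RLS}$ expression:

```python
import numpy as np

rng = np.random.default_rng(2)
g = np.repeat([0, 1, 2], 4)                # 3 groups, 4 observations each (made up)
D = np.eye(3)[g]
X = np.column_stack([np.ones(12), D])      # intercept + all dummies: X^T X is singular
y = np.array([1.0, 2.0, 3.0])[g] + rng.normal(size=12)

def restricted_fit(X, y, R, r):
    """min ||y - Xb||^2 s.t. Rb = r, via the KKT system
    [[X^T X, R^T], [R, 0]] [b; lam] = [X^T y; r],
    which remains solvable even though X^T X is singular here."""
    p, q = X.shape[1], R.shape[0]
    K = np.block([[X.T @ X, R.T], [R, np.zeros((q, q))]])
    return np.linalg.solve(K, np.concatenate([X.T @ y, r]))[:p]

b_drop = restricted_fit(X, y, np.array([[0.0, 0.0, 0.0, 1.0]]), np.zeros(1))
b_sum = restricted_fit(X, y, np.array([[0.0, 1.0, 1.0, 1.0]]), np.zeros(1))

print(np.allclose(X @ b_drop, X @ b_sum))  # True: identical predictions
print(b_drop)                              # the coefficients themselves differ
print(b_sum)
```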
I need to show that $X\beta^{RLS,\,\text{drop-one-out}} = X\beta^{RLS,\,\text{sum}=0}$.
$X\beta^{RLS} = X(X^TX)^{-1}X^TY - X(X^TX)^{-1}R^T[R(X^TX)^{-1}R^T]^{-1}R\beta^{OLS}, \hspace{5mm}\text{since } r = 0 \text{ in both cases}$
I tried proceeding with two different simplifications.
$$\begin{align}
X\beta^{RLS} &= X(X^TX)^{-1}\left[X^TY - R^T[R(X^TX)^{-1}R^T]^{-1}R\beta^{OLS}\right] \\
&= X(X^TX)^{-1}\left[X^TY - R^T[R(X^TX)^{-1}R^T]^{-1}R(X^TX)^{-1}X^TY\right] \\
&= X(X^TX)^{-1}\left[I - R^T[R(X^TX)^{-1}R^T]^{-1}R(X^TX)^{-1}\right]X^TY
\end{align}$$
Since $R$ is not square and hence not invertible in the usual sense, I was not able to proceed to show that $X\beta^{RLS}$ is the same under the two different constraint matrices $R$.
The second simplification that I tried was
$$\begin{align}
X\beta^{RLS} &= X(X^TX)^{-1}X^TY - X(X^TX)^{-1}R^T[R(X^TX)^{-1}R^T]^{-1}R\beta^{OLS} \\
&= X(X^TX)^{-1}X^TY - X(X^TX)^{-1}R^T[R(X^TX)^{-1}R^T]^{-1}R(X^TX)^{-1}X^TY \\
&= X(X^TX)^{-1}X^TY - D^T[R(X^TX)^{-1}R^T]^{-1}DY, \quad \text{where } D = R(X^TX)^{-1}X^T
\end{align}$$
Again, I cannot see how to continue to prove that the prediction from linear regression, $X\beta^{RLS}$, is equal under the two constraints: 1) $\beta_{\text{Group 3}} = 0$ and 2) $\beta_{\text{Group 1}} + \beta_{\text{Group 2}} + \beta_{\text{Group 3}} = 0$.
Any help is much appreciated. Thank you.