I understand that one-hot encoding is linearly dependent and that if I drop one column it becomes linearly independent, but I don't know how to prove it. Can someone give me a mathematical proof?
1 Answer
Start with the definition:
A sequence of vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ from a vector space $V$ is said to be linearly dependent if there exist scalars $a_1, a_2, \dots, a_k$, not all zero, such that
$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k = \mathbf{0},$$
where $\mathbf{0}$ denotes the zero vector.
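For a quick concrete instance: the vectors $\mathbf{v}_1 = (1, 0)$, $\mathbf{v}_2 = (0, 1)$, $\mathbf{v}_3 = (1, 1)$ are linearly dependent, because
$$1\cdot\mathbf{v}_1 + 1\cdot\mathbf{v}_2 + (-1)\cdot\mathbf{v}_3 = \mathbf{0}.$$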
A one-hot encoded matrix looks like this:
$$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} $$
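In case it helps to see where such a matrix comes from in practice, here is a minimal sketch using `pandas.get_dummies` (the feature name `color` and its levels are made up for illustration); `drop_first=True` is the usual way to drop one column per feature:

```python
import pandas as pd

# A hypothetical categorical feature with three levels.
df = pd.DataFrame({"color": ["red", "green", "blue"]})

# Full one-hot encoding: one 0/1 column per level (the matrix above).
full = pd.get_dummies(df["color"])

# Dropping the first level's column avoids the dependence with an intercept.
reduced = pd.get_dummies(df["color"], drop_first=True)

print(full)
print(reduced)
```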
Now add an intercept column to the one-hot matrix above:
$$ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \end{bmatrix} $$
You should have no trouble finding a non-zero vector that this new matrix maps to the zero vector: the intercept column is exactly the sum of the three dummy columns, so, for example, multiplying by $(-\pi, \pi, \pi, \pi)^T$ gives $\mathbf{0}$. This shows that the columns are linearly dependent, i.e., the design matrix is multicollinear. That is why we drop one of the columns for each one-hot encoded feature.
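If you want to verify this numerically, here is a minimal sketch with NumPy, using the matrix and vector from above: it checks that the intercept-augmented matrix has rank $3$ with $4$ columns, that $(-\pi, \pi, \pi, \pi)^T$ lies in its null space, and that dropping one dummy column restores full column rank.

```python
import numpy as np

# One-hot encoding of a three-level feature, plus an intercept column.
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
])

# Rank 3 with 4 columns => the columns are linearly dependent.
print(np.linalg.matrix_rank(X))  # 3

# The vector from the answer lies in the null space: X @ a == 0.
a = np.array([-np.pi, np.pi, np.pi, np.pi])
print(X @ a)  # [0. 0. 0.] (up to floating-point error)

# Drop one dummy column: 3 columns of rank 3 => linearly independent.
X_dropped = X[:, [0, 2, 3]]
print(np.linalg.matrix_rank(X_dropped))  # 3
```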
- Dropping a column in (3) will not create linearly independent columns. – whuber Oct 21 '22 at 13:32
- Thanks, but that still doesn't fix the example! The rank of the matrix in $(3)$ is $3,$ implying you would have to drop at least three columns before the remaining ones are linearly independent. – whuber Oct 21 '22 at 13:39