
I understand that one-hot encoded columns are linearly dependent (once an intercept is included) and that dropping one column makes them linearly independent, but I don't know how to prove it. Can someone give me a mathematical proof?

KXM

1 Answer


Start with the definition:

A sequence of vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ from a vector space $V$ is said to be linearly dependent if there exist scalars $a_1, a_2, \dots, a_k$, not all zero, such that

$$a_1\mathbf{v}_1 + a_2\mathbf{v}_2 + \cdots + a_k\mathbf{v}_k = \mathbf{0},$$

where $\mathbf{0}$ denotes the zero vector.

A one-hot encoded matrix looks like

$$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} $$
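For reference, this identity-like matrix is exactly what a standard one-hot encoder produces for a three-level feature. Here is a minimal pandas sketch (the Series is illustrative), which also shows the `drop_first` option that implements the column dropping discussed below:

```python
import pandas as pd

# Illustrative three-level categorical feature.
s = pd.Series(["a", "b", "c"], name="feature")

# Full one-hot encoding: one column per level (the 3x3 identity above).
print(pd.get_dummies(s, dtype=int))

# drop_first=True drops one level's column -- the usual fix for the
# linear dependence that appears once an intercept is added.
print(pd.get_dummies(s, dtype=int, drop_first=True))
```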

Now add an intercept column to the matrix:

$$ \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ \end{bmatrix} $$

You should have no problem finding non-zero scalars that send these columns to the zero vector. Because each row contains exactly one $1$ among the one-hot columns, the one-hot columns sum to the intercept column, i.e. $\mathbf{v}_1 + \mathbf{v}_2 + \mathbf{v}_3 = \mathbf{1}$. So the scalars $(-1, 1, 1, 1)$, or any non-zero multiple such as $(-\pi, \pi, \pi, \pi)$, satisfy the definition above: multiplying the new matrix by this vector yields the zero vector. The columns are therefore linearly dependent, i.e. perfectly multicollinear. That is why we drop one of the one-hot columns for each categorical feature.
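As a sanity check, here is a minimal NumPy sketch (the matrix and the choice of dropped column are illustrative): it confirms the rank deficiency, the null vector above, and that dropping one one-hot column restores linear independence.

```python
import numpy as np

# One-hot encoding of a 3-level feature, plus an intercept column.
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

# The columns are linearly dependent: rank 3 < 4 columns.
print(np.linalg.matrix_rank(X))          # 3

# The coefficients (-pi, pi, pi, pi) send the columns to the zero vector.
a = np.array([-np.pi, np.pi, np.pi, np.pi])
print(X @ a)                             # [0. 0. 0.]

# Dropping one one-hot column restores linear independence:
# rank 3 with exactly 3 remaining columns.
X_dropped = X[:, [0, 2, 3]]
print(np.linalg.matrix_rank(X_dropped))  # 3
```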

Tim
  • Dropping a column in (3) will not create linearly independent columns. – whuber Oct 21 '22 at 13:32
  • Thanks, but that still doesn't fix the example! The rank of the matrix in $(3)$ is $3,$ implying you would have to drop at least three columns before the remaining ones are linearly independent. – whuber Oct 21 '22 at 13:39