
This question is about the difference between the sum of lm.influence(model)$hat and the trace of the hat matrix $H := X (X' X)^{-1} X'$ calculated "by hand".

Let's take, for example, the cars data set and the linear model $dist = \beta \cdot speed + \varepsilon$:

data(cars)
mod <- lm(dist ~ speed, cars)

Using lm.influence to calculate the trace I get 2. lm.influence is supposed to return the diagonal elements of $H$ (see ?lm.influence).

sum(lm.influence(mod)$hat)
[1] 2

Calculating the trace of $H$ "by hand" I get 1:

X <- cars$speed
H <- X %*% solve(t(X) %*% X) %*% t(X)
sum(diag(H))
[1] 1

From my understanding it should be 1, as the rank of a one-column matrix is 1.

For linear models, the trace of the hat matrix is equal to the rank of X, which is the number of independent parameters of the linear model. Source: http://en.wikipedia.org/wiki/Hat_matrix
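
As a quick sanity check of that statement, one can compare the rank of the intercept-free $X$ used above with the trace just computed; a minimal sketch, using base R's qr() for the rank:

X <- as.matrix(cars$speed)
H <- X %*% solve(t(X) %*% X) %*% t(X)
qr(X)$rank    # 1 -- rank of the one-column matrix
sum(diag(H))  # 1 -- trace of H equals that rank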

Rentrop

2 Answers


In R, the model lm(dist ~ speed, cars) includes an intercept term automatically. So you have actually fitted $dist = \beta_0 + \beta_1 \, speed + \varepsilon$.

You rarely want to drop the intercept term, but if you do, you can use lm(dist ~ 0 + speed, cars) or even lm(dist ~ speed - 1, cars). See this question on Stack Overflow for more details.

Without the intercept term, we get the result that matches what you got "by hand":

> data(cars)
> mod <- lm(dist ~ 0 + speed, cars)
> sum(lm.influence(mod)$hat)
[1] 1

You can see this geometrically by considering the hat matrix as an orthogonal projection onto the column space of $X$. With the intercept term in place, the column space of $X$ is spanned by $\mathbf{1}_n$ and the vector of observations for your explanatory variable, so it forms a two-dimensional flat. An orthogonal projection onto a two-dimensional space has rank 2: choose your basis so that two vectors lie in the flat and the rest are orthogonal to it, and the matrix representation simplifies to $H = \operatorname{diag}(1,1,0,0,\dots,0)$. This clearly has rank 2 and trace 2; note that both of these are preserved under change of basis, so they apply to your original hat matrix too.

Dropping the intercept term is like dropping the $\mathbf{1}_n$ from the spanning set, so now your column space of $X$ is one-dimensional. By a similar argument the hat matrix will have rank 1 and trace 1.
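
A minimal numerical sketch of this argument, assuming the intercept model from the question: the eigenvalues of $H$ should be two ones followed by zeros.

X <- model.matrix(dist ~ speed, data = cars)  # columns: 1_n and speed
H <- X %*% solve(t(X) %*% X) %*% t(X)
round(eigen(H, symmetric = TRUE)$values, 10)  # two 1s, then 0s: rank 2, trace 2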

Silverfish

There is a small mistake in your interpretation. Let me take a detailed stab at it for anyone trying to get the essence of it; otherwise, the answer by Silverfish suffices.

When you say:

mod <- lm(dist ~ speed, data = cars)

that means you are using one predictor, speed, but the lm() function adds a bias term by default. The model is therefore a simple linear regression (i.e. just one predictor and two parameters), and its equation is:

$Y = \beta_0 + \beta_1speed + \varepsilon$

where $\beta_0$ is the intercept added by default.


Now, let's compute the $H$ matrix and find its trace:

$HY = \widehat Y$

$\Rightarrow HY = X \widehat \beta$

Using OLS, $\widehat \beta$ can be written as $(X^TX)^{-1}X^TY$:

$\Rightarrow HY = X (X^TX)^{-1} X^T Y$

$\Rightarrow H = X (X^TX)^{-1} X^T$

This is your final hat matrix $H$.
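
As a quick numerical check of this identity (assuming the mod fitted above), $HY$ should reproduce the fitted values:

X <- model.matrix(mod)                 # n x (p+1), intercept column included
H <- X %*% solve(t(X) %*% X) %*% t(X)
all.equal(as.vector(H %*% cars$dist), unname(fitted(mod)))

[1] TRUE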

Here, let $A = X(X^TX)^{-1}$ and $B = X^T$.

Applying the cyclic property of the trace operator, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, which is easy to derive (just multiply two matrices and sum their diagonals):

$\operatorname{tr}(H) = \operatorname{tr}\big(X (X^TX)^{-1} X^T\big) = \operatorname{tr}(AB) = \operatorname{tr}(BA) = \operatorname{tr}\big(X^T X (X^TX)^{-1}\big) = \operatorname{tr}(I_{p+1}) = \sum^{p+1}_{i=1} 1 = p+1$

since $X$ is $n \times (p+1)$, so $X^TX$ (and hence the identity matrix above) is $(p+1) \times (p+1)$.

Therefore, $\operatorname{tr}(H) = p+1$.
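
To illustrate the $p + 1$ formula with more than one predictor, here is a sketch with a hypothetical two-predictor fit (mtcars is used purely as a convenient example data set, not part of the question):

fit <- lm(mpg ~ wt + hp, data = mtcars)  # p = 2 predictors plus intercept
sum(lm.influence(fit)$hat)

[1] 3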


Now, for your fitted SLR model mod, the number of predictors is 1 (i.e. $p = 1$), so

$\operatorname{tr}(H) = p + 1 = 1 + 1 = 2$

That is what you get when you compute the trace through the fitted model:

H_diagonals <- lm.influence(mod)$hat
trace_H <- sum(H_diagonals)
cat("Trace(H) using model:", trace_H)

Trace(H) using model: 2

You could compute this by hand as well:

# Design matrix, shape: n x (p+1)
X <- cbind(1, cars$speed)

XTX_inverse <- solve(t(X) %*% X)
H_byhand <- X %*% XTX_inverse %*% t(X)
trace_byhand <- sum(diag(H_byhand))
cat("Trace(H) by hand:", trace_byhand)

Trace(H) by hand: 2

When doing it by hand, if you don't add the column of ones to the design matrix and leave your $X$ with shape $n \times p$ (i.e. $n \times 1$), you get $\operatorname{tr}(H) = p = 1$, which is exactly what you observed. In other words, you were using the wrong format for the design matrix $X$.

It should always be:

$\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,p} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & x_{n,2} & \dots & x_{n,p} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{p} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}$

In your case (SLR):

$\begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} \\ 1 & x_{2,1} \\ \vdots & \vdots \\ 1 & x_{n,1} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1} \\ \varepsilon_{2} \\ \vdots \\ \varepsilon_{n} \end{bmatrix}$

where $X$ has dimension $n \times (p+1)$, which here is $n \times 2$.
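
In practice you rarely need to build this matrix manually: model.matrix() returns the design matrix that lm() actually used, intercept column included, so it is a quick way to check the shape:

dim(model.matrix(mod))   # 50  2, i.e. n x (p+1)
head(model.matrix(mod))  # shows the (Intercept) column of ones next to speed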

Hope it clarifies your doubt. :)

Pranzell