How did the hat matrix get its name?
$\hat{\mathbf{H}} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$
I am interested in the etymology of the term. Who gave it a name and why?
The name "hat matrix" is a mnemonic: a shortcut to help us remember the role it plays in regression. As @RobertLong explains in Learning hat matrix,
The hat matrix is the projection matrix that maps the response vector $Y$ to the vector of fitted values $\hat{Y}$ (hence the name "hat" matrix).
As to history, according to
David, H. A. “First (?) Occurrence of Common Terms in Mathematical Statistics.” The American Statistician, vol. 49, no. 2, 1995, pp. 121–33. https://doi.org/10.2307/2684625.
the term "hat matrix" first appears in
Hoaglin, David C., and Roy E. Welsch. “The Hat Matrix in Regression and ANOVA.” The American Statistician, vol. 32, no. 1, 1978, pp. 17–22. https://doi.org/10.2307/2683469.
But the authors themselves attribute it to J. W. Tukey.
The "hat matrix" is the matrix in regression analysis that maps the vector of observed responses to the vector of fitted (predicted) values. The name comes from the fact that multiplying by it puts a "hat" on the observed values: $\mathbf{y}$ becomes $\hat{\mathbf{y}}$.
The term was first introduced by the statistician Arthur Robinson in 1973 in his paper "The Use of Control Variables in Regression Analysis". In the paper, Robinson referred to the matrix as the "projection matrix", but later realized that the notation he used resembled a hat, and so the term "Hat Matrix" was born.
In fact the formula should read $\mathbf{H} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$, with no hat on $\mathbf{H}$ itself, so that $\hat{\mathbf{y}}=\mathbf{H}\mathbf{y}$. The hat matrix is called the hat matrix because it puts a hat on $\mathbf{y}$. (This is in fact mentioned on the Wikipedia page linked by @dipetkov, but I had heard it before from somebody who I think had heard Tukey mention it.)
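To see the "puts a hat on $\mathbf{y}$" reading numerically, here is a minimal NumPy sketch. The data are simulated and the names (`X`, `y`, `H`, `y_hat`, `beta`) are illustrative assumptions, not taken from any of the cited papers. It builds $\mathbf{H}$ from the formula above and checks that $\mathbf{H}\mathbf{y}$ matches the fitted values from an ordinary least-squares fit.

```python
import numpy as np

# Simulated data (assumption for illustration): intercept plus two predictors.
rng = np.random.default_rng(0)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

# H = X (X^T X)^{-1} X^T, computed with a linear solve instead of an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

# "Putting the hat on y": the fitted values are H y.
y_hat = H @ y

# Cross-check against the fitted values from a least-squares solver.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(y_hat, X @ beta)

# H is an orthogonal projection: symmetric, idempotent, trace = number of columns of X.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
assert np.isclose(np.trace(H), X.shape[1])
```

Forming the full $n \times n$ matrix is fine for a small demonstration like this; for large $n$ one would normally compute only its diagonal (the leverages) rather than $\mathbf{H}$ itself.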