13

How did the hat matrix get its name?

$\hat{\mathbf{H}} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$

I am interested in the etymology of the term. Who gave it this name, and why?

Alex
  • 2,011

2 Answers

18

The name "hat matrix" is a mnemonic: a shortcut to help us remember the role it plays in regression. As @RobertLong explains in Learning hat matrix,

The hat matrix is the projection matrix that maps the response vector $Y$ to the vector of fitted values $\hat{Y}$ (hence the name "hat" matrix).

As to history, according to

David, H. A. “First (?) Occurrence of Common Terms in Mathematical Statistics.” The American Statistician, vol. 49, no. 2, 1995, pp. 121–33. https://doi.org/10.2307/2684625.

the term "hat matrix" first appears in

Hoaglin, David C., and Roy E. Welsch. “The Hat Matrix in Regression and ANOVA.” The American Statistician, vol. 32, no. 1, 1978, pp. 17–22. https://doi.org/10.2307/2683469.

But the authors themselves attribute it to J. W. Tukey.

dipetkov
  • 9,805
  • It's funny that the name was meant as a mnemonic to remember the purpose, yet the purpose is remembered while the name is unclear. When the shortcut is longer than the regular route... – Jason C Nov 02 '22 at 10:29
  • 2
    @JasonC The idea to describe it as a mnemonic was mine. The Hoaglin and Welsch paper only says: "The term hat matrix is due to John W. Tukey who introduced us to the technique about ten years ago." So we don't know Tukey's purpose; maybe he used it in jest also? But the definition of mnemonic is pretty broad, so I think it qualifies. – dipetkov Nov 02 '22 at 11:14
  • It's an appropriate name in any case! – Jason C Nov 02 '22 at 14:03
  • 1
    @JasonC Since this question is in part about history, I'll assume that this might be of interest to the OP: John Tukey appears in a couple of amusing episodes in Richard Feynman's biography Genius. They were in Princeton together. One story is about hexahexaflexagons... – dipetkov Nov 02 '22 at 14:27
  • I remembered this amusing question while trying out ChatGPT (after reading yet one more article about how cool the technology is). So here is its response: – dipetkov Apr 19 '23 at 21:38
  • The "Hat Matrix" is a term used in regression analysis and refers to a mathematical matrix that helps to transform a set of observations into predicted values. The name "Hat Matrix" comes from the fact that the matrix effectively puts a "hat" on the observed values to indicate that they have been transformed into predicted values.

    – dipetkov Apr 19 '23 at 21:39
  • The term was first introduced by the statistician Arthur Robinson in 1973 in his paper "The Use of Control Variables in Regression Analysis". In the paper, Robinson referred to the matrix as the "projection matrix", but later realized that the notation he used resembled a hat, and so the term "Hat Matrix" was born.

    – dipetkov Apr 19 '23 at 21:39
  • Nice story. Now the mystery is whether it is true. Spoiler alert: I couldn't find such a 1973 paper. Update: Not sure about Arthur Robinson, statistician, either. ChatGPT might mean Arthur H. Robinson, cartographer? He probably did know a lot about (map) projections. – dipetkov Apr 19 '23 at 22:00
17

In fact, the formula should read $\mathbf{H} = \mathbf{X} \left( \mathbf{X}^\textsf{T} \mathbf{X} \right)^{-1} \mathbf{X}^\textsf{T}$ (no hat on $\mathbf{H}$), so that $\hat{\mathbf{y}}=\mathbf{H}\mathbf{y}$. The hat matrix is called the hat matrix because it puts a hat on $\mathbf{y}$. (This is in fact mentioned on the Wikipedia page linked by @dipetkov, but I had heard it before from somebody who I think had heard Tukey mention it.)
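
To see the "puts a hat on $\mathbf{y}$" reading numerically, here is a minimal NumPy sketch. The data, the seed, and the variable names are made up for illustration and are not from any of the cited papers. It builds $\mathbf{H}$ from a small design matrix and checks that $\mathbf{H}\mathbf{y}$ agrees with the fitted values from an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only

# Hypothetical data: 20 observations, an intercept column and two predictors.
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)

# H = X (X^T X)^{-1} X^T, using a linear solve rather than an explicit inverse.
H = X @ np.linalg.solve(X.T @ X, X.T)

# Fitted values from an ordinary least-squares fit, for comparison.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

hat_matches_fit = np.allclose(H @ y, y_hat)  # H y reproduces the fitted values
is_idempotent = np.allclose(H @ H, H)        # projecting twice changes nothing
is_symmetric = np.allclose(H, H.T)           # orthogonal projections are symmetric
print(hat_matches_fit, is_idempotent, is_symmetric)
```

The idempotence and symmetry checks are there only because they characterize $\mathbf{H}$ as an orthogonal projection onto the column space of $\mathbf{X}$, which is the "projection matrix" reading of the same object.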

  • Hoaglin and Welsch (1977) also attribute it to Tukey, e.g. see here: http://dspace.mit.edu/bitstream/handle/1721.1/1920/SWP-0901-02752210.pdf (last sentence of first paragraph of the Introduction). – Glen_b Oct 31 '22 at 05:25