
I am revisiting some basic concepts involving t-tests and ANOVAs, and got tripped up early. I wanted to apply the concept of lack-of-fit sum of squares to the single-sample t-test, but I wonder how this can be treated as an ordinary least-squares problem, if at all. In linear least squares there are adjustable fitting parameters and a sum of squared errors that is minimized. One condition that is apparently satisfied as a result of the fit is $$\sum_i \epsilon_i = \sum_i (y_i-\hat{y}_i) = 0,$$ where $\epsilon_i$ is the error associated with the difference between the response variable $y_i$ and the model prediction $\hat{y}_i$. This condition allows some terms to be set to zero when partitioning the SSE into pure-error and regression terms, which are then used to compute the ratio on which a ratio test (a t-test or, more generally, an F-test) is based. At least that's how I understand the connection between these concepts (outlined, for instance, on this Wikipedia page).
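(For a model that includes an intercept term $\beta_0$, this condition follows from the normal equation for $\beta_0$: setting the derivative of the SSE to zero at the minimum gives $$\frac{\partial}{\partial \beta_0}\sum_i (y_i-\beta_0-\beta_1 x_i)^2 = -2\sum_i (y_i-\beta_0-\beta_1 x_i) = 0 \quad\Rightarrow\quad \sum_i \epsilon_i = 0,$$ so the residuals of the fitted model must sum to zero.)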

However, in a single- or two-sample t-test there are no adjustable parameters at all, since we stipulate a rigid model (say, that the population mean or difference of means equals a fixed value). How can one show that the sum of errors equals zero, justifying the partition of the summed squares? This seems essential to showing the connection with least squares, or perhaps it isn't? Maybe a preliminary question is: what, if anything, is being fit in a one-sample t-test?

I realize there are related questions and I am going through some of these, but I am guessing my question differs significantly.

Buck Thorn

1 Answer


The one-sample t-test corresponds in part to fitting an intercept-only linear model. This model does have a parameter: the intercept. See if you can convince yourself that if $X$ is the all-ones column vector, then the familiar ordinary least squares solution $\hat{\beta} = (X^\top X)^{-1} X^\top y$ is equal to the sample mean $\bar{y}$.
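Here is a minimal numerical sketch of that equivalence, assuming numpy and scipy are available; the data and the hypothesized mean `mu0` below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Made-up data; mu0 is an arbitrary hypothesized mean for H0.
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=30)
mu0 = 4.0

# Intercept-only design matrix: a single column of ones.
X = np.ones((len(y), 1))

# OLS solution beta_hat = (X^T X)^{-1} X^T y, here just the sample mean.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.isclose(beta_hat[0], y.mean()))   # True

# The residuals of this fit sum to (numerically) zero, as in the question.
resid = y - X @ beta_hat
print(np.isclose(resid.sum(), 0.0))        # True

# t statistic for H0: beta_0 = mu0, built from the OLS fit...
n = len(y)
s2 = resid @ resid / (n - 1)   # residual variance, df = n - p with p = 1
se = np.sqrt(s2 / n)           # standard error of the intercept estimate
t_ols = (beta_hat[0] - mu0) / se

# ...agrees with scipy's one-sample t-test.
t_scipy, p_scipy = stats.ttest_1samp(y, mu0)
print(np.isclose(t_ols, t_scipy))          # True
```

Note that the residuals of the intercept-only fit are just $y_i - \bar{y}$, which is why they sum to zero; this is exactly the condition asked about in the question.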

See also Common statistical tests are linear models by Jonas Kristoffer Lindeløv.

jdonland
  • Thank you. I see now how regression, the computation of a mean, and the minimization of variance about a parameter estimate are related, and how they are implemented in the t-test to generate a model estimate $\beta_0$. The link you provide is helpful with that. What I missed, and what still confuses me somewhat, is the deviation between this model parameter and the actual population mean $\mu$. As far as I understand, the likelihood of observing that difference, using the computed variance as an estimate of the actual population variance, is what the test tests. That's OK. The question of what the model is still confuses me. – Buck Thorn Mar 11 '24 at 18:36