
My question is about t-test values for linear regression coefficients estimated by OLS and MLE. There are a few related posts on this website (here and here), but I could not find the exact answer.

The following are a few starting points to set the context:

(1) We know that OLS and MLE generate the same estimates for the regression coefficients (e.g., $\hat{\beta}_0$, $\hat{\beta}_1$).

(2) We know that estimating linear regression coefficients by OLS does not require assuming a normal distribution for $\epsilon$. However, to obtain p-values for the estimated coefficients, we typically assume $\epsilon$ is normally distributed, the same assumption MLE makes.

(3) Below is an excerpt from Bain and Engelhardt's Introduction to Probability and Mathematical Statistics (p. 514). $T_0$ and $T_1$ are the t statistics for the intercept and slope in simple linear regression under MLE. We can see that the population $\sigma^2$ cancels out, leaving only the unbiased estimate $\tilde{\sigma}^2$ in the t-statistic. We know that the estimate of $\sigma^2$ in OLS is also the unbiased $\tilde{\sigma}^2$.

My question is: Are t-statistic formulas and values for linear regression coefficients the same across OLS and MLE?

I believe so. If that is not true, can you point out where my reasoning goes wrong? Thank you so much; I look forward to your insights, suggestions, and comments.

(Note: let's assume $\sigma^2$ is unknown, and use the t-test rather than the standard normal, to limit the scope of the discussion here. I asked a related question here, but it is not the same question; I did not have space in that question to discuss this, so I am asking it as a new one. You might point out that there are statistics other than the t-test for testing regression coefficients in MLE; however, let's stick to the t-test in this question, to limit its scope. Thank you so much.)

[Image: the $T_0$ and $T_1$ t statistics for simple linear regression, Bain & Engelhardt p. 514]
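A minimal numeric sketch of point (1), assuming simulated data (the sample size, coefficients, and seed below are arbitrary illustrations, not from the textbook): the OLS closed form and a numerical maximization of the Gaussian log-likelihood return the same coefficient estimates.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # true beta0 = 1, beta1 = 2

# OLS closed form: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# MLE: minimize the negative normal log-likelihood over (beta0, beta1, log sigma)
def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    resid = y - b0 - b1 * x
    return n * np.log(s) + np.sum(resid**2) / (2 * s**2)

beta_mle = minimize(neg_loglik, x0=[0.0, 0.0, 0.0]).x[:2]
print(beta_ols, beta_mle)   # agree up to optimizer tolerance
```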

Added comments - Part 1:

(1) You might have questions about how the estimate of $\sigma^2$ is derived in MLE. If so, please refer to this PDF by Ryan Adams. The short answer is that, in MLE, $\hat{\sigma}^2 = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n}$ is a biased estimate of $\sigma^2$. However, that does not really matter here, since the t-test shown above uses the unbiased $\tilde{\sigma}^2$. (A short numeric check of this appears after these comments.)

(2) I took a photo of the section on $\sigma^2$ in OLS (p. 502). It provides the context of how $\sigma^2$ links with $\tilde{\sigma}^2$.

[Image: the section on $\sigma^2$ in OLS, Bain & Engelhardt p. 502]
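Here is the short numeric check referenced in comment (1) above (a sketch with made-up residuals, not from the book): the biased MLE estimate and the unbiased estimate are both $SSE$ divided by a constant, so rescaling either one recovers the same $SSE$, which is why the choice of divisor drops out of the t-statistic.

```python
import numpy as np

# Illustrative residuals y_i - b0_hat - b1_hat * x_i from some fitted line
resid = np.array([0.3, -1.1, 0.8, 0.4, -0.6, 0.2])
n = len(resid)
sse = np.sum(resid**2)

sigma2_mle = sse / n          # biased MLE estimate (divisor n)
sigma2_tilde = sse / (n - 2)  # unbiased estimate (divisor n-2)

# Rescaling either estimate recovers the same SSE
assert np.isclose(n * sigma2_mle, (n - 2) * sigma2_tilde)
```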

Added comments - Part 2 (added July 21, 2023):

The picture above shows how $\tilde{\sigma}^2$ is derived in OLS. Questions then arise about how $\tilde{\sigma}^2$ is derived in MLE.

Note that, as mentioned in the picture above, $\tilde{\sigma}^2$ is the same across OLS and MLE. Below are Theorems 15.3.4, 15.3.5, and 15.3.6 on this from Bain and Engelhardt's Introduction to Probability and Mathematical Statistics (pp. 510-514).

[Images: Theorems 15.3.4, 15.3.5, and 15.3.6, Bain & Engelhardt pp. 510-514]
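A Monte Carlo sketch of what the theorems assert (the simulation settings below are arbitrary assumptions, not from the book): $V = (n-2)\tilde{\sigma}^2/\sigma^2$ should behave like a $\chi^2(n-2)$ variable, and $T_1$ like a $t(n-2)$ variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, b0, b1 = 20, 2.0, 1.0, 0.5
x = np.linspace(0, 10, n)
sxx = np.sum((x - x.mean())**2)

V_draws, T1_draws = [], []
for _ in range(20000):
    y = b0 + b1 * x + rng.normal(0, sigma, n)
    b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0_hat = y.mean() - b1_hat * x.mean()
    sigma2_tilde = np.sum((y - b0_hat - b1_hat * x)**2) / (n - 2)
    V_draws.append((n - 2) * sigma2_tilde / sigma**2)
    T1_draws.append((b1_hat - b1) / np.sqrt(sigma2_tilde / sxx))

# chi^2(18) has mean 18 and variance 36; t(18) has mean 0 and variance 18/16
print(np.mean(V_draws), np.var(V_draws))    # ~18, ~36
print(np.mean(T1_draws), np.var(T1_draws))  # ~0, ~1.125
```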

Will
  • Welcome to Cross Validated! Maximum likelihood estimation according to what likelihood? – Dave Jul 19 '23 at 14:30
  • Thanks! Likelihood assuming a normal distribution for the noise term $\epsilon$. Not sure I answered your question; if not, please refer to this post: https://stats.stackexchange.com/questions/559274/does-ordinary-least-squares-ols-have-any-inherent-relationship-with-maximum – Will Jul 19 '23 at 14:35
  • The image is part of the OLS theory, not MLE. If the authors claim this is MLE, then they are mixing things up. One clue that this is not MLE is the formula for $V$ implies the estimate of $\sigma^2$ is the OLS estimate, not the MLE estimate. – whuber Jul 19 '23 at 15:38
  • Hi Will: it's an interesting question, but let me throw one back at you. When one solves for the MLE of $\beta$, how does one obtain the MLE for $\sigma^2$? That is important because how one does that determines whether the two parameter estimates of $\sigma^2$ from the two different methodologies return the same values. If they are the same, then the t-statistics one constructs will be the same, because all the parameter estimates are the same. If they are not, then the t-statistics will be slightly different, and how different will depend on how large $n$ is. – mlofton Jul 19 '23 at 17:16
  • In the above, I didn't ask about $\hat{\beta}$ from the MLE procedure because that has to give the same result as the OLS procedure. – mlofton Jul 19 '23 at 17:18
  • @mlofton Thank you for stopping by again! Appreciate it. The following is the link, a pdf posted by Ryan Adams, which provides details on how to derive the estimated $\sigma^2$. The result from the textbook is the same as this pdf. https://www.cs.princeton.edu/courses/archive/fall18/cos324/files/mle-regression.pdf – Will Jul 19 '23 at 18:17
  • @whuber Thank you for stopping by! You are right in the sense that $\tilde{\sigma}^2$ is used in OLS: $\tilde{\sigma}^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2}$ is an unbiased estimate of $\sigma^2$, while the estimate of $\sigma^2$ from MLE is biased. However, in MLE, the unbiased estimate of $\sigma^2$ is also $\tilde{\sigma}^2$. Thus, the $V \sim \chi^2(n-2)$ shown in the picture is independent of whether $\tilde{\sigma}^2$ comes from OLS or MLE, since $\tilde{\sigma}^2$ is the same across both. – Will Jul 19 '23 at 18:49
  • @mlofton I forgot to address your comment on whether $\sigma^2$ is the same across OLS and MLE. My understanding: since $\sigma^2$ is unknown, there is no way to know. However, in both OLS and MLE, $\tilde{\sigma}^2$ is the unbiased estimate of $\sigma^2$. Note that the final form of the t-test shown in the picture does not involve $\sigma^2$, only $\tilde{\sigma}^2$. – Will Jul 19 '23 at 18:56
  • Will: You kind of answered your own question with your added comments. The estimate of $\hat{\sigma}^2$ used for the MLE (I didn't read it, but it sounds like a concentrated likelihood was used to find the MLE) has an $n$ in the denominator. The estimate used in the OLS procedure is the unbiased estimate, which has an $(n-1)$ in the denominator. So the $\hat{\beta}$ from the OLS and MLE approaches will be the same, but the $\hat{\sigma}^2$ will be slightly different. – mlofton Jul 19 '23 at 20:00
  • @mlofton A nit, but to avoid confusion note that the OLS denominator is $n-p-1,$ which here is the "$n-2$" appearing in the formulas. The MLE denominator is always $n.$ Anything other than that is some hybrid procedure. – whuber Jul 19 '23 at 20:02
  • Therefore, since these respective estimates are used in the calculation of the respective t-statistics, you won't get the same EXACT result for the t-statistics. But for larger $n$ it won't matter, in the sense that the difference is negligible. – mlofton Jul 19 '23 at 20:03
  • @whuber: Right, nice catch, and my fault. But my point is that the Princeton guy is using a different estimate for $\sigma^2$ than the one used in OLS, so the t-stats will be slightly different. – mlofton Jul 19 '23 at 20:05
  • Will: The main reason your question is tricky is that, in the MLE approach, how one calculates the likelihood is somewhat subjective, because one has two parameters in the likelihood. But whether one uses the full likelihood, the profiled likelihood, or even calculates $\hat{\sigma}^2$ after the fact (like I think the Princeton guy is doing, but I didn't read it; I'm kind of rushing), as whuber cleverly pointed out, the denominator of the formula for the estimate of $\sigma^2$ should always be $n$. – mlofton Jul 19 '23 at 20:11
  • This means that the $\sigma^2$ estimate from the MLE approach will always be different from that of the OLS approach. – mlofton Jul 19 '23 at 20:12
  • Thank you, both mlofton and whuber! I know that the estimates, namely $\hat{\sigma}^2$, are different across OLS and MLE ($n$ in MLE and $n-p-1$ in OLS). But the t-statistics ($T_0, T_1$) shown in the first picture have nothing to do with $\hat{\sigma}^2$; they only contain $\tilde{\sigma}^2$, which is the same across OLS and MLE. This is because $\sigma^2$ gets canceled out, as it exists in both the numerator and denominator of $T_0$ and $T_1$. – Will Jul 19 '23 at 20:33
  • Will: I have to look at the difference between the tilded $\sigma^2$ and the hatted $\sigma^2$ in order to address your comment above but I'm too tired at the moment to focus. What I can guarantee is that, if the estimates of $\sigma^2$ are different for the two approaches, namely OLS and ML, then the resulting t-statistics will be different also because the t-statistic is only a function of the estimate of $\sigma^2$ and $\hat{\beta}$. – mlofton Jul 20 '23 at 07:53
  • @mlofton thank you for the comment. No rush at all. – Will Jul 20 '23 at 17:54
  • Will: It's not well explained and very confusing, but the $\tilde{\sigma}^2$ in the first picture is the ESTIMATE of the true $\sigma^2$, which is unknown. So, as long as that $\tilde{\sigma}^2$ is different for OLS and ML, the resulting t-statistics will be different. The derivation of how the t-statistic comes about (in the case where $\sigma$ is unknown) seems to be poorly illustrated in the book that picture comes from. It's been such a long time that I can't remember where a good derivation is. Maybe Neter and Wasserman? Or Casella and Berger? – mlofton Jul 21 '23 at 14:53
  • Maybe someone knows where a good derivation is and could recommend a text or document. I'll try to find a document on the net (all my texts are in storage) and I'll send a link if I find something. The idea behind a good derivation is that the true unknown sigma, namely $\sigma$ (not hatted or tilded), gets cancelled out because it's in the numerator and the denominator. So, even though one doesn't know its value, it doesn't matter, because it's not in the final expression for the t-stat. In that picture, it seems like it is in the final result, which makes things EXTREMELY confusing. – mlofton Jul 21 '23 at 14:58
  • @mlofton thank you! You meant the derivation of $\tilde{\sigma}^2$ in OLS and MLE? – Will Jul 21 '23 at 15:04
  • @mlofton I just added Theorem 15.3.4, 5, 6 related to how $\tilde{\sigma}^2$ in MLE is derived. – Will Jul 21 '23 at 15:32
  • Hi Will: I actually meant the t-statistic rather than $\tilde{\sigma}^2$. I found one derivation for the t-statistic in a thread on Cross Validated that looked decent, but it had enough mistakes to make it confusing as well. That thread said that Monahan's "A Primer on Linear Models" has a nice derivation, but I don't have that text. Hopefully someone on this list knows where a nice, clean, mistake-free, non-matrix derivation of the t-statistic resides; it's gotta be somewhere. If found, you will see that the true value of $\sigma^2$ cancels quite nicely, so all one ends up with is the estimate. – mlofton Jul 21 '23 at 15:37
  • @mlofton The picture shown above (i.e., Theorem 15.3.6) shows how the t-statistic is derived in MLE. Is it still unclear to you? I just added that picture. – Will Jul 21 '23 at 15:40
  • I have an additional question: Why does the $V$ in Theorem 15.3.6 use $V=\frac{(n-2)\tilde{\sigma}^2}{\sigma^2} \sim \chi^2(n-2)$ rather than $V=\frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n-2)$? – Will Jul 21 '23 at 15:46
  • You seem to be confounding UMVUEs with MLEs. Although the one uses the theory of the other, they are different procedures. There is no $\tilde \sigma^2$ in MLE. The MLE estimate of $\sigma^2$ is represented as $\hat\sigma^2$ in your photos and, as @mlofton remarks, that is never equal to $\tilde \sigma^2.$ – whuber Jul 21 '23 at 15:48
  • @whuber Thank you for pointing out the difference. I actually knew that the MLE estimate of $\sigma^2$ was the biased $\hat{\sigma}^2$ rather than the unbiased $\tilde{\sigma}^2$. Thank you still, though. – Will Jul 21 '23 at 15:51
  • @whuber I know you pointed out that $\tilde{\sigma}^2$ is from OLS. Further, it seems you think Theorem 15.3.6 is for OLS as well. If so, and if you wanted to write out the t-statistic formulas for MLE, how would they differ from the ones in Theorem 15.3.6? Just replace all $\tilde{\sigma}^2$ with $\hat{\sigma}^2$? Or anything else? Thank you! – Will Jul 21 '23 at 16:06
  • 15.3.6 is about UMVUEs. It uses the OLS estimates. Thus, there are three separate concepts and three separate procedures involved here. The theorem says the OLS estimates are UMVUEs. The MLE estimate of $\sigma^2$ is not. MLE tests and OLS tests generally differ, even when (somewhat accidentally) their estimators might coincide. This becomes clearer when you work with non-Gaussian models. Thus, you might be focusing too much on superficial comparisons. It's best to learn the principles and assumptions of each approach, for only then do the differences become clear. – whuber Jul 21 '23 at 16:12
  • @whuber Thank you for providing your feedback. Let's just assume that 15.3.6 is indeed for OLS: then, in what way do the t-statistic formulas for MLE differ from the ones shown in 15.3.6? – Will Jul 21 '23 at 18:55
  • (1) MLE uses a z statistic whose denominator is the MLE estimate of $\sigma.$ (2) MLE uses a standard Normal distribution to compute a p-value. – whuber Jul 21 '23 at 19:15
  • @whuber Thank you for your input. After reading your comment, I had the impression that maybe we could just use a z statistic for MLE. However, when I read another post about the Wald statistic, z-statistic, and t-statistic (https://stats.stackexchange.com/questions/60074/wald-test-for-logistic-regression), it reminded me of our last conversation in another post: when $\sigma^2$ is unknown, we use the t-statistic; we use the z statistic when $\sigma^2$ is known. (A sketch contrasting the two appears after this comment thread.) – Will Jul 21 '23 at 19:59
  • In your second point (2) you use two sentences, "We know that... Thus..." How does the second follow from the first? – Sextus Empiricus Jul 22 '23 at 05:19
  • @SextusEmpiricus Points (1) and (2) are background knowledge I am assuming readers have before getting to my question in this post. There is no ordering between them; you do not need to know Point (1) before Point (2), or vice versa. – Will Jul 22 '23 at 15:07
  • @Will, my comment was about your point (2), which contains two sentences. Those sentences make no sense to me. The second sentence seems to draw a conclusion, since it starts with "thus", but the logic behind that conclusion is not clear. My comment is asking you to explain those two sentences. I will repeat your two sentences in a shorter form: "We know that OLS does not assume a normal distribution for $\epsilon$. Thus, we assume a normal distribution for $\epsilon$." – Sextus Empiricus Jul 22 '23 at 16:03
  • @SextusEmpiricus Estimating the regression coefficients by the OLS principle does not need the assumption that $\epsilon$ is normally distributed. See the process here: https://are.berkeley.edu/courses/EEP118/current/derive_ols.pdf. As you can see, the whole calculation is just partial derivatives, with no normal distribution assumption. However, to calculate a t or z statistic to get a p-value, we need an assumption on $\epsilon$. See the post here, which mentions this: https://stats.stackexchange.com/questions/114445/does-least-squares-regression-imply-normality-of-errors?rq=1 – Will Jul 22 '23 at 20:13
  • @Will Your use of 'thus' was not so clear; it is not a conclusion that logically follows from the preceding sentences. Other assumptions about the distribution of $\epsilon$ can be made as well. So, you consider augmenting the ordinary least squares method with the normal distribution as an assumption, and wonder whether this makes it the same as maximum likelihood estimation with the same assumption? – Sextus Empiricus Jul 23 '23 at 08:51
  • @SextusEmpiricus Correct. I was wondering whether, under the same assumption of normally distributed $\epsilon$, OLS and MLE lead to the same t-statistic value and p-value. Again, thank you for your answer down below. BTW, this is a marathon comment thread :) – Will Jul 23 '23 at 19:45
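The sketch referenced above, contrasting the two inference styles whuber describes (all data here are made up for illustration): a Wald z-statistic built from the biased $\hat{\sigma}^2_{MLE}$ with a standard normal reference, versus the t-statistic built from the unbiased $\tilde{\sigma}^2$ with a $t(n-2)$ reference.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 10
x = np.arange(n, dtype=float)
y = 0.3 * x + rng.normal(0, 1, n)

sxx = np.sum((x - x.mean())**2)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0_hat = y.mean() - b1_hat * x.mean()
sse = np.sum((y - b0_hat - b1_hat * x)**2)

t_stat = b1_hat / np.sqrt((sse / (n - 2)) / sxx)  # unbiased sigma, t(n-2) reference
z_stat = b1_hat / np.sqrt((sse / n) / sxx)        # MLE sigma, N(0,1) reference

p_t = 2 * stats.t.sf(abs(t_stat), df=n - 2)
p_z = 2 * stats.norm.sf(abs(z_stat))
print(p_t, p_z)   # differ for small n, converge as n grows
```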

2 Answers


For given observations $Y$ and regression matrix $\mathbf{X}$, if we assume independent, normally distributed observations with equal standard deviation,

$$Y|X \sim \mathcal{N}(\mathbf{X} \beta,\sigma^2 \mathbf{I})$$

then the coefficients of the linear model are computed as

  • $\hat{\beta} = (\mathbf{X^TX})^{-1}\mathbf{X^T}Y$ for ordinary least squares estimation.
  • $\tilde{\beta} = (\mathbf{X^TX})^{-1}\mathbf{X^T}Y$ for maximum likelihood estimation.

The estimates are the same.

The sampling distribution of the estimate is also the same, and with the same distributional assumptions, inference with confidence intervals or p-values should be the same.

Differences may occur when different approaches are used in the computation of confidence intervals or p-values, for example computing one-sided versus two-sided p-values. But this is not in principle a difference between ordinary least squares and maximum likelihood estimation.


When we do not make the assumption of independent, normally distributed observations, the two methods can differ (see: Why are the Least-Squares and Maximum-Likelihood methods of regression not equivalent when the errors are not normally distributed?). The estimates can be different because maximizing the likelihood does not need to coincide with minimizing the sum of squares; a sketch of this case follows the next paragraph.

In a special case we can have the same estimates but different p-values and confidence intervals. This occurs when the maximum likelihood estimate is computed under the assumption of a normal distribution, but the ordinary least squares method is paired with a different assumption. Then the estimates are both determined by minimizing the sum of squared residuals, but the assumptions about the sampling distribution of the estimate are different.
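A minimal sketch of the non-normal case mentioned above (assuming Laplace errors; the data and optimizer choice are illustrative assumptions): the Laplace MLE minimizes the sum of absolute residuals, so it generally differs from the least-squares fit.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x + rng.laplace(0, 1.0, n)

X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # least squares

# Laplace MLE = least absolute deviations (non-smooth, so use Nelder-Mead)
lad = lambda b: np.sum(np.abs(y - X @ b))
beta_laplace = minimize(lad, beta_ols, method="Nelder-Mead").x

print(beta_ols, beta_laplace)   # generally different estimates
```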

  • Sextus Empiricus: Thank you for the feedback. I tend to agree with you. People in the comment section mentioned that the estimated $\sigma^2$ differs between OLS ($\hat{\sigma}^2_{OLS} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2}$) and MLE ($\hat{\sigma}^2_{MLE} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n}$). However, when constructing the $V \sim \chi^2(n-2)$ for the t-statistic: $\hat{\sigma}^2_{OLS}(n-2)/\sigma^2 = \hat{\sigma}^2_{MLE}\,n/\sigma^2$. Thus, the t-statistics are the same, eventually. – Will Jul 23 '23 at 19:36
  • Note that $V$ refers to the $V$ mentioned in Theorem 15.3.6. Further, $Z_0$ and $Z_1$ in Theorem 15.3.6 are the same across OLS and MLE as well. Thus, the t-statistics are the same. Let's see if others have comments; otherwise, I will accept your answer. – Will Jul 23 '23 at 19:41

Let me again try to describe the distribution of $\hat{\beta}_1$ in OLS.

First, let's assume that the proof of
$$\frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2/S_{xx}}} ~\sim~ N(0,1) $$ has already been provided and that $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$ has its usual interpretation.

We need to estimate $\sigma^2$ and will denote $\tilde{\sigma}^2$ as the estimator: $$ \tilde{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2. $$

Also, the following two statements are proven in any decent math-stats textbook; we will just assume that they are true: \begin{equation*} \frac{(n-2)\tilde{\sigma}^2}{\sigma^2} ~\sim~ \chi^2_{n-2} \mathrm{~~~and~~~} \frac{N(0,1)}{\sqrt{{\chi^2_d}/{d}}} ~\sim~ t(d), \end{equation*} where the normal and chi-squared random variables in the second statement are independent.

Hence, taking the normal random variable and dividing it by the square root of the $\chi^2(n-2)$ random variable divided by $(n-2)$: $$ \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\sigma^2/S_{xx}}} \Big/ \sqrt{\frac{(n-2)\tilde{\sigma}^2}{\sigma^2} \Big/ (n-2)} ~\sim~ t(n-2). $$

Fortunately, after a lot of algebra, the expression can be reduced to: $$ \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\tilde{\sigma}^2/S_{xx}}} = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{SE}(\hat{\beta}_1)} $$

This means that the reduced expression also has to be distributed as $t(n-2)$. $$ \boxed{ \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{\tilde{\sigma}^2}/S_{xx}}} = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{SE}(\hat{\beta}_1)} ~\sim t(n-2) } $$

Notice that the final expression only has $\hat{\beta}_1$ and $\tilde\sigma^2$ as inputs. It is called the t-statistic.
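A quick numeric confirmation of that cancellation, with made-up data (not from the original answer): the long ratio and the reduced expression agree exactly, because $\sigma^2$ sits in both the numerator and the denominator.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma, beta1 = 15, 3.0, 2.0
x = rng.uniform(0, 10, n)
y = 1.0 + beta1 * x + rng.normal(0, sigma, n)

sxx = np.sum((x - x.mean())**2)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0_hat = y.mean() - b1_hat * x.mean()
sigma2_tilde = np.sum((y - b0_hat - b1_hat * x)**2) / (n - 2)

Z = (b1_hat - beta1) / np.sqrt(sigma**2 / sxx)   # uses the true sigma
V = (n - 2) * sigma2_tilde / sigma**2            # ~ chi^2(n-2)
long_form = Z / np.sqrt(V / (n - 2))
reduced = (b1_hat - beta1) / np.sqrt(sigma2_tilde / sxx)
assert np.isclose(long_form, reduced)   # sigma^2 has cancelled
```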

Now, to your original question: are the t-statistics generated by the ML and OLS procedures the same? The t-statistic has only two inputs: one is the estimate of $\beta$ and the other is the estimate of $\sigma^2$. So, to know whether ML and OLS generate the same t-statistics, we just have to know whether the two inputs are the same in both procedures. They are not, because the ML procedure uses $n$ as the divisor when calculating the estimate of $\sigma^2$, whereas the OLS procedure uses $(n-2)$. Therefore, the t-statistics generated by the two procedures will not be the same.
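A short sketch of that claim (data made up for illustration): plugging the divisor-$n$ estimate straight into the standard error inflates the statistic by exactly $\sqrt{n/(n-2)}$ relative to the OLS t-statistic, the same $n/(n-2)$ factor discussed in the comments below.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 12
x = rng.uniform(0, 10, n)
y = 0.5 + 1.5 * x + rng.normal(0, 2, n)

sxx = np.sum((x - x.mean())**2)
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0_hat = y.mean() - b1_hat * x.mean()
sse = np.sum((y - b0_hat - b1_hat * x)**2)

t_ols = b1_hat / np.sqrt((sse / (n - 2)) / sxx)   # divisor n-2 (OLS)
t_mle = b1_hat / np.sqrt((sse / n) / sxx)         # divisor n (naive MLE plug-in)

assert np.isclose(t_mle, t_ols * np.sqrt(n / (n - 2)))
```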

I hope this helps. It's pretty similar to what I deleted earlier, except that I was considering a non-regression framework (hypothesis testing of the mean of a distribution under those assumptions), so the formulae were slightly different. But, more importantly, the concept is exactly the same. I wanted to understand it better myself, so it was useful to me.

Oh, and if you haven't already, I highly recommend sitting in on a two-semester mathematical-statistics class, or taking one on the internet. You sound interested, and when you learn it from a math-stat standpoint, there's nothing to memorize because it's more of a conceptual type of thing. In fact, if you take a good math-stat class with a good teacher and a good book, the desire for memorization kind of gets lost in the shuffle.

mlofton
  • *They are not because the ML procedure uses $n$ as the divisor when calculating $\tilde{\sigma}^2$. The OLS procedure on the other hand uses $(n-2)$. Therefore, the t-statistics generated by the two procedures will not be the same.* When using the ML procedure to estimate the standard deviation as input for the t-statistic, one applies a correction to get an unbiased estimate. There is no rule that states that the t-statistic needs to be computed with the biased standard deviation. – Sextus Empiricus Jul 25 '23 at 10:51
  • Are there examples of people who use a biased estimate and do not scale the statistic, such that it is not t-distributed? – Sextus Empiricus Jul 25 '23 at 10:58
  • mlofton: Thank you for coming back! Also, thank you for providing your version of an answer. Okay, let's get to business. It seems you mixed up $\tilde{\sigma}^2$ and $\hat{\sigma}^2$, but let me just use your notation system to make the discussion easier. That is, you use $\tilde{\sigma}^2$ to denote the estimated $\sigma^2$. Okay, I agree that $\tilde{\sigma}^2_{OLS} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n-2}$. And you construct $V_{OLS} = \frac{(n-2)\tilde{\sigma}^2_{OLS}}{\sigma^2} \sim \chi^2(n-2)$. That is correct, and I agree. (To be continued - Part 1) – Will Jul 25 '23 at 12:05
  • We know that $\tilde{\sigma}^2_{MLE} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{n}$. To construct the MLE version of the $\chi^2$, you need to write it as $V_{MLE} = \frac{n\tilde{\sigma}^2_{MLE}}{\sigma^2} \sim \chi^2(n-2)$. (Part 2) – Will Jul 25 '23 at 12:09
  • Note that we used $(n-2)$ in $V_{OLS}$ because $(n-2)$ is in the denominator of $\tilde{\sigma}^2_{OLS}$. In a similar vein, we need to use $n$ in $V_{MLE}$ because $n$ is in the denominator of $\tilde{\sigma}^2_{MLE}$. (Part 3) – Will Jul 25 '23 at 12:12
  • Thus, we get $V_{OLS} = \frac{(n-2)\tilde{\sigma}^2_{OLS}}{\sigma^2} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\sigma^2} = \frac{n\tilde{\sigma}^2_{MLE}}{\sigma^2} = V_{MLE}$. In other words, we get exactly the same denominator for the t-statistic across OLS and MLE. Since we already know the numerator (i.e., the $N(0,1)$) of the t-statistic is the same across OLS and MLE, we eventually get exactly the same t-statistic across OLS and MLE. (Part 4) – Will Jul 25 '23 at 12:17
  • I agree with Sextus that "there is no rule that states that the t-statistic needs to be computed with the biased standard deviation." That is, to construct the $\chi^2(n-2)$ for the t-statistic, there is no rule that you have to use the biased $\tilde{\sigma}^2_{MLE}$ rather than the unbiased $\tilde{\sigma}^2_{OLS}$, or vice versa. Importantly, my 4-part answer further shows that, regardless of whether you use the biased $\tilde{\sigma}^2_{MLE}$ or the unbiased $\tilde{\sigma}^2_{OLS}$, you get exactly the same $\chi^2(n-2)$, the denominator of the t-statistic. – Will Jul 25 '23 at 12:30
  • When you put $V_{OLS} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\sigma^2} = V_{MLE}$ into the denominator of the t-statistic, the $\sigma^2$ will be gone, since there is a $\sigma$ in the numerator of the t-statistic as well. That is, the unknown $\sigma^2$ disappears from the t-statistic. It should not exist in the final calculation anyway, since it is unknown. – Will Jul 25 '23 at 12:42
  • Hi Will and Sextus: Let me just answer your issue with the variance (or standard deviation) that comes out of the ML being biased. (I'll read the other comments later.) The $\tilde{\sigma}^2$ being biased is just a product of the ML procedure. When the likelihood is computed and the estimates are found, $n$ is in the denominator of $\tilde{\sigma}^2$ because the likelihood works out that way. Maybe whuber could explain this more clearly, but it's NOT A CHOICE. The result has $n$ in the denominator, by definition of what ML does, which is maximize the likelihood. – mlofton Jul 26 '23 at 13:50
  • Will: I used the notation $\tilde{\sigma}^2$, but I could have used $\hat{\sigma}^2$ or whatever; that was just a choice. But yes, the $\sigma^2$ gets cancelled out, which is what you want, since it is unknown. I'll read both of your other comments later because I have to leave now. – mlofton Jul 26 '23 at 13:52
  • Will: Note that for $n$ greater than, say, 30 or 40, it's really not going to make a difference whether one uses the biased estimate or the unbiased one. But note that, theoretically speaking, the same EXACT argument doesn't go through for a biased estimator, because one needs $\frac{(n-2)\tilde{\sigma}^2}{\sigma^2}$ for the $\chi^2(n-2)$ result to hold. The same argument doesn't hold if one uses $n$. – mlofton Jul 26 '23 at 14:12
  • I really have to leave, but note that the reason the above doesn't hold is that $\frac{\sum_{i=1}^n(X_{i} - \bar{X})^2}{n}$ is a biased estimator of the true unknown $\sigma^2$. By this I mean that $E\left(\frac{\sum_{i=1}^n(X_{i} - \bar{X})^2}{n}\right) \neq \sigma^2$. – mlofton Jul 26 '23 at 14:24
  • Hi Will: I read your and Sextus' earlier comments, so I'll try to address those. I originally took your question to mean: if we took the MLE estimates of $\sigma^2$ and $\beta$ and used them to construct the t-statistic, would the t-statistic results be the same? My answer was no. Now, reading your earlier comments, it seems you correct the MLE t-statistic so that IT IS the OLS t-statistic. In that case, yes, they will give the same results. That's not at all what I thought your original question was. – mlofton Jul 27 '23 at 05:08
  • My confusion is: if you're just going to modify the t-statistic from the ML procedure so that IT IS the OLS t-statistic, then why construct the MLE estimates in the first place? What were they for? – mlofton Jul 27 '23 at 05:11
  • Also, one correction: using $V_{MLE}$ to denote the variance of the MLE is not a good choice, because what you are actually constructing is $\frac{\sum_{i=1}^{n} (X_{i} - \bar{X})^2}{\sigma^2}$, which is STILL $\chi^2_{n-2}$, not $\chi^2_{n}$. The df's stay the same; you only multiplied by $n$ to get rid of the $n$ in the denominator. The sum of squares over the true sigma is still $\chi^2_{n-2}$ because 2 parameters were estimated. – mlofton Jul 27 '23 at 05:18
  • So, to me, in the end, you are multiplying by $n$ so that you have the same t-statistic that OLS produces. That statistic has to have the same distribution as it had in the OLS derivation because nothing changed. The t-statistic is EXACTLY what it was in the OLS case. – mlofton Jul 27 '23 at 05:21
  • Notice that, if the two parameters, $\sigma^2$ and $\mu$ were known and not estimated, then $\sum_{i=1}^{n} (X_{i} - \mu)^2/{\sigma^2} \sim \chi^{2}_n$ because it is the sum of $n$ normal squared RVs. – mlofton Jul 27 '23 at 05:37
  • Will: I just realized that multiplying by $n$ to correct the t-statistic won't work as you would like, because the algebra works out so that you end up with $\sqrt{\frac{\chi^2_{n-2}}{n}}$ in what I refer to as DEN. When this term is used in the denominator and the normal is used in the numerator, you will not obtain a t distribution, because the term in the denominator, whether it's $n$ or $n-2$, has to equal the degrees of freedom of the $\chi^2$ RV. So, your attempt is interesting, but the laws of the t-distribution make it not work out. – mlofton Jul 27 '23 at 06:35
  • Will: I did use bad notation, in that I should have been writing $\frac{SSE}{n-2}$ whenever I wrote $\frac{(X_{i} - \bar{X})^2}{n-2}$. I can't change my comments to reflect this error, but the same argument still holds. I used a bad choice of $X$ and didn't subtract the correct thing; the easiest way to correct it is to just call it SSE. My apologies, but the argument still doesn't change. $\frac{n \times SSE}{\sigma^2}$ is still $\chi^2_{n-2}$ because 2 parameters are estimated: $\beta_{0}$ and $\beta_1$. This means that the resulting statistic does not have a t-distribution. – mlofton Jul 27 '23 at 07:09
  • @mlofton Thank you for adding so many comments. I will read through them later as I have been busy on something else in the last 2 days. I will reply to your comments later. Again, thanks! – Will Jul 27 '23 at 22:02
  • @mlofton It seems that you agree with my 4-part comment above to get to $V_{OLS} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\sigma^2} \sim \chi^2(n-2)$, with $V_{OLS} = V_{MLE}$. If so, the DEN of the t-statistic will be $\sqrt{V_{OLS}/(n-2)} = \sqrt{V_{MLE}/(n-2)}$. You can see that MLE and OLS arrive at exactly the same DEN of the t-statistic. I am not sure how you get to $\sqrt{\frac{\chi^2_{n-2}}{n}}$; for both MLE and OLS, DEN will be the same, namely $\sqrt{\frac{\chi^2_{n-2}}{n-2}}$. – Will Jul 28 '23 at 03:14
  • @mlofton The agreement that $V_{OLS} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\sigma^2} \sim \chi^2(n-2) = V_{MLE}$ is the foundation of this discussion. If you are not sure how I got this, refer to my Part 1 through Part 4 comments above; I put the labels Part 1, Part 2, Part 3, and Part 4 at the end of each comment, so you can find them easily. – Will Jul 28 '23 at 03:19
  • Hi Will: Yes, I do agree with your expression $V_{MLE} \sim \chi^2_{n-2}$; we are good there. But notice that, in order to obtain the t-distribution, one needs a standardized normal divided by $\sqrt{\frac{\chi^2_{n-2}}{(n-2)}}$. So, if you want to use $V_{MLE}$, you are also going to have to put an $(n-2)$ in the bottom of the square root expression. – mlofton Jul 28 '23 at 23:57
  • Will: Notice that what we are trying to do is take the MLE of $\sigma^2$ and modify the t-statistic so that it is exactly the same as the t-statistic based on OLS. You definitely can do that with the suggestion I made above, which is to divide by the $(n-2)$. But I thought your original question was to just take the MLE estimates and replace the OLS estimates with them. That won't work, but this discussed approach will, once one makes sure to obtain the t-statistic that corresponds to the OLS t-statistic. I hope it's clear now. – mlofton Jul 29 '23 at 00:04
  • Also, somewhere in this discussion, I think you said that something was distributed as $\chi^2_{n}$. If you didn't and I made a mistake while reading your comments, my apologies. – mlofton Jul 29 '23 at 00:05
  • Will: Just one more thing to close it out, unless you have more questions. In the end, by putting the $(n-2)$ in the denominator, we are really just "fixing" the resulting statistic so that it's still the OLS t-statistic. So, you could say that the MLE estimate can create the same t-statistic, but only because we are algebraically manipulating it so that it's still the OLS t-statistic. So, I don't see what the point of doing that is, but at least you can see that it is possible if you want. – mlofton Jul 29 '23 at 02:07
  • Will: I promise this is my last comment on this, but I think it clarifies what I'm trying to say nicely. You explained how your $V_{MLE}$ has to be multiplied by $n$ in order to get the $\chi^2_{n-2}$ random variable. I then explained how you're still going to need an $(n-2)$ term in the denominator, because, in order to obtain the t-distribution, one needs to divide the normal RV by the square root of a $\frac{\chi^2_{n-2}}{(n-2)}$. But $\frac{n}{n-2}$ is EXACTLY what one would multiply by if one had the MLE of $\sigma^2$ and wanted to create the unbiased estimate of $\sigma^2$. – mlofton Jul 30 '23 at 09:44
  • So, in the end, if one has the MLE of $\sigma^2$ and wants to create the correct t-statistic, one needs to multiply it by the factor $\frac{n}{n-2}$, which is really just taking the MLE estimate of $\sigma^2$ and constructing the unbiased OLS estimate. So, we came all the way back around to where we started!!!!! I hope that helps. – mlofton Jul 30 '23 at 09:48
  • @mlofton Thank you for providing so many comments. You are a good person! But, I am still lost with you saying the difference between $n$ and $n-2$. Are you in the end concluding that the t-statistic is exactly the same between OLS and MLE? In other words, are you agreeing with me? If so, we can close this discussion. If not, we need to comment further. – Will Aug 01 '23 at 20:16
  • BTW: I wrote above that the denominator of the t-statistic is $SE = \sqrt{V/(n-2)}$, where $V = V_{OLS} = \frac{\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2}{\sigma^2} = V_{MLE} \sim \chi^2(n-2)$. Thus, the denominator of the t-statistic for OLS and MLE is exactly the same (see, both with $n-2$). – Will Aug 01 '23 at 20:21
  • Hi Will: I'll try to print this out (my printer is very fickle) because I can't see it on the screen. If I can't, maybe I can try to write up what I'm trying to say with equations. But, for now, I'd rather try to understand (by seeing) your equation more clearly. I'm truly not sure what the answer to your question is. Sometimes I think it's just semantics, because these things were given names like "t-statistic" without ever really differentiating between an OLS test statistic and an MLE test statistic. Everyone just called it a t-statistic, so that might be what the problem is. – mlofton Aug 02 '23 at 08:43
  • I think what I said above is true. I've been out of academics since 2000, so I could be dating myself, but the terminology of OLS test statistic and ML test statistic is kind of foreign to me; I just remember "test statistic". Things could have changed with age is all I'll say for now. Others following this discussion are welcome to step in, because I'm A) pretty sure my side could be explained more clearly and B) not 100 percent sure that I understand what Will is asking in his original question. Will: I'll be in touch within one day at the latest. – mlofton Aug 02 '23 at 08:50
  • There's no way to modify old comments on this, right? Sometimes, when you go back to read what you said, you'd like to change something you commented before so that it makes more sense now. Or maybe you need a ton of points to be able to do that; I never read how all these things work. Anyway, is that possible? I was thinking of going back to when this started to see what I said way back when. I hope I'm not saying gibberish. You must know that saying: "if you can't explain it, then you don't understand it". I'm getting a little nervous. – mlofton Aug 02 '23 at 08:54
  • It makes me use a different part of my brain than I normally do, which is fun, so your question is appreciated. I'd love to close it also, though. But "we've come this far" is what I always say. That's not to say that living by that motto has always led to great places, though. – mlofton Aug 02 '23 at 08:57
  • I think we agree about everything up to and including Part 4. We can call the thing you keep writing out in the numerator the sum of squared errors, abbreviated $SSE$. Both of us can calculate that. You have to multiply by $n$ and I have to multiply by $n-2$, but who cares? We can both get to that same squared sum, and $\frac{SSE}{\sigma^2} \sim \chi^{2}_{n-2}$. The goal is to create a t-statistic. So, the $\sigma$ cancels out and we are left with your construction: SSE. Now we need to create the denominator of the t-statistic. – mlofton Aug 02 '23 at 09:55
  • So, we take the $\chi^2_{n-2}$, where $n$ is some number, and then we DIVIDE THAT BY $(n-2)$. Then we take that result and put it under a square root; that's our denominator. You'll see I have "divide that by" in caps, which wasn't meant to be rude, just to emphasize it. You HAVE TO HAVE an $(n-2)$ in the bottom. Why? Because we need to have a $\chi^2_{n-2}$ in the top of DEN, and we know that we have the SSE as $\sim \chi^{2}_{n-2}$, because the squared sums (SSE) are distributed that way. – mlofton Aug 02 '23 at 10:07
  • It's best to look at your very first picture, where $V$ is defined. Notice that the $SSE$ multiplied by $(n-2)$ is distributed as $\chi^2_{n-2}$; that's just a fact. Fortunately, the thing needed in the numerator is exactly distributed that way. So, we need to create a "thing" such that, when it's multiplied by $(n-2)$ (not $n$), we obtain the $SSE$. The way to do that is to take the SSE and divide it by $(n-2)$ first. This way, when it gets multiplied later by $(n-2)$, the sum of squares comes out. We need the sum of squares to come out, because that's what happens to be $\chi^{2}_{n-2}$. – mlofton Aug 02 '23 at 10:24
  • But, if the SSE needs to be multiplied by $(n-2)$ to reach its sum of squares, then its sum of squares needs to be divided by $(n-2)$ so that the multiplication cancels out. – mlofton Aug 02 '23 at 10:39
  • So we have $\frac{(n-2) \times SSE}{(n-2)}$. But looking at the formula for what gets one to a $\chi^2_{n-2}$, that implies that $(n-2)\tilde{\sigma}^2 = \frac{(n-2) \times SSE}{(n-2)}$. Therefore, $\frac{SSE}{(n-2)} = \tilde{\sigma}^2$. This is the unbiased OLS estimator. – mlofton Aug 02 '23 at 11:01
  • When I say a "thing" a couple of comments up, I'm really referring to an estimate of $\sigma^2$. I could have put a tilde or a hat or whatever; it's an estimate of the true unknown $\sigma^2$. – mlofton Aug 02 '23 at 11:13
  • Think of it this way: one knows that $\frac{SSE}{\sigma^2} \sim \chi^2_{n-2}$ and one wants $\frac{(n-2) \tilde{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}$. What does $\tilde{\sigma}^2$ have to be to achieve that? – mlofton Aug 02 '23 at 11:29
  • @mlofton okay, I will get back to you later. I am working on something else. – Will Aug 02 '23 at 14:13
  • No rush. I hope it helps. All I'm trying to show above is that, since you need the $(n-2)$ in the $\chi^2_{n-2}$ expression involving the ratio of the estimated variance to the true one, the OLS estimate of the variance is the one being used. – mlofton Aug 04 '23 at 03:21