Assume a linear regression model $y = X \theta^{*} + \epsilon$, where $X$ is the feature matrix and $\theta^{*}$ is the true parameter vector.
Here I assume heteroskedastic errors, $\epsilon \sim N(0, \Sigma^{*})$.
I also assume that $\Sigma^{*}$ is diagonal.
So far I got that the log-likelihood function can be simplified to something like this (when considering the argmax with respect to $\theta$): $$ LL = \sum_{i=1}^n \left( - \frac{1}{2}\ln(2\pi\sigma_{i}^{2}) - \frac{(y_i- x_i^T \theta)^2}{2\sigma_{i}^2} \right)$$
... where $x_i$ represents the feature vector of observation $i$.
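For reference, I believe the same log-likelihood in matrix form (which should also cover the non-diagonal case) is
$$LL(\theta, \Sigma) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(y - X\theta)^T \Sigma^{-1} (y - X\theta),$$
with the per-observation form above recovered when $\Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)$.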
But if I try to take the derivative with respect to $\theta$ and set it to 0, I'm not able to come up with a closed-form expression for the MLE $\hat{\theta}$, since I can't seem to separate out the $\sigma_{i}$'s.
Am I missing something obvious, or is a closed form even possible? Also, how much would things change if I did not assume $\Sigma^{*}$ to be diagonal?
Update:
Following advice in the comments, and assuming $\Sigma$ is diagonal, I've concluded that
$$\hat{\theta} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y$$
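(My working for this step, in case I've slipped up somewhere: differentiating the matrix-form log-likelihood above with respect to $\theta$ and setting it to zero gives
$$\frac{\partial LL}{\partial \theta} = X^T \Sigma^{-1} (y - X\theta) = 0 \quad\Longrightarrow\quad X^T \Sigma^{-1} X \, \hat{\theta} = X^T \Sigma^{-1} y,$$
which would seem to hold whether or not $\Sigma$ is diagonal, as long as $\Sigma$ is known.)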
and that
$$\hat{\Sigma} = \operatorname{diag}(\hat{\sigma}^2_1, \dots, \hat{\sigma}^2_n)$$
where
$$\hat{\sigma}^2_i = (y_i - x_i^T \hat{\theta})^2$$
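To double-check that last formula: differentiating the $i$-th term of the log-likelihood with respect to $\sigma_i^2$ and setting it to zero gives
$$-\frac{1}{2\sigma_i^2} + \frac{(y_i - x_i^T \theta)^2}{2\sigma_i^4} = 0 \quad\Longrightarrow\quad \sigma_i^2 = (y_i - x_i^T \theta)^2,$$
so at least as a stationary point (at the fitted $\hat{\theta}$) this seems right.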
Assuming these are roughly correct, the MLE of $\theta$ and the MLE of $\Sigma$ seem to depend on each other. So in practice, given some data $y$ and $X$, how would we actually go about using these formulas?
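One concrete way I can imagine using them (essentially feasible / iterative GLS, I think: start from the OLS fit and alternate between the two formulas; the function name, the `eps` guard, and the simulated data below are my own, just for illustration):

```python
import numpy as np

def iterative_gls(X, y, n_iter=20, eps=1e-8):
    """Alternate between the two MLE formulas above:
    theta via weighted least squares given Sigma, and each
    sigma_i^2 as the squared residual given theta."""
    # Step 0: ordinary least squares (all weights equal)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        # sigma_i^2 = (y_i - x_i^T theta)^2; eps guards against zero weights
        resid = y - X @ theta
        sigma2 = resid ** 2 + eps
        # theta = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y, Sigma = diag(sigma2)
        w = 1.0 / sigma2                      # diagonal of Sigma^{-1}
        XtWX = X.T @ (w[:, None] * X)
        XtWy = X.T @ (w * y)
        theta = np.linalg.solve(XtWX, XtWy)
    return theta, sigma2

# Hypothetical usage on simulated heteroskedastic data
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
sigma = 0.5 + np.abs(X[:, 0])                 # noise scale depends on x
y = X @ theta_true + rng.normal(scale=sigma)
theta_hat, sigma2_hat = iterative_gls(X, y)
print(theta_hat)
```

That said, with one free $\sigma_i^2$ per observation the likelihood can blow up (an observation with a near-zero residual gets an enormous weight), so I suspect this is exactly why estimating the weights is considered the hard part, and why the variances are usually modeled or smoothed rather than estimated one per observation.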
According to these notes, "estimating" the weights, i.e. the error variances, actually seems to be one of the harder steps in practice. So I guess there isn't much point in my working out the MLE of $\Sigma$?