I know there is an analytic solution to the following problem (OLS). Since I am trying to learn and understand the principles and basics of MLE, I implemented the Fisher scoring algorithm for a simple linear regression model.
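For reference, the analytic solution I have in mind is the usual closed form: $$ \hat{\beta} = (X^{'}X)^{-1}X^{'}y, \qquad \hat{\sigma}^{2} = \frac{1}{N}(y-X\hat{\beta})^{'}(y-X\hat{\beta}). $$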
$$ y = X\beta + \epsilon, \qquad \epsilon\sim N(0,\sigma^2) $$
The log-likelihood for $\sigma^2$ and $\beta$ is given by: $$ L(\beta,\sigma^2) = -\frac{N}{2}\ln(2\pi)-\frac{N}{2}\ln(\sigma^2)-\frac{1}{2\sigma^{2}}(y-X\beta)^{'}(y-X\beta) $$
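As a quick sanity check of this expression, I evaluate it directly and compare it against R's built-in normal density (a minimal sketch; loglik is just my own helper name):

loglik <- function(y, x, beta, sigma2) {
  # log-likelihood of the linear model at given beta and sigma^2
  n <- length(y)
  rss <- drop(crossprod(y - x %*% beta))
  -n / 2 * log(2 * pi) - n / 2 * log(sigma2) - rss / (2 * sigma2)
}
# should agree with: sum(dnorm(y, mean = x %*% beta, sd = sqrt(sigma2), log = TRUE))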
To compute the score function $S(\theta)$, where $\theta$ is the vector of parameters $(\beta,\sigma^{2})^{'}$, I take the first partial derivatives with respect to $\beta$ and $\sigma^{2}$: \begin{align} \frac{\partial L}{\partial \beta} &= \frac{1}{\sigma^{2}}(y-X\beta)^{'}X \\[5pt] \frac{\partial L}{\partial \sigma^2} &= -\frac{N}{\sigma^{2}}+\frac{1}{2\sigma^{4}}(y-X\beta)^{'}(y-X\beta) \end{align}
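To double-check the analytic score, I would compare it against a central-difference gradient of the loglik helper above at some test point (again only a sketch; num.score is a made-up name):

num.score <- function(y, x, theta, eps = 1e-6) {
  # numerical gradient of the log-likelihood w.r.t. theta = (beta, sigma^2)
  p <- ncol(x)
  f <- function(th) loglik(y, x, th[1:p], th[p + 1])
  sapply(seq_along(theta), function(j) {
    e <- replace(numeric(length(theta)), j, eps)
    (f(theta + e) - f(theta - e)) / (2 * eps)
  })
}
# e.g. num.score(y, cbind(1, x), c(2, 1, 3, 1)) should match S(theta) at that point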
Then I implement the Fisher scoring update as: $$ \theta_{j+1} = \theta_{j} - \left(S(\theta_{j})S(\theta_{j})^{'}\right)^{-}S(\theta_{j}), $$ where $(\cdot)^{-}$ denotes a generalized inverse (MASS::ginv in the code below).
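For reference, my understanding of the textbook form of the Fisher scoring update is $$ \theta_{j+1} = \theta_{j} + \mathcal{I}(\theta_{j})^{-1} S(\theta_{j}), \qquad \mathcal{I}(\theta) = \mathrm{E}\left[S(\theta)S(\theta)^{'}\right] = -\mathrm{E}\left[\frac{\partial^{2} L}{\partial\theta\,\partial\theta^{'}}\right], $$ i.e. it uses the expected information rather than the outer product of a single score evaluation.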
Please note that the following code is a very naive implementation (no stopping rule, etc.):
library(MASS)

# simulate data: 500 observations, intercept 2, slopes 1 and 3
x <- matrix(rnorm(1000), ncol = 2)
y <- 2 + x %*% c(1, 3) + rnorm(500)

fisher.scoring <- function(y, x, start = runif(ncol(x) + 1)) {
  n <- nrow(x)
  p <- ncol(x)
  theta <- start            # theta = (beta, sigma^2)
  score <- rep(0, p + 1)
  for (i in 1:1e5) {
    # score components for the betas
    score[1:p] <- (1 / theta[p + 1]) * t(y - x %*% theta[1:p]) %*% x
    # score component for sigma^2
    score[p + 1] <- -(n / theta[p + 1]) + (1/2 * theta[p + 1]^2) * crossprod(y - x %*% theta[1:p])
    # update step
    theta <- theta - MASS::ginv(tcrossprod(score)) %*% score
  }
  return(theta)
}
# Gives the correct result
lm.fit(cbind(1,x), y)$coefficients
# Does not give the correct result
fisher.scoring(y, cbind(1,x))
# Even if you start with the correct values
fisher.scoring(y, cbind(1,x), start=c(2,1,3,1))
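For comparison, my understanding is that the ML estimate of $\sigma^2$ is also available in closed form from the residuals (sketch only):

# closed-form ML estimate of sigma^2 (divides by N, not N - p)
fit <- lm.fit(cbind(1, x), y)
mean(fit$residuals^2)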
My Question
What did I miss? Where is my mistake?
