Is AIC appropriate for model selection when the parameters are fitted by least-squares rather than MLE

Question

I want to compare the fit of a linear model (M1) and nonlinear model (M2):

M1: $y = b_0 + b_1x_1 + b_2x_2 + b_3x_1x_2 + \epsilon, \epsilon \sim N(0, \sigma^2)$
M2: $y = b_0 + b_1x_1 + b_2x_2 + b_1 b_2x_1x_2 + \epsilon, \epsilon \sim N(0, \sigma^2)$

In particular I want to know whether M1 is significantly different from M2.

To estimate the parameters, I am minimizing the sum of squared errors (least squares) rather than maximizing the likelihood through MLE procedures. In particular, I am using the R function nls() as follows:

```r
# Creating a sample data set
n <- 50
x1 <- rnorm(n, 1:n, 0.5)
x2 <- rnorm(n, 1:n, 0.5)
b0 <- 1
b1 <- 0.5
b2 <- 0.2
y <- b0 + b1*x1 + b2*x2 + b1*b2*x1*x2 + rnorm(n,0,0.1)
# Actual model fit
M1 <- nls(y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2, start=list(b0=1, b1=0.5, b2=0.5, b3=0.5))
M2 <- nls(y ~ b0 + b1*x1 + b2*x2 + b1*b2*x1*x2, start=list(b0=1, b1=0.5, b2=0.5))
```

I want to compare the models using a measure of relative fit such as AIC, which can be done in R as follows:

```r
AIC(M1, M2)
   df       AIC
M1  5 -88.47849
M2  4 -90.46491
```

Because $\Delta AIC \approx 2$ and the models differ by only one parameter, I would conclude that both of them fit the data similarly well.
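For unweighted least-squares fits, R's AIC() for an nls object is computed from the Gaussian log-likelihood evaluated at the least-squares estimates, $\mathrm{AIC} = n\log(2\pi\,\mathrm{RSS}/n) + n + 2(p+1)$, where the extra 1 counts $\sigma$ (which is why the df column shows 5 and 4). For two fits to the same $n$ points this reduces to $\Delta AIC = n\log(\mathrm{RSS}_1/\mathrm{RSS}_2) + 2(p_1 - p_2)$. A minimal sketch, assuming unweighted fits; the helper name `delta_aic` is hypothetical, and the inputs are the residual sums of squares that anova(M1, M2) reports in the question:

```r
# Hypothetical helper: AIC(M1) - AIC(M2) for two unweighted Gaussian
# least-squares fits to the same n observations, based on
# AIC = n * log(2 * pi * RSS / n) + n + 2 * (p + 1).
delta_aic <- function(rss_1, rss_2, n, p_1, p_2) {
  n * log(rss_1 / rss_2) + 2 * (p_1 - p_2)
}

# Values taken from the anova(M1, M2) printout:
rss_m2 <- 0.40855               # Res.Sum Sq of M2 (rounded in the printout)
rss_m1 <- rss_m2 - 0.00011097   # minus the reported Sum Sq difference
d <- delta_aic(rss_m1, rss_m2, n = 50, p_1 = 4, p_2 = 3)
d   # compare with -88.47849 - (-90.46491) = 1.98642
```

Because the two RSS values almost coincide, the log-ratio term is tiny and $\Delta AIC$ is essentially the 2-unit penalty for the extra parameter $b_3$.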

In addition, I want to know whether the parameter $b_3$ from M1 significantly adds to the fit, using a statistical test such as an F-test. This can be done in R as follows:

```r
anova(M1, M2)
Analysis of Variance Table

Model 1: y ~ b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2
Model 2: y ~ b0 + b1 * x1 + b2 * x2 + b1 * b2 * x1 * x2
  Res.Df Res.Sum Sq Df      Sum Sq F value Pr(>F)
1     46    0.40843                              
2     47    0.40855 -1 -0.00011097  0.0125 0.9115
```
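The F value in this table is the extra-sum-of-squares statistic: the increase in RSS from imposing $b_3 = b_1 b_2$, per constrained parameter, divided by the residual mean square of the larger model M1. A small sketch that reproduces the printed F value and Pr(>F) from the other columns; the helper name `f_test_rss` is hypothetical. Note that the F reference distribution is exact only for nested linear models with Gaussian errors; with a nonlinear constraint such as $b_3 = b_1 b_2$ it should be read as an approximation.

```r
# Hypothetical helper: extra-sum-of-squares F test of a reduced fit
# against a full fit, from residual sums of squares and residual df.
f_test_rss <- function(rss_full, df_full, rss_reduced, df_reduced) {
  df_num <- df_reduced - df_full      # number of constrained parameters
  f_stat <- ((rss_reduced - rss_full) / df_num) / (rss_full / df_full)
  p_val  <- pf(f_stat, df_num, df_full, lower.tail = FALSE)
  c(F = f_stat, p = p_val)
}

# Numbers from the table above (M1 is the full fit, M2 the reduced one).
res <- f_test_rss(rss_full = 0.40843, df_full = 46,
                  rss_reduced = 0.40843 + 0.00011097, df_reduced = 47)
round(res, 4)   # should match "F value" and "Pr(>F)" up to rounding
```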

My general question is:

Are these analyses appropriate?

More specifically:

Can I use AIC to compare least-squares fitted models?

From a few posts such as this one it looks like AIC should be appropriate. However, I've seen posts such as this one that indicate that using AIC on non-MLE fitted models might be a problem. I understand that least squares is equivalent to MLE if the error is normally distributed, but is this true even for nonlinear models?
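The equivalence does not rely on linearity. A short derivation, under the usual assumption of independent errors with constant variance, $y_i = f(\mathbf{x}_i; \boldsymbol\beta) + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$, where $f$ may be nonlinear in $\boldsymbol\beta$ (for M2, $f = b_0 + b_1x_1 + b_2x_2 + b_1 b_2x_1x_2$). The log-likelihood is

$$
\ell(\boldsymbol\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(\mathbf{x}_i; \boldsymbol\beta)\bigr)^2 = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{\mathrm{RSS}(\boldsymbol\beta)}{2\sigma^2}.
$$

For any fixed $\sigma^2$, $\ell$ is maximized over $\boldsymbol\beta$ by minimizing $\mathrm{RSS}(\boldsymbol\beta)$, so $\hat{\boldsymbol\beta}_{\mathrm{ML}} = \hat{\boldsymbol\beta}_{\mathrm{LS}}$. Maximizing over $\sigma^2$ then gives $\hat\sigma^2 = \mathrm{RSS}(\hat{\boldsymbol\beta})/n$, and with $p$ regression parameters plus $\sigma$:

$$
\ell_{\max} = -\frac{n}{2}\left[\log\!\left(\frac{2\pi\,\mathrm{RSS}(\hat{\boldsymbol\beta})}{n}\right) + 1\right], \qquad \mathrm{AIC} = -2\,\ell_{\max} + 2(p+1).
$$

The argument needs the error model to hold: with non-normal or heteroscedastic errors, least squares is no longer the MLE and the likelihood behind AIC is misspecified.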

Can I use a F-test to test whether $b_3$ is significantly different from $b_1 b_2$?

I know such an F-test makes sense if the models are nested, but I'm unsure whether it is appropriate in this case.

Maybe I am missing something here, but I believe nls() uses a model where the two things are the same, i.e. the likelihood is maximized when the sum of squared differences is minimized. In that case it does not matter that a least-squares estimate is used rather than MLE; the results are the same. (This also holds true when using OLS on a model with errors ~ N(0, sigma^2).) — Fraijo, Nov 04 '13 at 18:52
This is a big part of my question. Is minimizing least-squares always the same as maximizing the likelihood when the error is normal? Or is it specific to linear models? From your comment @Fraijo, I understand that it's always the same. Am I correct? If that's the case, can I use an F-test to look at the significance of $b_3$? — Marie Auger-Methe, Nov 05 '13 at 09:32
Note that your M1 can be estimated as a linear model. So, a command or function for that should show the same answers as does nls. — Nick Cox, Nov 05 '13 at 14:06
@NickCox I agree M1 is linear, but M2 is not. The fact that I have to compare a linear (M1) to a non-linear model (M2) is the reason for the question. I am using nls for M1 just to be consistent with M2. — Marie Auger-Methe, Nov 05 '13 at 14:20
Sure; but you seem very uncertain about what nls() does and implies and part of the answer for you to learn that is to see that the results of two procedures should be consistent, if not identical. — Nick Cox, Nov 05 '13 at 14:23
I do not know of a general theorem that shows equivalence between least squares and MLE for any regression assuming Gaussian errors. I am not sure about using an F-test. In the regression setting, the F-test assumes the models are nested, which in this case they are not. — Fraijo, Nov 05 '13 at 20:11
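Nick Cox's suggestion that M1 can be fitted as a linear model can be checked directly, and the same check shows that the AIC nls() reports is the Gaussian-likelihood AIC at the least-squares estimates. A sketch, assuming the question's simulation with an arbitrary set.seed() added for reproducibility: fit M1 with lm() and nls(), compare coefficients and AIC, and compare both with the formula computed by hand from the RSS.

```r
# Simulated data as in the question; the seed is an arbitrary choice.
set.seed(1)
n  <- 50
x1 <- rnorm(n, 1:n, 0.5)
x2 <- rnorm(n, 1:n, 0.5)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + 0.5 * 0.2 * x1 * x2 + rnorm(n, 0, 0.1)

# M1 is linear in its parameters, so lm() and nls() should agree.
m1_lm  <- lm(y ~ x1 * x2)   # coefficients: (Intercept), x1, x2, x1:x2
m1_nls <- nls(y ~ b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2,
              start = list(b0 = 1, b1 = 0.5, b2 = 0.5, b3 = 0.5))

# Gaussian AIC at the least-squares estimates: sigma^2 is estimated as
# RSS / n, and the parameter count includes sigma (hence p + 1).
rss <- deviance(m1_nls)     # residual sum of squares
p   <- length(coef(m1_nls))
aic_by_hand <- n * (log(2 * pi * rss / n) + 1) + 2 * (p + 1)

cbind(lm = unname(coef(m1_lm)), nls = unname(coef(m1_nls)))
c(lm = AIC(m1_lm), nls = AIC(m1_nls), by_hand = aic_by_hand)
```

If the three AIC values agree, the least-squares fit from nls() is being scored with exactly the likelihood an MLE fit would use.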

score 1 · Answer 1 · answered Nov 05 '13 at 14:21

This may not be the answer you seek, but in general:

The first things to check are how close $b_3$ in M1 is to $b_1 b_2$ in M2 and whether predicted values match each other. AIC and F-tests tell you how well each model fits, but they say nothing about how the models differ. Simple numeric and graphical comparisons may tell you more.
In M1 the value of $b_3$ is unconstrained and in M2 it is constrained. If the criterion is closeness of fit to the data in some absolute sense, then it would be surprising if a constrained fit were better. Otherwise the comparison will hinge on precisely whether and how you penalise yourself for using one more parameter in M1. So, watch out: you won't get anything out of AIC or similar or dissimilar figures of merit that is not a strict consequence of how they are defined. That no doubt is obvious, but it is important.
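Both checks can be sketched on simulated data. This assumes the question's simulation with an arbitrary seed, and starts M2 at the simulation's true values (a choice made here to ease convergence, not taken from the question). It compares $\hat b_3$ from M1 with $\hat b_1 \hat b_2$ from M2, measures the largest gap between fitted values, and confirms that the constrained fit M2 never has a smaller RSS than M1, since every M2 curve is also an M1 curve with $b_3 = b_1 b_2$.

```r
# Simulated data as in the question; the seed is an arbitrary choice.
set.seed(1)
n  <- 50
x1 <- rnorm(n, 1:n, 0.5)
x2 <- rnorm(n, 1:n, 0.5)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + 0.5 * 0.2 * x1 * x2 + rnorm(n, 0, 0.1)

m1 <- nls(y ~ b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2,
          start = list(b0 = 1, b1 = 0.5, b2 = 0.5, b3 = 0.5))
# Assumed start values: the true parameters used in the simulation.
m2 <- nls(y ~ b0 + b1 * x1 + b2 * x2 + b1 * b2 * x1 * x2,
          start = list(b0 = 1, b1 = 0.5, b2 = 0.2))

cf1 <- coef(m1)
cf2 <- coef(m2)
c(b3_M1 = cf1[["b3"]], b1b2_M2 = cf2[["b1"]] * cf2[["b2"]])  # how close?
max(abs(fitted(m1) - fitted(m2)))                            # largest gap
c(RSS_M1 = deviance(m1), RSS_M2 = deviance(m2))  # constrained fit >= free fit
```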

You are absolutely right. The idea here is that M2 is a biologically relevant null model for M1. As the results show, the AIC of M1 should be greater than the AIC of M2 by about 2 (the penalty for the extra parameter) when there is no interaction other than $b_1 b_2 x_1 x_2$. The reason I want to use AIC and an F-test rather than graphical means is that I want to be able to quantify whether $b_3$ is significantly different from $b_1 b_2$. — Marie Auger-Methe, Nov 05 '13 at 14:34
One additional point. In the case where the data are generated with a $b_3$ different from $b_1 b_2$, the AIC difference would be much larger and, hopefully, the F-test would be significant. — Marie Auger-Methe, Nov 05 '13 at 14:43

