I have 290 clusters, each a subject observed from 1 to 14 times, and a relatively large number of them (45) are observed only once. I worked in R and treated subject ID as a random effect. The first model I fitted consists of fixed effects (3 $\beta$s: the intercept and 2 regression coefficients) and a random intercept. Since I want to make model comparisons based on AIC, also fitting more complex models with random slopes, I estimated it by ML. Is that okay? I don't think ML is a problem here, since the sample size is large (more than 2000 observations) and the bias in the variance estimates should be negligible (indeed, it is).
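To make the setup concrete, here is a minimal sketch with `lme4`. The data frame `df` and the names `y`, `x1`, `x2`, and `id` are hypothetical stand-ins; the data are simulated purely so the example runs.

```r
library(lme4)

# Simulated data mimicking the described structure: 290 subjects,
# each observed 1-14 times (purely illustrative, not the real data).
set.seed(1)
id <- rep(seq_len(290), times = sample(1:14, 290, replace = TRUE))
n  <- length(id)
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(290)[id] + rnorm(n)
df <- data.frame(y, x1, x2, id = factor(id))

# Random-intercept model fitted by ML (REML = FALSE), so that AIC
# values are comparable across models with different structures.
m0 <- lmer(y ~ x1 + x2 + (1 | id), data = df, REML = FALSE)
AIC(m0)
```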
The choice of a more complex model, i.e. one with a random slope as well, should first be driven by theory, right? Or should I consider a more complex model a priori, in light of the (perhaps) higher variance that an additional random effect could explain? Indeed, this is the case here. Taking the random-intercept model as the simplest baseline, if I fit a model with both a random slope and a random intercept, it turns out that the variance explained by the random slope is 4 times the residual variance, and twice the variance explained by the random intercept. However, this result occurs only when fitting a random slope for one specific covariate out of the two. For the other covariate this does not occur, and the variance explained by the random slope is tiny relative to both the residual variance and the random-intercept variance. Moreover, this latter model (the one where the random-slope variance is tiny) cannot be estimated with ML since it fails to converge, and I'm forced to use REML. Is this approach to model selection valid?
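The comparison described above could be sketched as follows. Again, all names (`y`, `x1`, `x2`, `id`) are hypothetical, and the data are simulated (here with a genuine random slope in `x1`) only to make the example self-contained.

```r
library(lme4)

# Simulated data: 290 subjects, 1-14 observations each, with a
# subject-specific random intercept and a random slope in x1.
set.seed(1)
id <- rep(seq_len(290), times = sample(1:14, 290, replace = TRUE))
n  <- length(id)
x1 <- rnorm(n)
x2 <- rnorm(n)
u0 <- rnorm(290)        # random intercepts
u1 <- 0.8 * rnorm(290)  # random slopes for x1
y  <- 1 + 0.5 * x1 - 0.3 * x2 + u0[id] + u1[id] * x1 + rnorm(n)
df <- data.frame(y, x1, x2, id = factor(id))

# Baseline random intercept vs. random intercept + slope for x1,
# both fitted by ML so their AICs are directly comparable:
m0 <- lmer(y ~ x1 + x2 + (1 | id),      data = df, REML = FALSE)
m1 <- lmer(y ~ x1 + x2 + (1 + x1 | id), data = df, REML = FALSE)

VarCorr(m1)  # variance components: intercept, slope, correlation, residual
AIC(m0, m1)  # ML-based AIC comparison

# Note: if the analogous model for x2 only converges under REML,
# its criterion is not comparable by AIC to the ML fits above.
```

The `VarCorr()` output is what would show the pattern described in the question (slope variance large relative to the residual and intercept variances for one covariate, tiny for the other).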