I wonder whether the following mixed model suffers from overfitting.

Note: The following is just a hypothetical model to illustrate the model construction.

Research question: what affects apple production?

Experimental design: I have 100 independent sites, with one apple tree growing at each site. At each site I measured a variable every year (e.g. the yearly number of apples produced by the tree). The length of the time series (number of years with data) varies from site to site, ranging from 2 to 10 years. Pooling all the series gives 500 yearly observations.

Mixed model:

  • the response variable is the "yearly number of apples"
  • the random factor is "site"
  • there are 10 predictors, i.e. fixed effects (e.g. age of the tree, yearly precipitation...)
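
To make the structure concrete, the model could be written as below. This is only a sketch under assumptions that are not stated above: a random intercept for site (no random slopes) and Gaussian errors. The symbols $y_{it}$, $x_{k,it}$, $u_i$ and $\varepsilon_{it}$ are introduced here purely for illustration, with $i$ indexing the 100 sites and $t$ the years observed at site $i$.

```latex
y_{it} = \beta_0 + \sum_{k=1}^{10} \beta_k \, x_{k,it} + u_i + \varepsilon_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
```

Here $y_{it}$ is the yearly number of apples at site $i$ in year $t$, and $x_{k,it}$ is the value of the $k$-th predictor for that site and year.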

If I divide the number of yearly observations (500) by the number of fixed-effect predictors (10), the ratio is 50, so the model satisfies the rule of thumb of 10-15 observations per predictor to avoid overfitting.

However, if I divide the yearly observations (500) by both the fixed-effect predictors (10) and the levels of the random factor (100 sites), i.e. 500 / (10 × 100), the result is 0.5, which is far too few observations to fit a model with 10 predictors. That would be clear overfitting.
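
For reference, the arithmetic behind the two ratios can be sketched in a few lines of Python. The numbers are the ones from the design above; the variable names are just illustrative, and the second ratio assumes the division is by the product of predictors and sites, since that is what yields 0.5.

```python
# Counts taken from the experimental design described above
n_obs = 500         # pooled yearly observations
n_sites = 100       # levels of the random factor "site"
n_predictors = 10   # fixed-effect predictors

# Ratio 1: observations per fixed-effect predictor
obs_per_predictor = n_obs / n_predictors

# Average length of a site's time series
obs_per_site = n_obs / n_sites

# Ratio 2: observations per predictor within a site
# (assumption: dividing by predictors * sites)
obs_per_predictor_per_site = n_obs / (n_predictors * n_sites)

print(obs_per_predictor)           # 50.0
print(obs_per_site)                # 5.0
print(obs_per_predictor_per_site)  # 0.5
```

The second ratio is equivalent to the average series length (5 years per site) divided by the 10 predictors.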

How should I check for overfitting in a mixed model like this?

Dave