As a rule of thumb, the suggested minimum sample size for a linear model is at least 10 observations per parameter included in the model (Bolker et al., 2009). I know that in GAMs, different smooth terms can have different effective degrees of freedom (edf) and different numbers of basis functions, depending on the nature of the relationship, which in turn may determine how much data is needed.

My question is: How can I be sure that I have enough data points to build a GAM or GAMM of a given complexity? Is there a general rule for finding the minimum required sample size for a GAM or GAMM?

Reference:

Bolker, B. M., Brooks, M. E., Clark, C. J., Geange, S. W., Poulsen, J. R., Stevens, M. H. H., & White, J.-S. (2009). Generalized linear mixed models: A practical guide for ecology and evolution. Trends in Ecology & Evolution, 24, 127–135. https://doi.org/10.1016/j.tree.2008.10.008

KO 88
  • 341

1 Answer

A challenge of power analysis with GAMs is that the quantity of interest can vary greatly depending on your problem. The significance of individual smooth basis coefficients is usually not relevant. So one needs to define the power analysis specifically, e.g., "Is there enough data to determine whether the partial effect of variable X has a unimodal shape rather than a flat or increasing/decreasing pattern?" Once you have such a question well defined, the best way to proceed is a simulation approach: generate multiple sets of artificial data representing your expected outcomes, fit the model to each set, and record how often the fit answers the question correctly. A good guide to this approach is:

Kain MP, Bolker BM, McCoy MW. 2015. A practical guide and power analysis for GLMMs: detecting among treatment variation in random effects. PeerJ 3:e1226 https://doi.org/10.7717/peerj.1226
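
To make the simulation idea concrete, here is a minimal sketch in Python using only the standard library. It is not a GAM: as a simplifying assumption, a quadratic term stands in for the smooth, and "detecting a curved partial effect" is defined as the quadratic coefficient differing from zero under a rough normal-approximation test. The function names (`invert`, `quad_term_z`, `simulate_power`) and the parameters `curvature` and `noise_sd` are hypothetical choices for illustration. In a real analysis you would replace the fitting step with your actual GAM or GAMM and your own well-defined question.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def quad_term_z(x, y):
    """z statistic of the quadratic coefficient in y ~ 1 + x + x^2 (OLS)."""
    X = [[1.0, xi, xi * xi] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(x) - 3)
    return beta[2] / math.sqrt(sigma2 * inv[2][2])


def simulate_power(n, curvature, noise_sd, n_sims=500, seed=1):
    """Fraction of simulated data sets in which curvature is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Assumed data-generating process: hump-shaped effect plus Gaussian noise.
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = [-curvature * xi * xi + rng.gauss(0.0, noise_sd) for xi in x]
        # Two-sided test at roughly the 5% level (normal approximation).
        if abs(quad_term_z(x, y)) > 1.96:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(n, simulate_power(n, curvature=1.0, noise_sd=1.0))
```

Running the loop over candidate sample sizes gives an estimated power curve for the assumed effect size and noise level; the smallest `n` that reaches your target power (0.8 is a common convention) is the answer to "do I have enough data" for that specific question. Setting `curvature=0.0` is a useful sanity check, since the detection rate should then sit near the nominal error rate.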

Noam Ross
  • 303