
I am analyzing data on polar bears and trying to figure out if different variables influence their movement. My data has a mix of categorical (e.g. bear ID number) and numerical variables (e.g. bear age).

For my analysis, I was thinking of doing a model in a format like this:

Movement = x1*(year) + x2*(length of ice season) + x3*(age of bear) + bear’s individual ID + etc.

I am stuck between two options:

1. Doing a GLMM (Generalized Linear Mixed Model). Since I’m pretty sure my independent variables don’t all have a linear relationship to my dependent variable, I was thinking of doing a quick visual analysis of my variables and tweaking them accordingly: for example, if it looks like age of the bear has more of an exponential relationship with my movement variable, then I would write it in the model as x3*log(age of bear).
2. Doing a GAM (Generalized Additive Model). I’m not too familiar with this type of model, but I have heard that it’s usually the way to go if you believe the relationship between your variables isn’t necessarily linear.

In both cases, I am planning on including the bear ID as a random effect.

Which test would you recommend? Are there pros and cons to each? As an aside, my data also has relatively small sample sizes (30 to 45 bears).

Cam
  • What does your movement variable look like? What does it represent and how is it measured? – Stefan Apr 13 '22 at 18:28
  • It would be one of three variables: either average speed per individual in km/h, path straightness index (a number without a unit) or home range size (km^2). Each variable would have its own model. – Cam Apr 13 '22 at 20:05
  • Just to make it fun, there are also GAMMs. – Ben Bolker Apr 13 '22 at 22:37
  • How many samples per bear? How many "etc." do you have and what kind? How many years? Presumably each year is associated with a single "length of ice season" value (unless e.g. you are measuring bears in different regions with different ice seasons?) Where would you say this analysis sits on the continuum between 'exploratory' (try anything, don't worry about whether the p-values are reliable or not) and 'confirmatory' (you really want to take the p-values seriously)? Do you have any a priori hypotheses? – Ben Bolker Apr 13 '22 at 23:54
  • So what I'm trying to figure out is actually whether sex and age influence the movement variables I stated in my previous reply. Because of this, it's definitely more confirmatory than exploratory. No a priori hypothesis. Each year is associated with a single ice season. I divided the ice season into three sub-seasons (early ice season, middle ice season and late ice season) for ecological purposes, so each individual bear has data for 1 to 3 sub-seasons for a given year. I am only studying bears from one region. – Cam Apr 14 '22 at 01:43
  • So in summary, sex and age are the variables I'm focusing on. I'm adding the other variables to see whether sex and age are stronger predictors than those other variables. – Cam Apr 14 '22 at 01:49

1 Answer


Those aren't exclusive options. GL(M)M and GAM do different things despite the apparent similarities in names.

The choice of a GLM (generalized linear model) depends on the nature of the response variable and how you think the response is related to the linear predictor from the regression. For example, you would use a GLM for count or categorical outcomes, or if you think that the logarithm of the mean response, rather than the mean itself, is a linear function of the predictors (a log link). If you want to model individual bears as random effects then you have a mixed model, potentially a GLMM (generalized linear mixed model).
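
In symbols, one way to write such a model for the question's variables is sketched below. The notation is an assumption introduced here for illustration, not something from the question: i indexes bears, j indexes observations within a bear, g is the link function, and b_i is a random intercept for bear i.

```latex
g\bigl(\mathrm{E}[y_{ij} \mid b_i]\bigr)
  = \beta_0
  + \beta_1\,\mathrm{year}_{ij}
  + \beta_2\,\mathrm{ice}_{ij}
  + \beta_3\,\mathrm{age}_{ij}
  + b_i,
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2)
```

With the identity link g(\mu) = \mu this is an ordinary linear mixed model; with g(\mu) = \log\mu it is a GLMM with a log link.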

A GAM (generalized additive model) is one way among others to structure the predictors so that nonlinear relationships between predictors and outcomes can be modeled flexibly. It sounds like flexibly modeled continuous predictors will be important for your application, so use of GAM or regression splines is worth investigating. You can combine a GAM or regression splines with a GL(M)M.
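
To make the regression-spline idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data are synthetic and noise-free, the knot at age 10 is arbitrary, the helper names (`hinge_basis`, `solve`, `fit_ols`) are hypothetical, and it uses the simplest possible (linear) spline fitted by ordinary least squares. It leaves out the bear ID random effect entirely, so it is not a substitute for a mixed model or GAM fitted with a dedicated package.

```python
def hinge_basis(x, knots):
    """Linear regression-spline basis: intercept, x, and one hinge per knot."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]


def solve(a, b):
    """Solve the square system a * beta = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the top.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = m[i][n] - sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / m[i][i]
    return beta


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic, made-up data (NOT real bear data): speed rises with age up to
# age 10, then declines. No noise is added, so the fit should be near exact.
ages = [float(a) for a in range(1, 26)]
speeds = [1.0 + 0.2 * a - 0.3 * max(0.0, a - 10.0) for a in ages]

design = [hinge_basis(a, knots=[10.0]) for a in ages]
beta = fit_ols(design, speeds)

# beta holds the intercept, the slope before the knot, and the change in
# slope after the knot; here they should be close to 1.0, 0.2 and -0.3.
print([round(b, 6) for b in beta])
```

The point of the sketch is that a spline term is still linear in its coefficients: the nonlinearity lives in the basis columns, so the same columns could in principle be placed in the fixed-effects part of a mixed model.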

If your movement outcome is continuous and is expected to link directly to the linear predictor after the nonlinearities in the outcome-predictor associations are handled with a GAM or regression splines, then you don't need to use a generalized model. You would use a mixed model if you choose to treat the bears as random effects. That's not the only way to handle repeated measurements over time on individuals, however. Chapter 7 of Frank Harrell's course notes and book outlines the pros and cons of several approaches. He shows in detail how to use generalized least squares as an alternative that might work for your data.

Other portions of the Harrell references should help inform the best way to model your data.

EdM