5

I'm analyzing an experiment where the strength of the treatment varied by treatment day in an uncontrolled way. Specifically, we tested the response time of 11 non-overlapping groups honey bees on 4+ days, each. On any given day, only one group was tested under two experimental conditions: first unfed, then fed. How much a group of bees fed was out of our hands and varied considerably, hence the variable treatment strength.

My lmer() model currently implicitly includes treatment strength via the random effect (1 | group_id/day):

response_time ~ experimental_condition + 
(1 | group_id/subject_id) + (1 | group_id/day)

I'm interested in estimating the effect of experimental_condition on response_time and am wondering whether this model could be improved by incorporating how much a group fed per day? If so, how would I do that, and how can I handle the small number of days on which we were unable to measure how much the bees had consumed?

Tim
  • 115
  • Why were you unable to unable to measure how much the bees fed ? What proportion of days did that happen ? – Robert Long Nov 22 '23 at 09:29
  • On rainy days, rainwater accumulated in the feeding apparatus and mixed with the food. This made it impossible measure the weight of the food removed by the bees. Human error (e.g. someone dropping the feeder before we could measure its weight) also played a role. We have no measurements for 8/78 days. – Tim Nov 22 '23 at 10:27
  • Would you say that bees had ad libitum access to food ? – CaroZ Nov 22 '23 at 10:53
  • 1
    The bees that foraged on the feeder were allowed to collect as much food as they wanted during the 3 hours the feeder was present. The bees that did not forage were able to obtain food from the honeycomb inside the hive or from the foragers. – Tim Nov 22 '23 at 12:01
  • Do you also have data for the weather conditions ? – Robert Long Nov 22 '23 at 13:19
  • Yes. I know on which of the 8 days we have no measurement due to rain and on which of these days the measurement is missing due to human error. – Tim Nov 22 '23 at 13:35
  • OK that's good. Can you just explain why your model "currently implicitly includes treatment strength via the random effect (1 | group_id/day)". Is day really nested within group_id ? – Robert Long Nov 22 '23 at 13:38
  • I nested day within group_id because the weather (temperature, wind speed, cloud cover, ...) has a strong effect on bee behavior, and we tested only one group (bee colony) per day. In addition, each colony began to feed itself on a different day of the experiment, and the amount of food they collected per day increased in colony-specific ways with day. The latter is also why I said that the random effect (1 | group_id/day) currently implicitly includes treatment strength. – Tim Nov 22 '23 at 14:19
  • I'm a bit puzzled by the day variable. If it's nested in group that means each day "belongs" to 1 and only 1 group, and there are multiple days "belonging" to each group. Were no other groups tested on the same day ? If they were then they should be crossed, not nested random effects. See here for details of crossed vs nested, random effects.: Crossed vs nested random effects. Also, it looks like you have duplicated random intercepts for group. – Robert Long Nov 22 '23 at 21:33

1 Answers1

2

From the description, I don't see any problem with adding the variable treatment strength as a fixed effect. However, I think there are a few other issues here.

- Duplication of random intercepts:

Your random structure is:

(1 | group_id/subject_id) + (1 | group_id/day)

which expands to

(1 | group_id) + (1 | group_id:subject_id) + (1 | group_id) + (1 | group_id:day)

so the random intercepts for group_id are duplicated. This can cause problems with convergence. A more appropriate structure is:

(1 | group_id) + (1 | group_id:subject_id) + (1 | group_id:day)

- How to handle missing data:

For the days with missing treatment strength data, you could consider multiple imputation. It seems plausible that the missingness is NMAR (not missing at random, which is a problem for multiple imputation), particularly if the conditions that led to the missing data (like rain) are believed to directly affect the feeding behaviour or the amount of food consumed by the bees, which is the unobserved data. However, if you have recorded data on conditions like weather, and these conditions fully explain the missingness, then it can be considered MAR, in which case multiple imputation should be

- Consider interaction between treatment strength and experimental_condition:

You might want to consider an interaction term between experimental_condition and treatment_strength. This would allow the model to differentiate the effect of feeding amount based on the experimental condition (fed vs unfed).

- Treatment of the day variable`:

I'm a little bit puzzled by the day variable. You seem to be sure that it's nested within group, but that means that each day "belongs" to one and only one group, there are multiple days "belonging" to each group, and no other groups were tested on the same day. If that's not the case then they should be crossed random effects, not nested.

Robert Long
  • 60,630
  • 1
    Thanks for pointing out the duplicated random intercept. I indeed had convergence problems and was wondering were they are coming from. Specifying the random effects structure like you suggested fixed this and also eliminated the odd extra term in the summary of the fit. – Tim Nov 23 '23 at 09:08
  • I should be be able to use multiple imputation for the days on which the feeder weight measurements are missing due to human error. On the days it rained, I probably won't be able to do that because I have no information on how long or how heavily it rained. Do you think lmer() will be able to handle me simply deleting all data from the rainy days? – Tim Nov 23 '23 at 09:08
  • I'm fairly new to linear models and don't yet fully understand how interaction terms work. I'll look into it! – Tim Nov 23 '23 at 09:09
  • In this experiment, day does indeed belong to one and only one group_id. I have another experiment where the groups overlapped temporally (i.e., on some days, more than one group was tested), and for that experiment I'll use (1 | day). – Tim Nov 23 '23 at 09:09
  • lmer can handle deleting rows, but that is known to often create bias (it's call complete case analysis if you didn't already know) . I would still go ahead with MI, but perform sensitivity analysis by also running the complete cases analysis. - perhaps you could find some local weather dataset and incorporate that into your analysis ? You have quite a small fraction of missingness so it might not be a problem. As for day and group, your last comment makes sense :) – Robert Long Nov 23 '23 at 09:27
  • Just to make sure I understood your suggestion at the top of your answer correctly, would the model with the added fixed effect for treatment strength be response_time ~ experimental_condition + treatment_strength + (1 | group_id) + (1 | group_id:subject_id) + (1 | group_id:day), where treatment_strength is how much food group_id consumed during day (e.g. 152 ml)? – Tim Nov 24 '23 at 12:13
  • Yes, that is what I would do. – Robert Long Nov 24 '23 at 12:23
  • Great! One final question: could I include treatment_strength as a random effect (1 | treatment_strength) instead of a fixed effect? I'm wondering whether this might be more appropriate as I had no control over treatment_strength. Plus, from a scientific perspective, I currently have no interest in this variable. I'm mainly including it to (hopefully) help lmer() better fit the data. – Tim Nov 24 '23 at 12:56
  • For treatment_strength to be random effect it would have to be a factor/categorical variable. While it's true that "nuisance" factors can be modelled as random, in this case it's probably better for it to be a covariate (fixed effect). – Robert Long Nov 24 '23 at 13:11
  • Then I'll add it as a fixed effect. Thanks for your help, Robert! :) – Tim Nov 24 '23 at 13:38
  • Sorry for reviving an old discussion, but I'm unsure whether I've coded treatment_strength properly. My daily treatment strength measurement is on the group_id level, so currently treatment_strength has the same value for all group_id/subject_id on a given day. However, during the unfed experimental_condition we obviously didn't treat, so now I'm wondering whether treatment_strength should actually be 0 or NA for any response_time measurements made during that time. What do you think? – Tim Nov 30 '23 at 09:23
  • No worries @Tim, If I've understood the issue, then I'd say it should be 0, not missing (NA). – Robert Long Dec 01 '23 at 15:21