
I'm new to statistics and modelling and am currently struggling to understand how to answer the following example question.

I'm trying to find out whether the abundance of snails on a particular food source is a result of a preference for that food source or of its prevalence. Or, more simply: do snails display a significant preference for one food source over another?

Is the number of snails found on lettuce vs. carrots vs. broccoli significantly different based on the relative abundance of lettuce or not? Do I find more snails on lettuce because there is more lettuce, or because snails prefer it?

Given the (example) data:

Site  Snails  Food      Food_Percentage_Abundance
A          4  lettuce   10%
A         48  carrot    80%
A         12  broccoli   4%
B         34  lettuce   10%
C          5  lettuce   13%
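A minimal sketch of how the example table could be entered in R: one row per site and food (long format), which is the shape that model formulas expect. The object name `snails` is my own choice, and storing the percentages as proportions is an assumption rather than something the question specifies.

```r
# Example data from the question, one row per site x food combination.
# Percentages are stored as proportions (10% -> 0.10).
snails <- data.frame(
  Site   = factor(c("A", "A", "A", "B", "C")),
  Snails = c(4, 48, 12, 34, 5),
  Food   = factor(c("lettuce", "carrot", "broccoli", "lettuce", "lettuce")),
  Food_Percentage_Abundance = c(10, 80, 4, 10, 13) / 100
)
str(snails)
```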

What would be the best way to model this in R?

How would I then also test the interaction between abundance and site (to demonstrate or test that it is a general pattern across sites)?

If I fit this with a GAM, e.g.

library(mgcv)
gam(SNAILS ~ 0 + FOOD + FOOD:FOOD_PERCENT_ABUNDANCE, data = snails)  # 'snails' is a hypothetical data frame name

I get the following summary:

                                    Estimate Std. Error t value Pr(>|t|)    
FOODother                            1.64706    0.59263   2.779 0.049858 *  
FOODbroccoli                        15.60870    2.88580   5.409 0.005659 ** 
FOODcarrots                         -1.13942    1.13841  -1.001 0.373520    
FOODlettuice                        23.96296    2.54035   9.433 0.000704 ***
FOODother:FOOD_PERCENT_ABUNDANCE    -0.17647    0.54867  -0.322 0.763836    
FOODbroccoli:FOOD_PERCENT_ABUNDANCE -0.28986    0.09464  -3.063 0.037564 *  
FOODcarrots:FOOD_PERCENT_ABUNDANCE   0.24038    0.03140   7.656 0.001564 ** 
FOODlettuice:FOOD_PERCENT_ABUNDANCE  0.74074    0.12093   6.125 0.003599 ** 

To confirm I'm reading this correctly...

For any given site (as SITE isn't included in the formula), lettuce is the food type that has the most significant impact on the number of snails observed; however, when accounting for the average percent abundance, carrots have a greater effect (due to the lower p-value)?
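The parameterization in the question can be checked on simulated data (all numbers below are hypothetical, chosen only for illustration). With `0 + FOOD + FOOD:FOOD_PERCENT_ABUNDANCE`, each `FOOD` coefficient is that food's own intercept (expected snails at 0% abundance) and each `FOOD:...` coefficient is that food's own slope. The point estimates therefore match fitting each food separately, and each reported p-value tests that coefficient against zero, not against the other foods. This sketch uses `lm()`, which for a model with no smooth terms should give the same estimates as a Gaussian `gam()`.

```r
set.seed(1)
# Hypothetical data: two foods with known intercepts (2, 20) and slopes (0.25, 0.75).
d <- data.frame(
  FOOD = factor(rep(c("carrot", "lettuce"), each = 20)),
  FOOD_PERCENT_ABUNDANCE = runif(40, 0, 100)
)
d$SNAILS <- ifelse(d$FOOD == "carrot", 2, 20) +
  ifelse(d$FOOD == "carrot", 0.25, 0.75) * d$FOOD_PERCENT_ABUNDANCE +
  rnorm(40, sd = 2)

# Intercept removed: one intercept and one slope per food, as in the question.
fit_all <- lm(SNAILS ~ 0 + FOOD + FOOD:FOOD_PERCENT_ABUNDANCE, data = d)

# Fitting carrot alone gives the same intercept and slope estimates.
fit_carrot <- lm(SNAILS ~ FOOD_PERCENT_ABUNDANCE, data = subset(d, FOOD == "carrot"))
coef(fit_all)
coef(fit_carrot)

# With the intercept and main effect kept, FOODlettuce:FOOD_PERCENT_ABUNDANCE is
# the difference between the lettuce and carrot slopes, so its p-value tests
# whether the two foods differ.
fit_diff <- lm(SNAILS ~ FOOD * FOOD_PERCENT_ABUNDANCE, data = d)
summary(fit_diff)$coefficients
```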

Finally, if I wanted to model the impact of SITE as well, then I would use the formula:

SNAILS ~ 0 + SITE:FOOD:FOOD_PERCENT_ABUNDANCE

to model the interaction among all three variables, i.e. to see whether one food source has a greater impact at site A than at site B?
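The formula `0 + SITE:FOOD:FOOD_PERCENT_ABUNDANCE` fits a separate slope through the origin for every site-by-food combination, with no intercepts and no lower-order terms, so on its own it does not test whether the pattern differs between sites. One common alternative is to compare nested models with and without the site interactions. This is a minimal sketch under several assumptions: there are replicate counts per site and food (the example table has at most one row per combination, too few for this many parameters), a Poisson GLM is swapped in as a typical choice for counts, and the data are simulated.

```r
set.seed(42)
# Hypothetical design: 3 sites x 3 foods x 10 replicate plots per combination.
d <- expand.grid(
  SITE = factor(c("A", "B", "C")),
  FOOD = factor(c("broccoli", "carrot", "lettuce")),
  PLOT = 1:10
)
d$FOOD_PERCENT_ABUNDANCE <- runif(nrow(d), 5, 80)

# Simulated counts with the same food effects at every site (no site interaction).
food_effect <- c(broccoli = 0, carrot = 0.3, lettuce = 1)
d$SNAILS <- rpois(nrow(d), lambda = exp(0.5 + food_effect[as.character(d$FOOD)] +
                                          0.02 * d$FOOD_PERCENT_ABUNDANCE))

# Common food and abundance pattern, with only a site main effect.
m_common <- glm(SNAILS ~ SITE + FOOD * FOOD_PERCENT_ABUNDANCE,
                family = poisson, data = d)
# Every food and abundance term allowed to differ by site.
m_by_site <- glm(SNAILS ~ SITE * FOOD * FOOD_PERCENT_ABUNDANCE,
                 family = poisson, data = d)

# Likelihood-ratio test of all site interaction terms at once (10 df here).
anova(m_common, m_by_site, test = "Chisq")
```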

Asked by Jal; edited by EdM.
  • Hi! What do you mean by ‘based on the relative abundance of lettuce’? Do you mean perhaps differences in terms of snails conditionally on abundance of lettuce? – utobi Jan 11 '23 at 20:14
  • Please register and/or merge your accounts (you can find information on how to do this in the My Account section of our help center); then you will be able to edit and comment on your own question. – kjetil b halvorsen Jan 12 '23 at 13:19
  • "Do you mean perhaps differences in terms of snails conditionally on abundance of lettuce?" Yes, I think that may be what I mean. I'm trying to find out whether the abundance of snails on a particular food source is a result of a preference for that food source or of its prevalence. So, do I find more snails on lettuce because there is more lettuce, or because snails prefer it? – jal Jan 12 '23 at 11:18
  • I've edited your question to add the information that you tried to present in now-deleted answers. Please do try to merge your accounts, as that makes everything simpler for everyone once you've accomplished that task. – EdM Jan 15 '23 at 19:48
  • Based on the comment that is now part of the answer: I first asked why you removed the intercept. I also recommended checking with Anova() from the car package whether the interaction term is significant (for that, I would say you should use a type III ANOVA). Further, since what I think you ultimately want is to understand how snail abundance varies with the type of food, you should use emmeans() from the emmeans package. If you also want to check how food availability influences snail abundance for each type of food, emtrends() from the same package might be helpful. – André Barros Jan 15 '23 at 20:37
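The workflow recommended in the last comment can be sketched as follows. This assumes the car and emmeans packages are installed, uses simulated data, and swaps in a Poisson GLM (a common choice for counts) for the question's Gaussian `gam()`. It keeps the intercept and temporarily switches to sum-to-zero contrasts, because type III tests of lower-order terms are only meaningful with such contrasts.

```r
library(car)      # provides Anova()
library(emmeans)  # provides emmeans() and emtrends()

# Hypothetical simulated data (not from the question): 3 foods x 15 plots.
set.seed(7)
d <- data.frame(
  FOOD = factor(rep(c("broccoli", "carrot", "lettuce"), each = 15)),
  FOOD_PERCENT_ABUNDANCE = runif(45, 5, 80)
)
slope <- c(broccoli = 0.01, carrot = 0.02, lettuce = 0.04)
d$SNAILS <- rpois(45, exp(1 + slope[as.character(d$FOOD)] * d$FOOD_PERCENT_ABUNDANCE))

# Sum-to-zero contrasts for type III tests; the old setting is restored at the end.
old <- options(contrasts = c("contr.sum", "contr.poly"))

# Intercept kept; food, abundance and their interaction.
m <- glm(SNAILS ~ FOOD * FOOD_PERCENT_ABUNDANCE, family = poisson, data = d)

# Type III likelihood-ratio tests, including FOOD:FOOD_PERCENT_ABUNDANCE.
Anova(m, type = "III")

# Estimated mean count per food at the average abundance, back-transformed
# to the count scale, with pairwise ratios between foods.
emm <- emmeans(m, ~ FOOD, type = "response")
pairs(emm)

# Slope of log(expected count) on abundance for each food, and pairwise differences.
tr <- emtrends(m, ~ FOOD, var = "FOOD_PERCENT_ABUNDANCE")
pairs(tr)

options(old)
```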

1 Answer


I would say that you have two possible approaches:

  1. Use it as a variable in the model. You could fit a more complex model with an interaction (~ Food + Food Availability + Food:Food Availability), which can tell you which variable matters most for the number of snails (and even whether the role of food availability varies with the type of food).

  2. Include it as an offset in the model. From a statistical point of view, this adds a variable to the model with a known coefficient (fixed at 1), which therefore does not need to be estimated. In practice, this accounts for a variable that may play a role in the variation of your dependent variable (like Food Percentage Availability). This solution is rarely suggested, but do take a look here:
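Both approaches in this answer can be sketched in R. This is an illustration on simulated data, not the source the answer points to. It uses a Poisson GLM with a log link (my choice for count data). With a log link, `offset(log(Abundance))` makes the expected count proportional to abundance, so the food coefficients describe snails per unit of food available, i.e. preference beyond prevalence. Variable names are hypothetical.

```r
# Hypothetical simulated data: 3 foods x 20 plots, with abundance as the
# proportion of each plot covered by that food.
set.seed(3)
d <- data.frame(
  Food = factor(rep(c("broccoli", "carrot", "lettuce"), each = 20)),
  Abundance = runif(60, 0.05, 0.8)
)
# Counts proportional to abundance, but lettuce gets twice as many snails
# per unit of abundance (a built-in "preference").
rate <- c(broccoli = 30, carrot = 30, lettuce = 60)
d$Snails <- rpois(60, rate[as.character(d$Food)] * d$Abundance)

# Option 1: abundance as a covariate, interacting with food.
# Sequential (type I) tests: Food, then Abundance, then whether the
# abundance effect differs between foods (results depend on term order).
m_cov <- glm(Snails ~ Food * Abundance, family = poisson, data = d)
anova(m_cov, test = "Chisq")

# Option 2: log(Abundance) as an offset, i.e. its coefficient is fixed at 1:
# log E[Snails] = log(Abundance) + food effect.
m_off <- glm(Snails ~ Food + offset(log(Abundance)), family = poisson, data = d)
exp(coef(m_off))  # broccoli rate per unit abundance, then rate ratios vs broccoli

# The offset is itself an assumption: estimate the log(Abundance) coefficient
# freely and compare the two fits (1 df).
m_free <- glm(Snails ~ Food + log(Abundance), family = poisson, data = d)
anova(m_off, m_free, test = "Chisq")
```

In the offset model, a food coefficient that is significantly different from zero suggests that snails are found on that food more (or less) often than its abundance alone would predict.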

    I added to the question the information that the author tried to add via new "answers." You had an important comment on one of those, which you might want to repeat on the edited question or to address by editing your answer. I agree that Anova() and emmeans are good choices here, as that comment said. – EdM Jan 15 '23 at 19:49