I am running an analysis on the effect of canopy cover (OverheadCover) and the number of carcasses placed on the same location (CarcassNumber) on the proportion of carrion eaten by birds (ProportionBirdsScavenging). These data points were collected at different national parks, so I aim to include Area as a random factor. Long story short, I want to test for an interaction effect of OverheadCover and CarcassNumber. Test data and analysis below.
library(mgcv)
data_both <- data.frame(ProportionBirdsScavenging = c(0.406192519926425, 0.871428571428571, 0.452995391705069, 0.484821428571429, 0.795866569978245, 0.985714285714286, 0.208571428571429, 0.573982970671712, 0.694285714285714, 0.930204081632653, 0.0483709273182957, 0.0142857142857143, 0.661904761904762, 0.985714285714286, 0.0142857142857143, 0.0142857142857143),
pointWeight = c(233, 17, 341, 128, 394, 46, 5, 302, 10, 35, 57, 39, 12, 229, 28, 116),
OverheadCover = c(0.671, 0.04, 0.46, 0.65, 0.02, 0, 0.8975, 0.585, 0.6795, 0.0418, 0.5995, 0.6545, 0.02, 0, 0.92, 0.585),
CarcassNumber = as.factor(c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
Area = c("Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst", "Hamert", "KempenBroek", "KempenBroek", "KempenBroek", "Markiezaat", "Markiezaat", "Meinweg", "Valkenhorst"))
gam_interaction <- mgcv::gam(ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + s(Area, bs="re"), family=betar(link="logit"), data = data_both, weights = pointWeight)
summary(gam_interaction)
# Family: Beta regression(26.515)
# Link function: logit
#
# Formula:
# ProportionBirdsScavenging ~ OverheadCover * CarcassNumber + s(Area,
# bs = "re")
#
# Parametric coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 1.20570 0.15236 7.913 2.51e-15 ***
# OverheadCover -1.91892 0.12480 -15.376 < 2e-16 ***
# CarcassNumber2 1.76033 0.05319 33.093 < 2e-16 ***
# OverheadCover:CarcassNumber2 -8.30140 0.12432 -66.774 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Approximate significance of smooth terms:
# edf Ref.df Chi.sq p-value
# s(Area) 3.792 4 452.9 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# R-sq.(adj) = 0.889 Deviance explained = 94.9%
# -REML = -2630.1 Scale est. = 1 n = 16
I've read that gam's (because they work additively) do not test very well for interaction effects. However, I also found that you can add some interaction like terms to your model with the by argument, but this only affects the model and doesn't test for interaction. By using the tensor product te() it should work, but CarcassNumber has insufficient unique values. Can somebody advise me on how I should properly test for an interaction effect while using gam? Is the way I did it above (with the * sign) scientifically correct?
ti()argument as above, r gives me an error (NA/NaN/Inf in foreign function call (arg 1)). I figured that it would probably be because categorical variables as predictors should be noted as fixed effect, right? So I trieds(OverheadCover) + CarcassNumber + ti(OverheadCover, CarcassNumber)which tells me thatCarcassNumber has insufficient unique values to support 5 knots: reduce k.. If I reduce the basis functions (k=c(10,2)), the error persists. – Peter Feb 02 '20 at 08:00te()without the Individual univariate effects. BTW, factor smooth basis typebs=“fs”also doesn’t work (Model has more coefficients than data), but this might be due to incomparable units of measure? Or does this mean I have insufficient data to test for interaction? For this test I only have 16 data points. Also, could you perhaps elaborate why is the Maximum Likelihood (ML) method is more appropriate here than the restricted maximum likelihood (REML)? Is this generally the case with thebetarfamily, or just if you test for interaction? – Peter Feb 02 '20 at 08:01s(OverheadCover) + s(CarcassNumber, k = 4) + ti(OverheadCover, CarcassNumber, k = c(5,4))but even that might not work if you con't have enough unique carcasses. In which case you can't expect to estimate a smooth ofCarcassNumber. The same withte(). Likewise with the"fs"basis as this will try to fitkx number of random effects coefficients, so you need to setklow enough for your data. – Gavin Simpson Feb 03 '20 at 17:00?summary.gamit metions theMLvsREMLbit as p-values were found to have better coverage usingML, thenREML, then other choices in the cited research. You also have to be very careful withREMLif you compare models with different fixed effects, which is something I suggested. – Gavin Simpson Feb 03 '20 at 17:03