Why do gam predictions not match gam smooth?

Question

I am studying the effect of organic farming on honey reserves in honeybee colonies. I am trying to see how an increase in the percentage of organically farmed land (at various buffers around bee hives) impacts bee colonies. I would like to create 3-dimensional plots with the percentage difference of honey reserves between colonies that are placed in landscapes with more and less than a certain threshold values of organically farmed land on the z axis, the threshold value on the x-axis and time or buffer radius on the y-axis (with one of the latter two being fixed). Something like this (or something a little more smoothed):

Therefore, I need to model the honey reserves of colonies in colonies that exceed the threshold value and those that don’t. I did this both separately with separate gam models for “> threshold” and “≤ threshold” and with a single gam model:

#1
gam_less <- gam(total.reserves ~ ti(week) , method = "REML",
                            data=bio.sub_less)
#2
gam_interact <- gam(total.reserves ~ te(week,by = binary) , method = "REML”, data=bio.sub)

I predicted the honey reserves using these models. For model #1 I used this code:
newdat = expand.grid(week,years,unique(bio.sub_less$numero))
            names(newdat) = c("week","year","numero")
            predicted_less = predict(gam_less,newdat)
            predicted_less = predicted_less[1:length(week)]

Analogously I predicted the honey reserves for the colonies exposed to more OF than the threshold. I plotted the observations with a gam smooth:

predict.obs_1000m_10perc.gg = ggplot(predict.obs_1000m_10perc, aes(x=week,y=total.reserves,colour=binary))
predict.obs_1000m_10perc.gg + geom_point(size=0.8) + geom_smooth(method="gam",formula = y ~ ti(x)) + 
geom_line(aes(x=week,y=reserves.more.no.random, colour= "gam no random more"))+
    geom_line(aes(x=week,y=reserves.less.no.random, colour= "gam no random less"))

I was surprised to see that the gam smooth did not match the gam predictions as you can see here:

(I tried various models, this is the one that got closest to the smooth)

Why is that? Did I make a mistake?

score 2 · Accepted Answer · answered May 04 '16 at 10:06

2

I don't fully understand the math behind what gam does with by, but the differences between the model predictions and the ggplot2 smooths is almost certainly because they weren't fit to the same data.

Internally, ggplot2 splits up the data into two subsets, based on binary in your case, then fits two separate gam models to them. This might account for why the ggplot2 smooths seem to be overfitting a bit more, since each smooth is fit based on less data than the single gam model.

answered May 04 '16 at 10:06

JoFrhwld

2,437

Hi Thanks for your answer. I tried several approaches. One approach was to use one single model that included a by model. Another one was to split the data and model the data for gam_less and gam_more as in #1 separately. The 2D plot was actually done based on two separate models (for more organic farming than the threshold and for less). – bee guy May 04 '16 at 11:23
typo: *that included the by argument (instead of included a by model) – bee guy May 04 '16 at 13:47
Hi @JoFrhwld was right that they were not based on the same data. I made predictions based on more weeks than I plotted. If I plot all weeks (with all observations) the smooth and the predictions match. (the error occurred because I forgot to type the all = T argument in the merge function. – bee guy May 18 '16 at 10:19

Why do gam predictions not match gam smooth?

1 Answers1

Linked