I am new to GAM modelling. I would like to find a family that fits my response variable. I am using monthly sums of beetle counts from traps that are emptied at roughly two-week intervals (the interval can vary between traps and years) at diverse locations across Germany. My dataset contains some zeros, but not too many. As these are sums of counts, my data are always non-negative integers. Since I have moved from raw (discrete) counts to sums, are my sums now continuous?
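A minimal sketch of the aggregation step, using simulated catches (the trap IDs, emptying dates and Poisson rate are made up for illustration, not taken from the real data): summing integer counts within a month gives another non-negative whole number, so the monthly sums are still discrete counts.

```r
set.seed(42)
# Hypothetical catches: 3 traps emptied every 15 days from April 2021
catches <- data.frame(
  trap  = rep(c("A", "B", "C"), each = 12),
  date  = as.Date("2021-04-01") + rep(seq(0, 165, by = 15), times = 3),
  count = rpois(36, lambda = 4)
)
catches$month <- format(catches$date, "%Y-%m")

# Monthly sum per trap, analogous to count_sum
monthly <- aggregate(count ~ trap + month, data = catches, FUN = sum)

# The sums are still whole, non-negative numbers, i.e. discrete counts
all(monthly$count >= 0 & monthly$count == round(monthly$count))
```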
In short, my dataset contains some zeros (~10% of the data) and also extreme values, with many traps having very low counts. Here is how my monthly sums look:
[Figure: distribution of the monthly sums]
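To put numbers on the share of zeros and the skew, a quick base-R summary can help. In this sketch `count_sum` is a simulated stand-in (negative binomial with arbitrarily chosen size and mean), not the real data.

```r
set.seed(1)
# Simulated stand-in for dat$count_sum: about 9-10% zeros in expectation
# and a long right tail
count_sum <- rnbinom(500, size = 0.6, mu = 30)

# Share of exact zeros
prop_zero <- mean(count_sum == 0)

# Quantiles show how concentrated the data are at low values
# and how long the upper tail is
qs <- quantile(count_sum, probs = c(0.25, 0.5, 0.75, 0.9, 0.99))

prop_zero
qs
max(count_sum)
```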
I have found that the Tweedie distribution can account for zeros and is often used in GAMs in similar studies. But when I apply it to my data, I get an almost perfect fit, except for a strange pattern exactly at the low/zero values:
[Figure: diagnostic plots of the Tweedie fit]
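For reference, here is why a Tweedie family can produce exact zeros (standard Tweedie results, stated here as background). For mean $\mu$, dispersion $\phi$ and power parameter $1 < p < 2$, $Y$ is a Poisson sum of gamma variables (shape $\alpha$, scale $\gamma$):

$$
Y = \sum_{i=1}^{N} G_i,\qquad N \sim \text{Poisson}(\lambda),\qquad G_i \overset{\text{iid}}{\sim} \text{Gamma}(\alpha,\ \gamma),
$$

$$
\lambda = \frac{\mu^{2-p}}{\phi\,(2-p)},\qquad \alpha = \frac{2-p}{p-1},\qquad \gamma = \phi\,(p-1)\,\mu^{p-1},
$$

$$
\Pr(Y = 0) = \Pr(N = 0) = \exp\!\left(-\frac{\mu^{2-p}}{\phi\,(2-p)}\right),\qquad \operatorname{E}(Y) = \lambda\alpha\gamma = \mu,\qquad \operatorname{Var}(Y) = \phi\,\mu^{p}.
$$

So the zero probability is tied to the same $\mu$, $\phi$ and $p$ that shape the positive part: with $\phi$ fixed, letting $p \to 2$ drives $\lambda \to \infty$ and $\Pr(Y = 0) \to 0$. A fixed $p$ such as 1.85 therefore constrains, together with $\phi$, how many zeros the model can produce at a given mean. Note also that the positive part is continuous while the monthly sums are integers, which may matter most at low counts, where the discreteness of the data is most visible.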
I am not sure how to account for this chunk of the data with my choice of family. I have tried different combinations of the a, b, and theta parameters, and tested several families in the mgcv package (as I will later use bam()). The nb family also fits well, but again shows a strange pattern at low values. Do you have any suggestions for how I can fit my data better? Thank you!
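The family comparison can be made a bit more systematic. This sketch uses simulated data (not the real `dat`), fits intercept-only Poisson and negative binomial models with mgcv, compares their AIC, and compares the observed share of zeros with the share each fitted model implies. Tweedie is left out of the AIC comparison on purpose: its positive part is a continuous density, so its log-likelihood is not directly comparable with that of a discrete count model. Like the question's own `gam(y ~ 1)` approach, this only checks the marginal distribution; the picture can change once covariates are added.

```r
library(mgcv)

set.seed(2)
# Simulated stand-in for dat: overdispersed counts, about 9-10% zeros
dat <- data.frame(count_sum = rnbinom(500, size = 0.6, mu = 30))

# Intercept-only count models; nb() estimates its theta during fitting
m_pois <- gam(count_sum ~ 1, data = dat, family = poisson(), method = "REML")
m_nb   <- gam(count_sum ~ 1, data = dat, family = nb(), method = "REML")

# Both are discrete count models, so their AICs are comparable
aics <- AIC(m_pois, m_nb)
aics

# Observed share of zeros vs. the share each fitted model implies
theta_nb <- m_nb$family$getTheta(TRUE)  # NB size parameter, natural scale
zero_check <- c(
  observed = mean(dat$count_sum == 0),
  nb       = mean(dnbinom(0, size = theta_nb, mu = fitted(m_nb))),
  poisson  = mean(dpois(0, lambda = fitted(m_pois)))
)
zero_check
```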
Here is my code. It uses gam(y ~ 1) to fit only the distribution of y, without any predictors:
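library(mgcv)
# Intercept-only Tweedie model; a positive theta fixes the power parameter p at 1.85
m <- gam(count_sum ~ 1, data = dat, family = tw(link = "log", theta = 1.85, a = 1.82, b = 1.99))
gratia::appraise(m)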
I wonder whether this can be done just by adjusting the family's parameters, or whether I should move to a completely different family. Thank you for your thoughts.
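Rather than hand-tuning a, b and theta, tw() can estimate the power parameter itself. Leaving theta at its default (NULL) makes mgcv estimate p within [a, b] during fitting. The data here are simulated, and a = 1.01, b = 1.99 are simply mgcv's defaults, not values chosen for beetle counts.

```r
library(mgcv)

set.seed(3)
# Simulated stand-in for dat; replace with the real data frame
dat <- data.frame(count_sum = rnbinom(500, size = 0.6, mu = 30))

# theta = NULL (the default) lets mgcv estimate the power parameter p
# within [a, b], instead of fixing it by hand as tw(theta = 1.85, ...) does
m_tw <- gam(count_sum ~ 1, data = dat,
            family = tw(link = "log", a = 1.01, b = 1.99),
            method = "REML")

p_hat <- m_tw$family$getTheta(TRUE)  # estimated p on its original scale
p_hat
```

Estimating p removes the guesswork over theta, but the low-count pattern may persist if it stems from unmodelled structure in the data (trap, location, season) rather than from the family itself.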
My study design is very similar to the one in "Irregular time series data including long-term trends", plus spatially varying predictors (e.g. the share of forest around each trap). Following @Gavin Simpson's comment, I expect the trap counts to depend on location (XY) and time (variation between months and between years). As @Gavin Simpson suggested, should I move from using a single distribution to using a different distribution for each trap? How can that be implemented?
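One common way to implement this idea is to keep a single family but let each observation have its own expected count. You model the mean from location, season, year and a trap-level random effect, so the family describes the conditional distribution of each monthly sum given its covariates rather than the pooled histogram of count_sum. This is an assumption about what is meant, not necessarily Gavin's exact proposal. The data are simulated, and the column names x, y, month, year and trap are hypothetical stand-ins. A forest-share covariate could enter as a further term such as s(forest_share), where forest_share is also hypothetical.

```r
library(mgcv)

set.seed(4)
# Hypothetical trap-level data: 20 traps with made-up coordinates and effects
n_trap <- 20
traps <- data.frame(trap = factor(seq_len(n_trap)),
                    x = runif(n_trap, 6, 15),    # e.g. longitude
                    y = runif(n_trap, 47, 55),   # e.g. latitude
                    trap_eff = rnorm(n_trap, sd = 0.5))

# One row per trap x month x year, with simulated monthly sums
dat <- expand.grid(trap = traps$trap, month = 1:12, year = 2015:2020)
dat <- merge(dat, traps, by = "trap")
mu <- exp(1.5 + sin(2 * pi * dat$month / 12) + dat$trap_eff)
dat$count_sum <- rnbinom(nrow(dat), size = 1.5, mu = mu)

# Conditional model: each observation gets its own mean from location,
# season (cyclic month), year and a trap-level random effect
m_cond <- gam(count_sum ~ s(x, y, k = 10) +
                s(month, bs = "cc", k = 8) +
                s(year, k = 5) +
                s(trap, bs = "re"),
              family = nb(), data = dat, method = "REML",
              knots = list(month = c(0.5, 12.5)))

summary(m_cond)
```

The small k values are only there because the toy data have 20 locations and 6 years; with real data they would be set from the number of unique values. The same formula should also be usable with bam() for larger datasets.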


gam formula framework (gam(y ~ s(x), family = pscl::zeroinfl)), rather than moving the whole model into pscl::zeroinfl(z ~ s(x1) + ...). Do you think this is doable somehow? The two approaches seem to use different specifications for model effects (random, cyclic, ...) and factors, and different ways of plotting models. As I am quite new to modelling, I am worried that moving from gam() to zeroinfl() will result in more errors... Thank you! – maycca Aug 05 '22 at 09:17

gam. Yes, both tools use different assumptions, but my best guess would be that moving to zeroinfl altogether might be the best solution. Or follow Gavin's proposal. – Stephan Kolassa Aug 17 '22 at 07:59
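If staying inside the gam() formula framework is the priority, mgcv itself provides zero-inflated Poisson families, ziP() and ziplss(). These are a swapped-in alternative, not what either comment proposes. A minimal sketch with simulated data and a hypothetical covariate x1 follows. Note that ziP() is Poisson-based, so it does not by itself allow for overdispersion in the non-zero counts.

```r
library(mgcv)

set.seed(5)
# Simulated zero-inflated counts with one hypothetical covariate x1
n <- 500
d <- data.frame(x1 = runif(n))
mu <- exp(1 + sin(2 * pi * d$x1))
structural_zero <- runif(n) < 0.2   # about 20% extra zeros
d$y <- ifelse(structural_zero, 0, rpois(n, mu))

# Zero-inflated Poisson within mgcv: smooths keep the usual s() syntax
m_zip <- gam(y ~ s(x1), family = ziP(), data = d, method = "REML")
summary(m_zip)
```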