How to write a generalized additive model, fitted in R programming language, in a mathematical form?

Question

I have fitted a model in R. My dependent variable is disease severity and my predictors are weather variables. How can I write this model in mathematical form for a manuscript? Is there a package that can help me write it in mathematical form? I will be making my GitHub repository available, but the idea is to write just the model formula in the manuscript. Thank you!

library(mgcv)  # provides gam(), s(), and betar()
mod1 <- gam(severity ~ s(mean_rh, k = 8) + s(mean_temp, k = 10) + s(mean_ws, k = 7) + s(rain, k = 7), family = betar(), data = data)
summary(mod1)

$g(\mathbb{E}[y_i]) = \beta_0 + \sum_{p=1}^P h_p(x_{i,p})$ where $g$ is your link function, $\beta_0$ is the intercept, $h_p$ is the learned univariate transformation of the $p$th input ($p$ might be mean RH or mean temperature), $x_{i,p}$ is the actual value of, say, mean temperature for the $i$th observation, and $y_i$ is the severity of the $i$th observation. You would note separately in the narrative the values of $k$ you chose (I don't use R much; these are something like spline degrees of freedom, right?). — John Madden, Jan 28 '23 at 15:51
Hmm, I'm not sure that this question is an exact duplicate, since the linked answer seems to concern a univariate additive model. — John Madden, Jan 28 '23 at 16:15
As the indicated duplicate question and answer show, there is no simple, useful mathematical formula for the default thin-plate spline smoothers implemented via an s() term in a gam() model. @JohnMadden this question is about a simple sum of additive s() terms, so the difficulty in the linked answer holds here. If regression splines had been specified via bs="cr" within the s() terms, then a formula would be possible (albeit messy). — EdM, Jan 28 '23 at 16:16
@EdM Perhaps that notation doesn't tell us exactly what's going on, but I would argue it's useful to abstract away what the splines are doing and write the formula in terms of abstract univariate transformations, as my initial comment suggests. That said, I'm not a GAM expert and look forward to hearing others' thoughts on the matter. — John Madden, Jan 28 '23 at 16:20
@JohnMadden the entry on smooth.terms in the mgcv manual explains that with the default thin-plate spline "a truncated eigen-decomposition is used to achieve the rank reduction," which leads to basis functions that are hard to interpret and would unnecessarily complicate the presentation in a manuscript. For presentation in a manuscript, as this question proposes, a simple statement of the model formula via the s() terms would be most intelligible. In this additive model, plots of outcome versus predictors would illustrate. — EdM, Jan 28 '23 at 16:30
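One way such a statement might look in a manuscript is sketched below. It assumes the default logit link of betar() and the mean/precision parameterization of the beta distribution; the symbols $\mu_i$, $\phi$, $\beta_0$, and $f_1, \dots, f_4$ are notation introduced here for illustration, not something taken from the fitted model output.

$$
\begin{aligned}
y_i &\sim \mathrm{Beta}\big(\mu_i \phi,\ (1 - \mu_i)\phi\big), \\
\operatorname{logit}(\mu_i) &= \beta_0 + f_1(\mathrm{RH}_i) + f_2(\mathrm{Temp}_i) + f_3(\mathrm{WS}_i) + f_4(\mathrm{Rain}_i),
\end{aligned}
$$

where $y_i$ is the severity of the $i$th observation, $\mu_i$ is its expected value, $\phi$ is the precision parameter, and each $f_j$ is a penalized thin-plate regression spline with basis dimension $k$ = 8, 10, 7, and 7 respectively. The smooths themselves are left abstract, which matches the point made above that the thin-plate basis functions are hard to interpret.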
@EdM "For presentation in a manuscript, as this question proposes, a simple statement of the model formula via the s() terms would be most intelligible." Are you proposing to write the exact R code that I posted above in the manuscript? I haven't come across this form of presentation yet. Thanks — Ahsk, Jan 28 '23 at 16:33
Yes, you can present the model that way provided that you describe what the s() terms mean, show a summary of the model output, and plot outcomes as functions of predictors. If you had used cubic regression splines with defined knots (thin-plate splines don't have knots in the usual sense) then you could get a formula, but it would have a very large number of terms with 4 spline-fitted predictors. Also, note that this additive model does not include interaction terms, which I think you have found to be important with this data set. — EdM, Jan 28 '23 at 16:47
@EdM Yes, I did find an interaction. But because I have only 37 data points, I fitted this model to investigate the main effects, and I used the conservative model 1 (first model) discussed here https://stats.stackexchange.com/questions/603155/seeking-feedback-on-the-interpretation-and-reporting-of-a-glmm-output/603171?noredirect=1#comment1118874_603171 to investigate the interaction effect. I can't afford to have both main and interaction terms in the model with so few data points. DHARMa residuals for this model show no deviations. — Ahsk, Jan 28 '23 at 16:56
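The cubic regression spline alternative mentioned here could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `dat` is simulated placeholder data (not the asker's data set), the object names `dat`, `mod_cr`, and `knots_rh` are hypothetical, and the knot locations are read from the `xp` element that mgcv stores on "cr" smooth objects.

```r
library(mgcv)

# Simulated placeholder data (NOT the original data set): four weather
# predictors and a beta-distributed severity in (0, 1).
set.seed(1)
n <- 200
dat <- data.frame(
  mean_rh   = runif(n, 40, 100),
  mean_temp = runif(n, 10, 30),
  mean_ws   = runif(n, 0, 10),
  rain      = runif(n, 0, 20)
)
mu  <- plogis(-1 + sin(dat$mean_temp / 5) + 0.02 * (dat$mean_rh - 70))
phi <- 10
dat$severity <- rbeta(n, mu * phi, (1 - mu) * phi)

# Same additive structure as mod1, but with cubic regression splines
# (bs = "cr"), which are defined by explicit knots.
mod_cr <- gam(
  severity ~ s(mean_rh, bs = "cr", k = 8) + s(mean_temp, bs = "cr", k = 10) +
    s(mean_ws, bs = "cr", k = 7) + s(rain, bs = "cr", k = 7),
  family = betar(), data = dat
)

# Knot locations for the first smooth, s(mean_rh); a "cr" basis has k knots.
knots_rh <- mod_cr$smooth[[1]]$xp
print(knots_rh)

# Model summary to report alongside the description of the s() terms.
summary(mod_cr)
```

Calling `plot(mod_cr, pages = 1)` afterwards draws the four estimated smooths on the link scale, which is the kind of outcome-versus-predictor plot suggested above. Even with the knots in hand, writing out every basis term for four predictors would be long, so reporting the knots and the plots may be more readable than a full closed-form expression.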
@Ahsk the GAM implicitly includes penalization to deal with overfitting based on small numbers of observations. You can include interactions in a GAM if you wish. — EdM, Jan 28 '23 at 16:58
@Ahsk please post the model and model output into a new question. This site works best if each page is about a separate question. How to interpret a GAM model is a different question than the current one, which is about how/whether to try to get a closed-form equation from this type of GAM. — EdM, Jan 28 '23 at 18:01
