On a much larger dataset, fitting an mgcv model fails with an error saying a variable is not available. I've been able to reproduce it with the toy example below. For context, the variable causing the issue is a data indicator flag (for genuinely undefined / non-existent measurements, as per Handling NAs in a regression ?? Data Flags?).
The error is: 'Error in eval(parse(text = terms[i]), enclos = p.env, envir = mgcvns): object 'obs_flag' not found'
As the example shows, the error occurs only when the flag interacts with a spline term (obs_flag:s(num_var)). The same interaction with the plain linear term (obs_flag:num_var) fits without error.
library(mgcv)
# Toy data: 7 rows repeated 3 times
test = data.frame('num_var' = rep(c(0,5,7,11,30,100,0), 3),
                  'obs_flag' = rep(factor(c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)), 3),
                  'target' = rep(factor(c('Yes', 'No', 'Yes', 'No', 'No', 'No', 'Yes')), 3))
# Flag interacting with the linear term: fits without error
my_formula_working <- target ~ obs_flag:num_var
# Flag interacting with a smooth: fails with "object 'obs_flag' not found"
my_formula_not_working <- target ~ obs_flag:s(num_var)
fit1 = gam(my_formula_working, family = binomial(link = logit), data = test, na.action = na.fail, method = 'REML')
fit2 = gam(my_formula_not_working, family = binomial(link = logit), data = test, na.action = na.fail, method = 'REML')
With my_formula_not_working <- target ~ s(num_var, by = obs_flag) it still errors, but with a different error. – Alex J Dec 08 '23 at 00:46
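For comparison, here is a minimal sketch of the syntax mgcv documents for a smooth that differs by factor level: a factor passed as the by argument of s(), plus the factor as a parametric main effect, because factor-by smooths are centred. It uses simulated data instead of the toy data above. The toy data has only six distinct num_var values, fewer than the default basis dimension of s() (k = 10), which may be related to the different error, though that is an assumption. The names sim and fit_by, the sample size, and the true curve are all hypothetical.

```r
library(mgcv)  # recommended package shipped with R

set.seed(1)
n <- 300
# Hypothetical simulated data: many distinct num_var values so a smooth is estimable
sim <- data.frame(
  num_var  = runif(n, 0, 100),
  obs_flag = factor(sample(c(FALSE, TRUE), n, replace = TRUE))
)
# Assumed true log-odds: a curve in num_var when the flag is TRUE, a constant otherwise
eta <- ifelse(sim$obs_flag == "TRUE", sin(sim$num_var / 15), -0.5)
sim$target <- factor(ifelse(rbinom(n, 1, plogis(eta)) == 1, "Yes", "No"))

# One smooth of num_var per level of obs_flag (factor 'by' variable),
# plus the obs_flag main effect because each by-smooth is centred
fit_by <- gam(target ~ obs_flag + s(num_var, by = obs_flag),
              family = binomial(link = "logit"), data = sim, method = "REML")
summary(fit_by)
```

With a two-level factor, mgcv builds two smooth terms, one per level of obs_flag.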