In a much larger dataset, I am getting an error saying that a variable is not available when fitting an mgcv model. I've been able to reproduce it with the toy example below. For context, the variable causing the issue is a data indicator flag (for genuinely undefined / non-existent data measurements, as per Handling NAs in a regression ?? Data Flags?).

The full error message is: 'Error in eval(parse(text = terms[i]), enclos = p.env, envir = mgcvns): object 'obs_flag' not found'

As you can see, the error only occurs when there is an interaction between the flag and the spline term.

library(mgcv)
test = data.frame('num_var' = rep(c(0,5,7,11,30,100,0), 3), 
'obs_flag' = rep(factor(c(FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)), 3), 
'target' = rep(factor(c('Yes', 'No', 'Yes', 'No', 'No', 'No', 'Yes')), 3))

my_formula_working <- target ~ obs_flag:num_var
my_formula_not_working <- target ~ obs_flag:s(num_var)

fit1 = gam(my_formula_working, family = binomial(link = logit), data = test, na.action = na.fail, method = 'REML')
fit2 = gam(my_formula_not_working, family = binomial(link = logit), data = test, na.action = na.fail, method = 'REML')

Meep
  • Hi @Meep, is there a statistical question here? If not, it might be better to ask this on Stack Exchange. – Alex J Dec 07 '23 at 23:23
  • Try my_formula_not_working <- target ~ s(num_var, by = obs_flag). It still errors but it's a different error. – Alex J Dec 08 '23 at 00:46

1 Answer

You have a few different issues here:

  1. You can't have a meaningful interaction term in the toy data because num_var is always 0 when obs_flag is FALSE, so there is nothing from which to estimate a slope for that level. Look at the output of fit1.

  2. The correct syntax in the second model would be s(num_var, by = obs_flag).

  3. If you correct the formula in the second model, you'll get the error "A term has fewer unique covariate combinations than specified maximum degrees of freedom". This is because num_var has only six distinct values in the toy data, fewer than the default basis dimension of the smooth (k = 10). I assume this is not an issue in your full data set.
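
To illustrate points 2 and 3, here is a minimal sketch rather than a definitive fix. The first part uses the toy num_var values from the question to count distinct values. The second part fits the by-factor smooth on simulated data; the `sim` data frame, its data-generating process, and the 0/1 coding of `target` are my own assumptions standing in for the full data set, not anything from the question.

```r
library(mgcv)

# Toy covariate from the question: only 6 distinct values,
# fewer than the default basis dimension (k = 10) of s().
num_var <- rep(c(0, 5, 7, 11, 30, 100, 0), 3)
n_unique <- length(unique(num_var))
n_unique  # 6

# Hypothetical simulated data with many distinct covariate values
# (an assumption, standing in for the larger real data set).
set.seed(42)
n <- 500
sim <- data.frame(
  num_var  = runif(n, 0, 100),
  obs_flag = factor(sample(c(FALSE, TRUE), n, replace = TRUE))
)

# Assumed data-generating process: a nonlinear effect of num_var
# when the flag is TRUE, a constant log-odds otherwise.
eta <- ifelse(sim$obs_flag == "TRUE", sin(sim$num_var / 15), -0.5)
sim$target <- rbinom(n, 1, plogis(eta))  # coded 0/1 here for simplicity

# One smooth of num_var per level of obs_flag. The factor main effect
# is included because each by-factor smooth is centred.
fit <- gam(target ~ obs_flag + s(num_var, by = obs_flag),
           family = binomial(link = logit), data = sim, method = "REML")
summary(fit)
```

On data with few distinct covariate values, passing a smaller basis dimension such as s(num_var, by = obs_flag, k = 4) keeps k below the number of unique values. Whether that is appropriate depends on the data.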

Doctor Milt
  • Thanks for your answer. #1 relates to the data indicator flag question I cited, where it is impossible to provide a real value for an explanatory variable (e.g. 'time since last treatment' when there is no previous treatment). #2 fixed my issue, thanks! – Meep Dec 10 '23 at 23:41