
My question is regarding this post from 1.5 years ago: Modelling clustered data using boosted regression trees

My label is a binary variable (yes/no). Is it possible to use GPBoost or MERF to make predictions for this binary variable? I could not quickly find an answer to this. If not, which multilevel/mixed-effects models can do this? So far I have found mixed-effects logistic regression.

Thanks, Kind regards, Olivier

olivier

1 Answer


Yes, we can use binary response variables. In GPBoost we set likelihood="binary". MERF can also handle it in theory: according to the PhD thesis of MERF's author (here), binary outcomes are within the scope of the methodology, but I see no such option in the MERF package itself.
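
To make concrete what a binary likelihood combined with a grouping variable amounts to, here is a minimal standard-library sketch of the data-generating model, assuming a probit link and a single random intercept per group. It does not call GPBoost or MERF; the function names, the `math.sin` fixed effect and all parameter values are hypothetical choices for illustration only.

```python
import math
import random


def probit(z):
    """Standard normal CDF, i.e. the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_clustered_binary(n_groups, n_per_group, sigma_b, seed=0):
    """Simulate y ~ Bernoulli(probit(F(x) + b_group)) with b_group ~ N(0, sigma_b^2).

    Returns a list of (group_id, x, y, p) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        b = rng.gauss(0.0, sigma_b)  # random intercept shared by the whole group
        for _ in range(n_per_group):
            x = rng.uniform(-2.0, 2.0)
            f = math.sin(x)  # hypothetical non-linear fixed effect F(x)
            p = probit(f + b)  # success probability given the group effect
            y = 1 if rng.random() < p else 0
            rows.append((g, x, y, p))
    return rows


rows = simulate_clustered_binary(n_groups=20, n_per_group=50, sigma_b=1.0)
```

A tree-boosting mixed model would learn `F` with trees and estimate the variance of the group effects, whereas mixed-effects logistic regression would assume a linear `F` and a logit link instead of the probit link used in this sketch.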

I would also suggest looking at bambi (in Python/PyMC3) or rstanarm (in R/Stan). With either, you can define splines for your main fixed effects so that they have a non-linear influence on the response, while still fitting the model with a full MCMC approach and checking the relevant diagnostics. (statsmodels often provides sub-optimal fitting results when dealing with mixed models.)

usεr11852
  • Thanks! Really clear, I will look into this! – olivier Apr 07 '22 at 07:48
  • Cool, I am glad I could help! – usεr11852 Apr 07 '22 at 09:11
  • I also have another question. The goal is to build a generalizable model that everyone can use. To be clear: ideally, my test set does not have an ID column (the random effect). Is it still possible to make predictions, or does the test set NEED an ID column? If it does, can I assign a random number to the test set ID in that case? In the GPBoost source at https://github.com/fabsig/GPBoost/blob/master/python-package/gpboost/sklearn.py the default for group_data_pred is None. – olivier Apr 14 '22 at 13:27
  • I think this is rather involved to answer in comments, can you please make a new question? – usεr11852 Apr 14 '22 at 14:47
  • Ok yes, I made a new question! – olivier Apr 15 '22 at 07:57