This is a follow-up to my earlier question: Making binary prediction with GPBoost (or MERF)
The goal of this project is to predict injuries, and I only have data for a small group of athletes. The data are longitudinal: each athlete has multiple observations. I can train the model with GPBoost or MERF. The training set contains x athletes and the test set contains y different athletes. The athlete's ID is the random effect (grouping variable).
Ideally, my test set would not have an ID column at all, because the goal is to build one model for the whole 'population'. Is it still possible to make predictions in that case, or does the test set NEED an ID column? If it is required, can I assign an arbitrary ID that does not occur in the training data (like 999) to all rows of the test set? In this github link: github.com/fabsig/GPBoost/blob/master/python-package/gpboost/… the default for group_data_pred is None.
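To make the setup concrete, the split and the placeholder-ID idea can be sketched in plain Python. This is only an illustration: the toy records, the field names (athlete_id, week, injured), the helper split_by_athlete and the 30% test fraction are all hypothetical and are not my real data or part of GPBoost/MERF.

```python
import random

# Hypothetical toy data: 10 athletes with 5 observations each.
records = [
    {"athlete_id": athlete_id, "week": week, "injured": 0}
    for athlete_id in range(1, 11)
    for week in range(1, 6)
]


def split_by_athlete(rows, test_fraction=0.3, seed=0):
    """Split rows so that each athlete is entirely in train or entirely in test."""
    ids = sorted({row["athlete_id"] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    held_out = set(ids[:n_test])
    train = [row for row in rows if row["athlete_id"] not in held_out]
    test = [row for row in rows if row["athlete_id"] in held_out]
    return train, test


train_rows, test_rows = split_by_athlete(records)
train_ids = {row["athlete_id"] for row in train_rows}
test_ids = {row["athlete_id"] for row in test_rows}

# The idea from the question: overwrite every test ID with one placeholder
# value (999) that does not occur among the training IDs.
PLACEHOLDER_ID = 999
test_rows_placeholder = [{**row, "athlete_id": PLACEHOLDER_ID} for row in test_rows]
```

In this sketch no test athlete appears in the training set, and after the overwrite all test rows share a single ID that was never seen during training. Whether that is acceptable for prediction is exactly what I am unsure about.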
If anything is unclear, let me know and I will edit the question.
Thanks in advance! Kind regards, Olivier