I am currently working on medical data to predict outcome after treatment in patients with metastatic lesions of carcinomas. Each patient has a different number of lesions: some have around 30, others only 3. Among these lesions, some are classified as "0" (stability or response) and others as "1" (progression). One patient can have a single "1" and multiple "0"s, or vice versa.
To make this clearer, there are two levels:
1) Lesion-level analysis: the first step is to work at the lesion level, trying to predict whether a particular lesion will be 0 or 1 based on the characteristics of that lesion. I get very good results with an XGBoost model.
2) Patient-level analysis: once the lesion-level model is available, its predictions have to be aggregated to the patient level in order to predict overall survival (OS) or progression-free survival (PFS).
The big question is: how do you move from the "lesion scale" to the "patient scale"?
I thought about taking the mean of each feature across the lesions of the same patient, or summing the values of each feature, but neither seems logical to me. Do you know of articles on this, or can you explain techniques you use?
From a medical point of view, I can't just use a majority vote among the patient's lesions. For example, I can't say that a patient is 0 because he has 30 "0" lesions and one "1" lesion: that single "1" is very important, so the patient must be classified as 1.
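To make the "one progressing lesion is enough" rule concrete, here is a minimal sketch of how lesion-level predicted probabilities could be collapsed to one score per patient. It is only an illustration: the function name `aggregate_by_patient`, the patient IDs and the probabilities are all made up, and the noisy-OR score assumes the lesions of a patient are independent, which is probably not true in practice.

```python
from collections import defaultdict
from math import prod


def aggregate_by_patient(lesion_probs):
    """Collapse (patient_id, p_progression) pairs to per-patient scores.

    Hypothetical helper for illustration only, not a validated method.
    """
    by_patient = defaultdict(list)
    for patient_id, p in lesion_probs:
        by_patient[patient_id].append(p)

    summary = {}
    for patient_id, probs in by_patient.items():
        summary[patient_id] = {
            "n_lesions": len(probs),
            # "Any lesion progresses" rule: the worst lesion drives the score.
            "max": max(probs),
            # P(at least one lesion progresses), ASSUMING independent lesions.
            "noisy_or": 1.0 - prod(1.0 - p for p in probs),
            # Mean is the soft analogue of a majority vote (shown for contrast).
            "mean": sum(probs) / len(probs),
        }
    return summary


# Made-up probabilities: patient A has 30 quiet lesions and one bad one,
# patient B has 3 quiet lesions.
example = (
    [("A", 0.05)] * 30
    + [("A", 0.90)]
    + [("B", 0.10), ("B", 0.20), ("B", 0.15)]
)
summary = aggregate_by_patient(example)

for patient_id, scores in summary.items():
    print(patient_id, scores)
```

With these made-up numbers, the mean for patient A stays low because the 30 quiet lesions dilute the single bad one, while the max keeps it visible. Note that the noisy-OR score grows with the number of lesions even when every lesion has a low probability, so it may need the lesion count as a companion feature.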
EDIT: I found this thread, which explains my current strategy: Difference between generalized linear models & generalized linear mixed models
I think I'll use a GLMM with the patient as a random effect to predict my outcome. I'll come back as soon as possible to update this post with my progress.
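For reference, the kind of model I have in mind can be written as a random-intercept logistic GLMM. The notation below is my own and is only a sketch: i indexes patients, j indexes lesions within a patient, x_ij is the feature vector of a lesion and u_i is the patient-level random intercept.

```latex
\[
\operatorname{logit} \Pr\left(y_{ij} = 1 \mid u_i\right)
  = \beta_0 + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_i,
\qquad
u_i \sim \mathcal{N}\left(0, \sigma_u^{2}\right)
\]
```

The random intercept u_i lets lesions from the same patient share a common shift, so they are no longer treated as independent observations. This is still a lesion-level model for the 0/1 outcome; it does not by itself produce a patient-level OS/PFS prediction.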