I am working with a private medical dataset of categorical features derived from patient examinations. The problem is that some patients underwent an MRI, others a scanner examination, and some underwent both. As a result, scanner-only patients have missing values in the MRI-related features, and vice versa.
How could I handle this situation? So far I have thought of three solutions:
Using an "examination not performed" category to replace missing values. However, machine learning algorithms would treat it as a category in its own right. They could learn associations such as "exam not performed" => "class number 1", even though there is no real link between the two: whether an examination took place depends on the availability of imaging devices in the hospitals where the data were collected. Some simply did not own an MRI device, for example.
Treating MRI, scanner, and MRI+scanner patients as three different datasets and training a separate model on each one. But doing so would mean writing specific code that wraps scikit-learn objects in order to automate the whole training process.
Using a model that is robust to missing values, such as XGBoost. I don't think this is a good idea: my problem should be handled beforehand, since XGBoost applies its own treatment to missing values. It would just move the problem elsewhere.
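To make the second option more concrete, here is a rough sketch of what such a wrapper might look like. Everything in it is an assumption made for illustration: the class name `PerModalityClassifier`, the column names `mri_lesion` and `scanner_density`, the toy data, and the choice of a one-hot encoder followed by a decision tree as the underlying model. It only routes each patient to a sub-model according to which examination features are present; it is not meant as a definitive implementation.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier


class PerModalityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: one clone of `estimator` per examination pattern."""

    def __init__(self, estimator, mri_cols, scanner_cols):
        self.estimator = estimator
        self.mri_cols = mri_cols
        self.scanner_cols = scanner_cols

    def _groups(self, X):
        # Assumption: an examination counts as done when all its features are present.
        has_mri = X[self.mri_cols].notna().all(axis=1).to_numpy()
        has_scanner = X[self.scanner_cols].notna().all(axis=1).to_numpy()
        mri, scanner = list(self.mri_cols), list(self.scanner_cols)
        return {
            "mri_only": (has_mri & ~has_scanner, mri),
            "scanner_only": (~has_mri & has_scanner, scanner),
            "both": (has_mri & has_scanner, mri + scanner),
        }

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.models_ = {}
        for name, (mask, cols) in self._groups(X).items():
            if mask.any():
                # Each sub-model only sees the columns that exist for its group.
                self.models_[name] = clone(self.estimator).fit(X.loc[mask, cols], y[mask])
        return self

    def predict(self, X):
        # Rows matching no pattern (e.g. no examination at all) keep the value None.
        pred = np.full(len(X), None, dtype=object)
        for name, (mask, cols) in self._groups(X).items():
            if mask.any():
                # Raises KeyError if this pattern never appeared during fit.
                pred[mask] = self.models_[name].predict(X.loc[mask, cols])
        return pred


# Toy data (made up): 4 MRI-only, 4 scanner-only and 4 MRI+scanner patients.
X = pd.DataFrame({
    "mri_lesion": ["yes", "no", "yes", "no", None, None, None, None,
                   "yes", "no", "yes", "no"],
    "scanner_density": [None, None, None, None, "high", "low", "high", "low",
                        "high", "low", "high", "low"],
})
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

base = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                     DecisionTreeClassifier(random_state=0))
model = PerModalityClassifier(base, ["mri_lesion"], ["scanner_density"])
model.fit(X, y)

# A new MRI-only patient is routed to the "mri_only" sub-model.
new_patient = pd.DataFrame({"mri_lesion": ["yes"], "scanner_density": [None]})
new_pred = model.predict(new_patient)
```

Because the wrapper follows the usual `fit`/`predict` convention, it could in principle be dropped into the same training code as any other scikit-learn classifier. The cost is that each sub-model is trained on fewer patients, which may matter on a small dataset.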