Suppose I have a discrete numerical variable that does not apply to all my observations, e.g. 'years_married'. Not all the people in my dataframe are married, so those who are not have an 'NA' recorded for this variable.
What would be a correct way to proceed in this case? 'years_married' is an important variable for my study (for the people who are married), so I don't want to discard it.
One idea is to split the dataframe in two, one part with this variable (for those who are married) and another without it (for singles), and model them separately. However, this would drastically reduce the number of observations (at least in one of the subsets) and hurt the prediction accuracy.
Is there any technique or transformation, or can you recommend an algorithm (e.g. Random Forest), that can handle this situation?
Thanks :)
Edit: Maybe that was not a good example. The exact case is about the AGE of a device at the time of the study. I have the date when the data from the device was collected, but I do not have the construction date of the device in all cases.
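To make the device case concrete, here is a minimal sketch of how the age variable ends up missing. The records, the field names (`collected`, `constructed`) and the `age_missing` flag are all made up for illustration and are not my real data. The flag is only one possible way to record which rows lack an age, not a claim that this is the right treatment.

```python
from datetime import date

# Hypothetical records: the collection date is always known,
# the construction date is sometimes unknown (None).
records = [
    {"device": "A", "collected": date(2023, 5, 1), "constructed": date(2018, 5, 1)},
    {"device": "B", "collected": date(2023, 5, 1), "constructed": None},
    {"device": "C", "collected": date(2022, 11, 15), "constructed": date(2020, 2, 15)},
]


def age_in_years(collected, constructed):
    """Return the device age in years at collection time, or None if unknown."""
    if constructed is None:
        return None
    # Approximate conversion from days to years.
    return (collected - constructed).days / 365.25


rows = []
for r in records:
    age = age_in_years(r["collected"], r["constructed"])
    rows.append(
        {
            "device": r["device"],
            "age_years": age,            # None where the construction date is missing
            "age_missing": age is None,  # illustrative indicator column
        }
    )

for row in rows:
    print(row)
```

Device B is the kind of observation I am asking about: everything else is recorded for it, but its age cannot be computed.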
The mice package is a frequent choice for implementation. Follow the missing-data tag on this site for suggestions. – EdM May 03 '23 at 13:52