I have a dataset of reviews completed by humans. The target variable is whether the review decision is correct or incorrect, and one of my features is the reviewer's trailing 4-week accuracy score.
However, these accuracy scores are not always available. My question is how to model this data, given that the absence of an accuracy score might itself be a signal. Everything I have found in my research says that missing values must be either imputed or removed. I am wondering whether there are techniques that incorporate the fact that a value is missing into the dataset.
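To make concrete what I mean by incorporating missingness, here is a minimal sketch. It assumes the scores arrive as a plain Python list with `None` (or NaN) for unavailable values. The function name `add_missing_indicator` and the fill value `0.0` are my own placeholders, not taken from any library.

```python
import math
from typing import List, Optional, Tuple


def add_missing_indicator(
    scores: List[Optional[float]], fill_value: float = 0.0
) -> List[Tuple[float, int]]:
    """Return (filled_score, is_missing) pairs so a model sees both columns."""
    rows = []
    for s in scores:
        missing = s is None or math.isnan(s)
        # fill_value is an arbitrary placeholder; the 0/1 flag carries the signal.
        rows.append((fill_value if missing else float(s), 1 if missing else 0))
    return rows


features = add_missing_indicator([0.92, None, 0.75, float("nan")])
print(features)  # [(0.92, 0), (0.0, 1), (0.75, 0), (0.0, 1)]
```

The score stays numeric and the extra 0/1 column records that it was unavailable. I am not sure whether this counts as sound practice or just a workaround.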
Perhaps I could convert the score into a categorical variable {low, medium, high, not available}. Would this be common practice? I am open to suggestions and would love to hear what is commonly done in these scenarios.
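As a rough sketch of the categorical idea: the cut-offs `LOW_MAX = 0.6` and `MEDIUM_MAX = 0.8` below are arbitrary assumptions for illustration (in practice they might come from quantiles of the observed scores), and `bin_accuracy` is a hypothetical helper name.

```python
import math
from typing import Optional

# Hypothetical cut-offs, chosen only for illustration.
LOW_MAX = 0.6
MEDIUM_MAX = 0.8


def bin_accuracy(score: Optional[float]) -> str:
    """Map a score in [0, 1] to a category, keeping missingness as its own level."""
    if score is None or math.isnan(score):
        return "not available"
    if score < LOW_MAX:
        return "low"
    if score < MEDIUM_MAX:
        return "medium"
    return "high"


scores = [0.95, None, 0.55, 0.72]
categories = [bin_accuracy(s) for s in scores]
print(categories)  # ['high', 'not available', 'low', 'medium']
```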
Ideally, I would not have to change the score feature to a categorical variable, since there might be value in keeping the continuous numerical format (0-1). Is there a method you know of to do this, perhaps a modelling technique that allows for null values? I have read that some decision trees allow for this.
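My understanding of how a tree could allow for nulls is that, at each split, it also learns which branch the missing values should follow. Below is a toy single-split sketch of that idea under my own assumptions: labels are 0/1, missing scores are `None` or NaN, and the names `fit_stump` and `predict` are hypothetical. It is not how any particular library implements it.

```python
import math
from typing import List, Optional, Tuple

Stump = Tuple[float, bool, int, int]  # threshold, missing_goes_left, left_label, right_label


def is_missing(x: Optional[float]) -> bool:
    return x is None or math.isnan(x)


def majority(labels: List[int]) -> int:
    # Majority vote over 0/1 labels; ties go to 1.
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(xs: List[Optional[float]], ys: List[int]) -> Stump:
    """Pick the threshold and the branch for missing values with fewest training errors."""
    observed = sorted({x for x in xs if not is_missing(x)})
    if len(observed) < 2:
        raise ValueError("need at least two distinct observed values")
    best = None
    # Candidate thresholds are midpoints between consecutive observed values.
    for t in ((a + b) / 2 for a, b in zip(observed, observed[1:])):
        for missing_left in (True, False):
            left, right = [], []
            for x, y in zip(xs, ys):
                go_left = missing_left if is_missing(x) else x <= t
                (left if go_left else right).append(y)
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, t, missing_left, l_lab, r_lab)
    return best[1:]


def predict(stump: Stump, x: Optional[float]) -> int:
    t, missing_left, l_lab, r_lab = stump
    go_left = missing_left if is_missing(x) else x <= t
    return l_lab if go_left else r_lab


# Made-up data: 1 = correct decision; reviewers without a score were mostly wrong.
xs = [0.9, 0.85, 0.4, 0.3, None, None]
ys = [1, 1, 0, 0, 0, 0]
stump = fit_stump(xs, ys)
print(stump[1], predict(stump, None), predict(stump, 0.9))  # True 0 1
```

On this made-up data the stump sends missing scores down the same branch as the low scores, so the score stays continuous while missingness still influences the prediction.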
– Stats DUB01 Jan 20 '21 at 20:37