I am building a model to predict time off and sick leave for a specific employee.
In the dataset, each employee has one row per day from 01/01/2013 to 31/12/2018, flagged 1 if that day was taken as time off or sick leave and 0 otherwise. As features I am using information such as location, age, and job position.
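For concreteness, here is a minimal sketch of that daily layout in plain Python. The `build_daily_panel` name and the `days_off` mapping (employee ID to the set of dates taken off) are hypothetical; the real data presumably comes from an HR table.

```python
from datetime import date, timedelta


def build_daily_panel(employee_ids, days_off,
                      start=date(2013, 1, 1), end=date(2018, 12, 31)):
    """Return one (employee_id, day, flag) row per employee per calendar day.

    days_off: hypothetical dict mapping employee_id -> set of dates taken
    as time off or sick leave. flag is 1 for those dates, 0 otherwise.
    """
    rows = []
    n_days = (end - start).days + 1  # inclusive of both endpoints
    for emp in employee_ids:
        off = days_off.get(emp, set())
        for i in range(n_days):
            day = start + timedelta(days=i)
            rows.append((emp, day, 1 if day in off else 0))
    return rows
```

The period 2013-01-01 to 2018-12-31 spans 2,191 days (2016 is a leap year), so each employee contributes 2,191 rows; with around 10,000 employees that would be roughly 21.9 million rows.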
I was thinking of using the employee ID as a feature to train the model. Some employees have obvious patterns, e.g., taking Monday off for several consecutive weeks. On the other hand, I don't think I have enough data to train one model per employee, especially for those who were hired recently.
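One way to check how strong such weekday patterns are is to compute, for each employee, the fraction of each weekday flagged as time off. This is only a hypothetical sketch; it assumes rows shaped as (employee_id, day, flag) with `day` a `datetime.date`, and `weekday_off_rate` is an illustrative name.

```python
from collections import defaultdict
from datetime import date


def weekday_off_rate(rows):
    """Fraction of days flagged 1 for each (employee_id, weekday) pair.

    rows: iterable of (employee_id, day, flag) tuples, day a datetime.date.
    Weekday follows date.weekday(): 0 = Monday ... 6 = Sunday.
    """
    totals = defaultdict(int)  # days observed per (employee, weekday)
    offs = defaultdict(int)    # days flagged as off per (employee, weekday)
    for emp, day, flag in rows:
        key = (emp, day.weekday())
        totals[key] += 1
        offs[key] += flag
    return {key: offs[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Three weeks for one hypothetical employee who takes every Monday off
    # (2013-01-07 was a Monday).
    demo = [("E1", date(2013, 1, 7 + i), 1 if i % 7 == 0 else 0) for i in range(21)]
    print(weekday_off_rate(demo)[("E1", 0)])  # 1.0
```

A recently hired employee has few observations per weekday, so these rates are noisy for exactly the people the post is worried about.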
After some research, it seems that the only encoding that would make sense for the employee ID is one-hot encoding. However, this would generate up to 10,000 features, which doesn't seem optimal.
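To make the dimensionality concern concrete, here is a hypothetical sketch of a plain, dense one-hot encoding of the employee ID: with 10,000 distinct IDs, every row gets 10,000 columns, of which exactly one is 1. The helper names are illustrative only.

```python
def one_hot_index(employee_ids):
    """Assign each distinct employee ID a column index (the one-hot vocabulary)."""
    vocab = {}
    for emp in employee_ids:
        if emp not in vocab:
            vocab[emp] = len(vocab)
    return vocab


def one_hot_row(emp, vocab):
    """Dense one-hot vector: len(vocab) columns with a single 1 at emp's column."""
    row = [0] * len(vocab)
    row[vocab[emp]] = 1
    return row


# Hypothetical IDs E0 ... E9999 to mirror the 10,000-employee case.
vocab = one_hot_index(f"E{i}" for i in range(10_000))
row = one_hot_row("E42", vocab)
print(len(row), sum(row))  # 10000 1
```

Since all but one entry per row are zero, libraries usually store this encoding sparsely (scikit-learn's `OneHotEncoder`, for example, returns a SciPy sparse matrix by default), which keeps memory manageable even though the number of columns stays at one per employee.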
Do you have any ideas on how I could build this model better?
Thanks in advance