I am aware that Random Forests aren’t typically affected by collinearity issues, but I am trying to reduce the number of variables I use in my RF model.
Some variables are obviously correlated, such as the GPS coordinates (lat, long) and the area variables (city, state, region). My thinking is that any useful information in the area variables can be expressed through lat/long alone, which would make the area variables redundant.
I also have an id variable and a timestamp of when the record was added, and these two are clearly correlated with each other as well. Since this is a regression task, I thought the timestamp might be useful, but in that case should I remove the id variable?
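To make "clearly correlated" concrete, this is roughly the check I have in mind: a minimal sketch in plain Python of a Spearman rank correlation between the id and the timestamp. The values and the names `record_id` and `added_at` are made up for illustration and are not my real data. A result close to 1 would suggest the id carries essentially the same ordering information as the timestamp.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: ids assigned in insertion order, Unix timestamps.
record_id = [101, 102, 103, 104, 105, 106]
added_at = [1700000000, 1700000360, 1700000900,
            1700004000, 1700004100, 1700009000]

rho = spearman(record_id, added_at)
print(f"Spearman correlation between id and timestamp: {rho:.3f}")
```

I use rank correlation rather than plain Pearson here because a tree-based model only depends on the ordering of a numeric feature, so a monotonic relationship is what matters for redundancy.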
I am not sure how to proceed. I need to drastically reduce the number of variables in my model, both because my requirements call for as few variables as possible and so that my code doesn’t take ages to run.
Any advice would be invaluable.