I'm working on a project where two of the features are entryHeading and exitHeading. Both give the direction (N, NE, E, SE, S, SW, W) of a vehicle at multiple points. How should I go about pre-processing them? My first thought was to encode them circularly, as I would a 24-hour period, but I'm not sure I should go about it in the same way. The data will eventually be used to train a random forest regression model.
- You can make it into a unit vector, which may more aptly describe the feature (rather than a time), especially if you already have a location. If all you want is an orientation, I imagine you can just define it as a degree (e.g. 1-360 degrees about North). – Jason K Lai Sep 24 '19 at 16:20
- Does it matter for your problem that (e.g.) NE and SE are "closer" to N than other directions? – Peter Sep 24 '19 at 16:21
- What is your final y variable? In most cases our use case determines the preprocessing. – mahesh ghanta Sep 25 '19 at 04:10
1 Answer
One option is to map the direction categories to numerical degrees. This lets the machine learning model learn the relative relationships between headings. The model might also interpolate between the mapped values, which may or may not be useful.
One common mapping is:
n -> 0
ne -> 45
e -> 90
se -> 135
s -> 180
sw -> 225
w -> 270
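The mapping above can be sketched in Python as follows. This is only an illustrative sketch: the names `HEADING_DEGREES` and `encode_heading` are hypothetical, and `NW` is included as an assumption even though it is not in the question's list. The function also returns the sine and cosine of the angle, which is the unit-vector encoding suggested in the comments. Raw degrees put N (0) and NW (315) far apart numerically even though they are adjacent on the compass, whereas the sine/cosine pair keeps them close.

```python
import math

# Compass heading -> degrees clockwise from north.
# "NW" is not in the question's list; it is added here as an assumption.
HEADING_DEGREES = {
    "N": 0, "NE": 45, "E": 90, "SE": 135,
    "S": 180, "SW": 225, "W": 270, "NW": 315,
}


def encode_heading(heading):
    """Return (degrees, sin, cos) for a compass heading string."""
    degrees = HEADING_DEGREES[heading.strip().upper()]
    radians = math.radians(degrees)
    # sin/cos place each heading on the unit circle, so headings that are
    # close in angle stay close in feature space, including across 0/360.
    return degrees, math.sin(radians), math.cos(radians)
```

Applied to both entryHeading and exitHeading, this would give a sine and a cosine column for each feature. Whether the degree column, the sine/cosine pair, or both work better for the random forest is something to check empirically.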
The book "Statistical Analysis of Circular Data" by Fisher goes into greater detail about more options.
Brian Spiering