Goal:
I have car accident data with geo-positions and would like to build a model that predicts accident hotspots from specific influencing factors (features).
Problem:
To validate the results I want to create a test set, but since I only have accidents and no samples of accident-free car rides, I thought about creating those negative samples artificially. Unfortunately, I have no traffic density data for specific roads.
Current Approach:
Therefore I thought about two ways of approaching this:
- Use the geo-positions of accidents that occurred and pick the other features randomly to keep the distribution intact.
- Create random samples at random geo-positions (within the road network) with random features.
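To make the two options concrete, here is a minimal Python sketch of what I have in mind. Everything in it is a placeholder assumption: the field names (`lat`, `lon`, `hour`, `weather`), the toy accident records, the road network given as straight segments, and drawing the features uniformly at random. Segment lengths are computed in plain degrees, which is only a rough approximation.

```python
import math
import random

# Hypothetical accident records; field names and values are illustrative only.
accidents = [
    {"lat": 52.520, "lon": 13.405, "hour": 8, "weather": "rain"},
    {"lat": 52.531, "lon": 13.384, "hour": 17, "weather": "clear"},
    {"lat": 52.500, "lon": 13.420, "hour": 23, "weather": "fog"},
]

# Hypothetical road network: each segment is ((lat, lon), (lat, lon)).
road_segments = [
    ((52.50, 13.38), (52.52, 13.40)),
    ((52.52, 13.40), (52.54, 13.43)),
]

# Assumed value ranges for the non-spatial features.
HOURS = range(24)
WEATHER = ["clear", "rain", "fog", "snow"]


def random_features(rng):
    """Draw the non-spatial features uniformly at random (an assumption)."""
    return {"hour": rng.choice(HOURS), "weather": rng.choice(WEATHER)}


def negatives_at_accident_sites(accidents, n, rng):
    """Approach 1: reuse accident positions, randomize the other features."""
    samples = []
    for _ in range(n):
        site = rng.choice(accidents)
        samples.append({"lat": site["lat"], "lon": site["lon"],
                        **random_features(rng), "accident": 0})
    return samples


def random_point_on_network(segments, rng):
    """Pick a segment with probability proportional to its length,
    then a uniform position along it (planar approximation in degrees)."""
    lengths = [math.dist(a, b) for a, b in segments]
    (lat1, lon1), (lat2, lon2) = rng.choices(segments, weights=lengths, k=1)[0]
    t = rng.random()
    return lat1 + t * (lat2 - lat1), lon1 + t * (lon2 - lon1)


def negatives_on_network(segments, n, rng):
    """Approach 2: random positions on the road network, random features."""
    samples = []
    for _ in range(n):
        lat, lon = random_point_on_network(segments, rng)
        samples.append({"lat": lat, "lon": lon,
                        **random_features(rng), "accident": 0})
    return samples


rng = random.Random(0)  # fixed seed so the draw is reproducible
neg_sites = negatives_at_accident_sites(accidents, 5, rng)
neg_network = negatives_on_network(road_segments, 5, rng)
```

Note that neither function checks whether a generated sample happens to coincide with a real accident, and the second one treats every metre of road as equally likely to be driven on, which is exactly where the missing traffic density data would matter.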
Question:
Is there a way to create artificial samples for this that introduces less bias?