Goal:

I have car accident data with geo-positions and would like to build a model that predicts accident hotspots from specific influencing factors (features).

Problem:

To validate the results I want to create a test set, but since I only have accidents and no samples of accident-free car rides, I thought about creating the accident-free samples artificially. Unfortunately, I have no traffic-density data for specific roads.

Current Approach:

I see two ways of generating these artificial samples:

  1. Use the geo-positions of accidents that occurred and pick the other features randomly, to keep the distribution intact.
  2. Create random samples at random geo-positions (within the road network) with random features.
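
To make the two options concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy `accidents` records, the feature names `hour` and `weather`, the `road_segments` list of straight segments, and both function names. "Random features" is read here as drawing each feature independently from its empirical distribution in the accident data; segment lengths are approximated crudely in degrees.

```python
import math
import random

# Hypothetical toy data: accident records with a position and two example features.
accidents = [
    {"lat": 48.137, "lon": 11.575, "hour": 8, "weather": "rain"},
    {"lat": 48.140, "lon": 11.580, "hour": 17, "weather": "dry"},
    {"lat": 48.150, "lon": 11.560, "hour": 23, "weather": "snow"},
]

# Hypothetical road network: straight segments given as ((lat, lon), (lat, lon)).
road_segments = [
    ((48.130, 11.550), (48.140, 11.580)),
    ((48.140, 11.580), (48.155, 11.585)),
]

FEATURES = ["hour", "weather"]  # assumed feature names


def draw_features(accidents, rng):
    """Draw each feature independently from its empirical distribution."""
    return {f: rng.choice(accidents)[f] for f in FEATURES}


def negatives_at_accident_sites(accidents, n, rng):
    """Option 1: reuse observed accident positions, randomize the other features."""
    samples = []
    for _ in range(n):
        site = rng.choice(accidents)
        sample = {"lat": site["lat"], "lon": site["lon"], "accident": 0}
        sample.update(draw_features(accidents, rng))
        samples.append(sample)
    return samples


def negatives_on_road_network(segments, accidents, n, rng):
    """Option 2: random positions on the road network with random features."""
    # Weight segments by (approximate) length so that points are spread
    # evenly along the network rather than evenly across segments.
    lengths = [math.dist(a, b) for a, b in segments]
    samples = []
    for _ in range(n):
        (lat1, lon1), (lat2, lon2) = rng.choices(segments, weights=lengths, k=1)[0]
        t = rng.random()  # relative position along the chosen segment
        sample = {
            "lat": lat1 + t * (lat2 - lat1),
            "lon": lon1 + t * (lon2 - lon1),
            "accident": 0,
        }
        sample.update(draw_features(accidents, rng))
        samples.append(sample)
    return samples


rng = random.Random(0)  # fixed seed so the sketch is reproducible
option_1 = negatives_at_accident_sites(accidents, 5, rng)
option_2 = negatives_on_road_network(road_segments, accidents, 5, rng)
```

In this sketch option 1 keeps the spatial distribution of the accidents, while option 2 spreads the artificial samples uniformly over road length, which implicitly assumes equal traffic on every road.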

Question:

Is there a way to create artificial samples for this that introduces less bias?

Andreas