My plan is this:
- Find a source of data about when and where accidents occurred on US 101
- Find a source of data about traffic volume on the same road
- Subset the data to include only accidents that occurred on US 101 between San Francisco and Palo Alto.
- Divide accidents by traffic volume over the smallest time intervals for which I can get traffic volume data. For example, hourly traffic volume would be great: I could divide the average number of accidents in a given hour of the day by the volume of traffic in that hour and then assume, for lack of a better idea, that every car has an equal chance of being involved. Maybe I can get some data on how risk varies by driver age or type of car, but I imagine the insurance companies hold that data and aren't likely to share.
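
To make the last step concrete, here is a rough Python sketch of the calculation I have in mind. The function name, the input shapes, and all the numbers are made up for illustration (I don't have real data yet). It also bakes in the equal-chance assumption above and treats the volume figure as the number of vehicles exposed on the whole stretch during that hour, which may not match how a real traffic count is measured.

```python
from collections import Counter


def hourly_risk_per_vehicle(accident_hours, hourly_volume, days_observed):
    """Estimate accidents per vehicle for each hour of the day.

    accident_hours: iterable of ints (0-23), one entry per recorded accident
    hourly_volume:  dict mapping hour -> average vehicles per day in that hour
    days_observed:  number of days covered by the accident records
    """
    counts = Counter(accident_hours)
    risk = {}
    for hour, volume in hourly_volume.items():
        if volume <= 0:
            continue  # no traffic data for this hour, so no rate can be computed
        avg_accidents_per_day = counts.get(hour, 0) / days_observed
        # Assumes every vehicle in the window is equally likely to be involved.
        risk[hour] = avg_accidents_per_day / volume
    return risk


# Entirely hypothetical inputs, only to show the shape of the data.
accident_hours = [8, 8, 17, 17, 17, 2]
hourly_volume = {2: 500, 8: 6000, 17: 7500}

risk = hourly_risk_per_vehicle(accident_hours, hourly_volume, days_observed=30)
for hour in sorted(risk):
    print(f"{hour:02d}:00  {risk[hour]:.2e} accidents per vehicle")
```

With real data, the accident list would come from the subset in the previous step and the volumes from whatever traffic counts I can find.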
Suggestions for data sources, clever or otherwise, are greatly appreciated. Even something crude, like multiplying national per-capita risk by traffic volume, would be good enough for now; my main problem is getting the data. (FYI: this is just for personal interest.)