I have a collection of GPS logs from many vehicles. Each row has a timestamp, location, speed and vehicle number. I'd like to infer the distribution of speed, or at a minimum, the proportion of time the vehicles are stationary vs. moving. The vendor does not make any claim about the independence of the data. In fact, our theory is that the device logs more often when the vehicle is moving than when it is stationary, so the sampling itself may be correlated with the speed we are trying to measure.
Is there a valid approach to making inferences from this data? Would bootstrapping help with this problem?
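To make the bootstrapping idea concrete, here is a minimal sketch of what I have in mind: resample whole vehicles (not individual rows) with replacement and recompute the stationary fraction each time. The `(vehicle_id, speed)` row layout, the function names and the `threshold` for "stationary" are all my own assumptions for illustration. As far as I can tell, this only reflects vehicle-to-vehicle variability; it resamples the same rows, so it would not remove any bias from speed-dependent logging.

```python
import random

def stationary_fraction(rows, threshold=0.0):
    """Fraction of rows whose speed is at or below `threshold` (assumed cutoff)."""
    return sum(1 for _, speed in rows if speed <= threshold) / len(rows)

def vehicle_bootstrap(rows, n_boot=1000, threshold=0.0, seed=0):
    """Percentile interval from resampling whole vehicles with replacement.

    rows: list of (vehicle_id, speed) tuples (hypothetical layout).
    """
    rng = random.Random(seed)

    # Group rows by vehicle so each vehicle is resampled as one unit.
    by_vehicle = {}
    for vid, speed in rows:
        by_vehicle.setdefault(vid, []).append((vid, speed))
    ids = list(by_vehicle)

    stats = []
    for _ in range(n_boot):
        sample = []
        for vid in rng.choices(ids, k=len(ids)):
            sample.extend(by_vehicle[vid])
        stats.append(stationary_fraction(sample, threshold))

    # Approximate 2.5th and 97.5th percentiles of the resampled fractions.
    stats.sort()
    lo = stats[int(0.025 * n_boot)]
    hi = stats[int(0.975 * n_boot) - 1]
    return lo, hi

# Toy data, invented purely to show the call.
toy_rows = [("A", 0.0), ("A", 12.0), ("B", 0.0), ("B", 0.0), ("C", 30.0), ("C", 25.0)]
interval = vehicle_bootstrap(toy_rows, n_boot=200)
```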
Edit:
Here is one view of the data. I plotted speed against location, treating each point as a Bernoulli outcome, red vs. blue. Simply counting the data points gives 28% blue, which is reasonable. But I want to make a more statistically valid estimate from this.
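For reference, this is roughly the point count I am doing, next to one alternative I have been considering: weighting each row by the time until the same vehicle's next row. This is only a sketch. The `(vehicle_id, timestamp, speed)` layout, the function names and the `threshold` are assumptions, and the time-weighted version further assumes the logged speed holds until the next log, which may not be true.

```python
from collections import defaultdict

def naive_fraction(rows, threshold=0.0):
    """Share of logged rows with speed <= threshold (plain point count)."""
    return sum(1 for _, _, s in rows if s <= threshold) / len(rows)

def time_weighted_fraction(rows, threshold=0.0):
    """Share of elapsed time with speed <= threshold.

    Each row is weighted by the gap until the same vehicle's next row,
    assuming the logged speed holds until then. The last row of each
    vehicle has no following gap and so carries no weight.
    """
    by_vehicle = defaultdict(list)
    for vid, t, s in rows:
        by_vehicle[vid].append((t, s))

    stationary = total = 0.0
    for track in by_vehicle.values():
        track.sort()  # order each vehicle's rows by timestamp
        for (t0, s0), (t1, _) in zip(track, track[1:]):
            dt = t1 - t0
            total += dt
            if s0 <= threshold:
                stationary += dt
    return stationary / total

# Toy track, invented for illustration: one long stop logged once,
# then a short drive logged every 10 seconds.
toy_track = [
    ("A", 0, 0.0),
    ("A", 100, 12.0),
    ("A", 110, 15.0),
    ("A", 120, 14.0),
    ("A", 130, 0.0),
]
naive = naive_fraction(toy_track)             # 2 of 5 rows
weighted = time_weighted_fraction(toy_track)  # 100 of 130 seconds
```

On this toy track the two numbers differ a lot (2 of 5 rows vs. 100 of 130 seconds), which is exactly the kind of gap I am worried about if logging frequency depends on speed.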
It is a small surprise to see the apparent vertical band. I added jitter on the y-axis, but the x-axis is supposed to be continuous.
Thanks!!