2

I have a set of drivers' routes each represented as a sequence (time, longitude, latitude). The data is received from GPS in driver's smartphone. The sequences have different length and coordinates are significantly spread (there are some outliers when, for example, some driver was on vacation and had his phone turned on during the flight).

There is also a route planning system which generates for given start and end points an "optimal" route between them. I would like to compare these offered routes with actual routes people use.

I have a set of popular start/end points and for each such pair I want to extract actual routes between these points from data described above, clean data and quantitatively estimate and average the difference between true routes and offered route. The problem is that I need some dissimilarity measure.

I had two ideas.

The first one - "discrete" approach - to consider routes as strings and compute edit distance between them. This approach has at least two disadvantages: it doesn't correspond to intuitive notion of routes' dissimilarity and it's very costly to compute (all standard algorithms have quadratic complexity in sequence length and sequences in my data can be very long).

The second one - "continuous" approach - to consider routes as curves and define dissimilarity between them as area of map bounded by these two routes. I'm not sure how to efficiently implement this idea and so I ask this question.

Are there any standard approaches to my problem, i.e. some widely accepted dissimilarity measures for such data? If not then which heuristics may I use to choose between the two approaches described above or to generate new ones?

Any ideas will be highly appreciated.

Igor
  • 171
  • Are you trying to make a recommendation based on the starting lat/lot point alone or do you also take into account metrics related to the driver ID (i.e. the history of that specific driver and not just the patterns of similar routes)? – Digio Aug 02 '17 at 12:49
  • @Digio Route planning system uses only coordinates of start/end points. – Igor Aug 02 '17 at 12:55
  • Now I see what you mean. Well, in that case I'm not familiar with a standard approach but your "continuous" method sounds more credible. It would perhaps be easier to implement your curve idea simply by measuring and comparing Haversine distances. – Digio Aug 02 '17 at 13:06
  • @Digio I know what Haversine distance is but how exactly do you offer to use it? My time stamps are not regular so I can't just iterate through both sequences and compute something for appropriate route points. Probably I just misunderstood your idea. – Igor Aug 02 '17 at 13:28

0 Answers0