I have a set of drivers' routes, each represented as a sequence of (time, longitude, latitude) points. The data comes from the GPS receiver in each driver's smartphone. The sequences have different lengths, and the coordinates are widely spread; there are also outliers (for example, one driver was on vacation and had his phone turned on during a flight).
There is also a route-planning system which, for given start and end points, generates an "optimal" route between them. I would like to compare these suggested routes with the routes people actually take.
I have a set of popular start/end points. For each such pair I want to extract the actual routes between these points from the data described above, clean the data, and quantitatively estimate the average difference between the actual routes and the suggested route. The problem is that I need some dissimilarity measure.
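For the cleaning step, one simple heuristic I have considered (the 150 km/h threshold and the function names are my own assumptions, not a standard recipe) is to drop fixes that imply a physically implausible speed relative to the last accepted fix; this would catch the "flight" outliers directly:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * R * math.asin(math.sqrt(a))

def drop_speed_outliers(route, max_kmh=150.0):
    """Keep only fixes whose implied speed from the last accepted fix
    is at most max_kmh. route: list of (t_seconds, lon, lat)."""
    if not route:
        return []
    cleaned = [route[0]]
    for t, lon, lat in route[1:]:
        t0, lon0, lat0 = cleaned[-1]
        dt_h = (t - t0) / 3600.0
        if dt_h <= 0:
            continue  # drop duplicate or out-of-order timestamps
        if haversine_km(lon0, lat0, lon, lat) / dt_h <= max_kmh:
            cleaned.append((t, lon, lat))
    return cleaned
```

For example, a route where the last fix is ~870 km away from the previous one after a one-hour gap would have that fix removed.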
I have had two ideas so far.
The first, a "discrete" approach, is to treat routes as strings and compute the edit distance between them. This approach has at least two disadvantages: it does not correspond to the intuitive notion of route dissimilarity, and it is costly to compute (all standard algorithms have quadratic complexity in the sequence length, and the sequences in my data can be very long).
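For concreteness, here is one way the discrete approach could be sketched: snap each fix to a grid cell (the 0.01-degree cell size is an arbitrary assumption of mine) and run the classic Levenshtein dynamic program on the resulting cell sequences. The quadratic cost mentioned above is visible in the nested loops:

```python
def to_cells(route, cell_deg=0.01):
    """Map a route [(t, lon, lat), ...] to a sequence of grid-cell ids,
    collapsing consecutive duplicates. cell_deg ~ 1 km is an assumption."""
    cells = []
    for _, lon, lat in route:
        c = (round(lon / cell_deg), round(lat / cell_deg))
        if not cells or cells[-1] != c:
            cells.append(c)
    return cells

def edit_distance(a, b):
    """Classic Levenshtein DP: O(len(a) * len(b)) time, O(len(b)) memory."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]
```

The collapsing of consecutive duplicates in `to_cells` already shortens the sequences somewhat, but the quadratic DP remains the bottleneck for long routes.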
The second, a "continuous" approach, is to treat routes as curves and define the dissimilarity between them as the area of the map region bounded by the two routes. I am not sure how to implement this idea efficiently, hence this question.
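One way I can imagine making this concrete (under strong assumptions: the two routes share their start and end points, do not cross each other, and are given in a planar projection rather than raw degrees) is to close the polygon formed by one route followed by the reverse of the other and apply the shoelace formula:

```python
def enclosed_area(route_a, route_b):
    """Area of the region enclosed between two routes with common
    endpoints, via the shoelace formula on the closed polygon
    route_a + reversed(route_b). Assumes the routes do not cross and
    that coordinates are planar (result is in coordinate units squared)."""
    poly = [(lon, lat) for _, lon, lat in route_a]
    poly += [(lon, lat) for _, lon, lat in reversed(route_b)]
    s = 0.0
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0
```

When the routes do cross, the polygon self-intersects and this simple formula cancels areas on opposite sides, so a real implementation would need to split the curves at crossing points first; that is exactly the part I do not know how to do efficiently.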
Are there any standard approaches to this problem, i.e. widely accepted dissimilarity measures for such data? If not, what heuristics could I use to choose between the two approaches described above, or to come up with new ones?
Any ideas would be highly appreciated.