
I am trying to compare two methods and am looking for an appropriate metric. Both methods infer stop-locations for a person, each using a different type of data. A stop-location is represented as a time interval. For example, consider the following scenario:

Method A:
09:00-09:30
09:40-10:00

Method B:
09:00-09:14
09:15-09:29
09:38-10:00

Ground truth:
09:01-09:30
09:38-10:00

In the above example, the ground truth is that the person visited two stop-locations, for 29 minutes and 22 minutes. Method A is a little off on the start/stop times. Method B gets the start/stop times closer to correct, but splits the first stop-location into two pieces.

I am interested in comparing Method A to Method B, to determine which is better at inferring the ground truth.

One possibility would be to simply compute, minute by minute, the fraction of minutes where each method predicts a stop-location where there is none (false positives), and the fraction of minutes where it predicts no stop-location where there is one (false negatives). One could then summarize these with, for example, the F1-score.
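For concreteness, here is a minimal sketch of that minute-level computation (Python). It assumes intervals are half-open [start, end) and counted in whole minutes, which matches the 29- and 22-minute durations above; the function names are just for illustration:

```python
def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def interval_minute_set(intervals):
    """Expand half-open [start, end) intervals into a set of minute indices."""
    covered = set()
    for start, end in intervals:
        covered.update(range(to_minutes(start), to_minutes(end)))
    return covered

def minute_level_scores(predicted, truth):
    """Minute-level precision, recall and F1 of predicted intervals vs. ground truth."""
    pred = interval_minute_set(predicted)
    true = interval_minute_set(truth)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

truth = [("09:01", "09:30"), ("09:38", "10:00")]
method_a = [("09:00", "09:30"), ("09:40", "10:00")]
method_b = [("09:00", "09:14"), ("09:15", "09:29"), ("09:38", "10:00")]

print("Method A:", minute_level_scores(method_a, truth))
print("Method B:", minute_level_scores(method_b, truth))
```

Under this half-open-minute convention, both methods score essentially the same on the example (F1 ≈ 0.97 each), which hints at my concern with this approach: splitting a stop-location into pieces is barely penalized.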

Is there a better way to approach this?

  • It all depends on what you mean by "better": there is no golden, universal statistical procedure to tell you which errors matter more. You need to tell us. What is the cost or harm in erring on an endpoint of an interval? In splitting an interval? In other errors? This information needs to be quantitative if you wish to apply rigorous methods to incorporate it in the solution. – whuber Dec 23 '14 at 22:57
  • Thanks for the comment - you are of course absolutely right. My problem comes from the fact that, by being able to choose my own metric, I also have the possibility of choosing it in such a way as to make my own results look better. I was hoping somebody could give a reference to a common way of measuring such an error, so that I will not be making my results look better than they actually are. – utdiscant Dec 25 '14 at 23:55
