I have a data frame that looks like:
x y group
1 2 1
1 3 1
1 4 2
1 5 2
1 6 3
...
For each group, I would like to find the distance to its 'nearest' group. Here, the nearest group is the one with the shortest distance to it, and the distance between two groups is the shortest distance between any member of one group and any member of the other. For example, the distances from each member of group 1 to each member of group 2 are:
(1,2) -> (1,4) = 2
(1,2) -> (1,5) = 3
(1,3) -> (1,4) = 1
(1,3) -> (1,5) = 2
1 is the shortest, so the distance between groups 1 and 2 is 1. By the same idea, the distances from each member of group 1 to each member of group 3 are:
(1,2) -> (1,6) = 4
(1,3) -> (1,6) = 3
Therefore, the distance between groups 1 and 3 is 3. Since 3 > 1, the nearest neighbor of group 1 is group 2, at a distance of 1. I would like to apply this metric to a really large dataset. I can do it with nested for loops, but that is very slow. Is there a faster solution? Any help is appreciated!
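Below is a minimal sketch of one possible speed-up. It assumes the data are in a pandas DataFrame with columns `x`, `y` and `group`, that "distance" means Euclidean distance (all the example points share x = 1, so the metric is not pinned down), and that there are at least two groups. Instead of looping over every pair of points, it loops only over groups. For each group it builds a KD-tree (`scipy.spatial.cKDTree`) on all points outside that group and then queries the nearest outside point for every member. The function name `nearest_group` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def nearest_group(df: pd.DataFrame) -> pd.DataFrame:
    """For each group, return the nearest other group and the minimum
    point-to-point Euclidean distance to it (assumes >= 2 groups)."""
    pts = df[["x", "y"]].to_numpy(dtype=float)
    groups = df["group"].to_numpy()
    rows = []
    for g in np.unique(groups):
        inside = groups == g
        outside_groups = groups[~inside]
        # KD-tree over every point that does NOT belong to group g.
        tree = cKDTree(pts[~inside])
        # For each member of g: distance to, and index of, its nearest
        # outside point (k=1, Euclidean since p=2 is the default).
        dist, idx = tree.query(pts[inside])
        # The group-to-group distance is the smallest of these distances.
        best = np.argmin(dist)
        rows.append({"group": g,
                     "nearest_group": outside_groups[idx[best]],
                     "distance": dist[best]})
    return pd.DataFrame(rows)


# The five rows shown in the question (the "..." rows are omitted).
df = pd.DataFrame({"x": [1, 1, 1, 1, 1],
                   "y": [2, 3, 4, 5, 6],
                   "group": [1, 1, 2, 2, 3]})
print(nearest_group(df))
```

On these rows it reports group 2 at distance 1.0 for group 1, which matches the hand calculation. Each query typically takes roughly logarithmic time instead of scanning all points, so this should help most when the number of groups is modest. With very many small groups, rebuilding one tree per group becomes the bottleneck. Ties are broken arbitrarily by `argmin`: group 2 is at distance 1 from both group 1 and group 3. For Manhattan distance, pass `p=1` to `tree.query`.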