In R, I have a data frame with roughly 3 million observations and three columns: longitude, latitude and time. My goal is to form clusters (using a custom distance function) and then build a single data frame containing, for each cluster, the observation with the earliest time value. Two observations should be in the same cluster if the absolute difference in their time values is less than 100 and their spatial distance is less than 4 km. Clusters should also chain through such pairs, as the example below shows.
Firstly, here is some sample data:
```r
library("geosphere") # provides distm(), used below to compute distances from longitude/latitude

# A small reproducible dataset
Long1 <- c(149.1250, 149.0774, 149.1250, 148.5352, 149.0994, 149.0301, 149.0883, 149.0663, 149.1097)
Lat1  <- c(-21.9876, -21.9171, -21.9876, -22.4645, -21.9206, -22.0115, -21.9188, -21.9249, -21.9546)
Time  <- c(1000, 500, 754, 129, 6050, 1908, 6109, 245, 6049)

# Create our dataset as a data frame
DatPoints <- data.frame(Long1, Lat1, Time)

# Create a spatial distance matrix; distm() expects columns in (longitude, latitude) order
DataMat  <- cbind(Long1, Lat1)
Dist_Mat <- distm(DataMat, DataMat, fun = distHaversine) / 1000 # pairwise distances in km
```
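The time condition can be expressed the same way as the spatial one, as an n × n matrix of absolute time differences. Combining the two matrices with `&` gives a logical matrix of directly linked pairs. The sketch below assumes a small base-R helper, `haversine_km` (a hypothetical name), in place of `distm(..., fun = distHaversine) / 1000` so that it runs without geosphere. It uses the same haversine formula and geosphere's default earth radius of 6378137 m, so its values should agree with `Dist_Mat`.

```r
Long1 <- c(149.1250, 149.0774, 149.1250, 148.5352, 149.0994,
           149.0301, 149.0883, 149.0663, 149.1097)
Lat1  <- c(-21.9876, -21.9171, -21.9876, -22.4645, -21.9206,
           -22.0115, -21.9188, -21.9249, -21.9546)
Time  <- c(1000, 500, 754, 129, 6050, 1908, 6109, 245, 6049)

# Hypothetical helper: all-pairs haversine distance in km
# (radius 6378.137 km, the default radius of geosphere::distHaversine).
haversine_km <- function(lon, lat, r = 6378.137) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlat <- outer(lat, lat, "-")
  dlon <- outer(lon, lon, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))
}

Dist_Mat  <- haversine_km(Long1, Lat1)        # spatial distances (km)
Time_Diff <- abs(outer(Time, Time, "-"))      # absolute time differences
Linked    <- Dist_Mat < 4 & Time_Diff < 100   # TRUE where a pair is directly linked
diag(Linked) <- FALSE                         # ignore self-pairs

# List each linked pair once (row < column)
which(Linked & upper.tri(Linked), arr.ind = TRUE)
```

For the sample data this should list only the pairs (5, 7) and (5, 9). Observations 7 and 9 are within 100 time units of each other but about 4.56 km apart, so they are not directly linked.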
From the distance matrix above, we can see that observations 5 and 7 are 1.1637 km apart, which is less than 4 km, and their time values (6050 and 6109) differ by less than 100. So observations 5 and 7 should be in the same cluster. Likewise, observations 5 and 9 are 3.9315 km apart and their time values (6050 and 6049) differ by less than 100, so observations 5 and 9 should also be in the same cluster. Hence observations 5, 7 and 9 should all be in the same cluster, even though observations 7 and 9 are more than 4 km apart.
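The chaining described above (5 linked to 7, 5 linked to 9, so 7 and 9 end up together) is the same as taking the connected components of a graph whose edges are the directly linked pairs. Below is a minimal base-R union-find sketch. It assumes the edge list has already been found; here it is hard-coded to the two links from the sample data. The function name `components` is hypothetical.

```r
# Hypothetical helper: label the connected components of an undirected graph
# with n nodes, given a two-column matrix of edges, using union-find.
components <- function(n, edges) {
  parent <- seq_len(n)
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]   # follow parent pointers to the root
    i
  }
  for (k in seq_len(nrow(edges))) {
    ri <- find_root(edges[k, 1])
    rj <- find_root(edges[k, 2])
    if (ri != rj) parent[max(ri, rj)] <- min(ri, rj)  # merge the two trees
  }
  # Label each node by its root, then renumber the roots as 1, 2, 3, ...
  roots <- vapply(seq_len(n), find_root, numeric(1))
  match(roots, unique(roots))
}

edges <- rbind(c(5, 7), c(5, 9))   # the two direct links in the sample data
components(9, edges)               # returns 1 2 3 4 5 6 5 7 5
```

Observations 5, 7 and 9 share label 5, and the other six observations each form their own cluster.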
I am wary of using any raster techniques because some of the observations (in my original file as well as in the example data above) have identical longitude-latitude coordinates. I have seen many similar examples on Stack Exchange (such as "Clustering spatial data in R?"), and they all seem to suggest using the hclust function with the "single" linkage method. Indeed, if I were only interested in clustering by spatial distance (with 4 km as the cut-off), then I think I could just do something like
```r
# Single-linkage clustering on spatial distance only, cut at 4 km
chc <- hclust(as.dist(Dist_Mat, diag = TRUE), method = "single")
chc.d40 <- cutree(chc, h = 4)
```
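One possible way to keep the `hclust` approach, at least for small data, is to give it a dissimilarity that already encodes both conditions: 0 for a directly linked pair and 1 otherwise. Single linkage then merges linked pairs at height 0 and everything else at height 1. Cutting at `h = 0.5` should therefore return the connected components of the link graph, which are the chained clusters described above. This is a sketch under those assumptions. It uses the same hypothetical `haversine_km` stand-in for `distm()`, and it still needs the full n × n matrix, which for 3 million rows has 9 × 10^12 entries.

```r
Long1 <- c(149.1250, 149.0774, 149.1250, 148.5352, 149.0994,
           149.0301, 149.0883, 149.0663, 149.1097)
Lat1  <- c(-21.9876, -21.9171, -21.9876, -22.4645, -21.9206,
           -22.0115, -21.9188, -21.9249, -21.9546)
Time  <- c(1000, 500, 754, 129, 6050, 1908, 6109, 245, 6049)

# Hypothetical stand-in for distm(..., fun = distHaversine) / 1000 (km)
haversine_km <- function(lon, lat, r = 6378.137) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  a <- sin(outer(lat, lat, "-") / 2)^2 +
    outer(cos(lat), cos(lat)) * sin(outer(lon, lon, "-") / 2)^2
  2 * r * asin(sqrt(pmin(a, 1)))
}

# Binary dissimilarity: 0 if both thresholds are met, 1 otherwise
Linked   <- haversine_km(Long1, Lat1) < 4 & abs(outer(Time, Time, "-")) < 100
Bin_Dist <- ifelse(Linked, 0, 1)

chc      <- hclust(as.dist(Bin_Dist), method = "single")
clusters <- cutree(chc, h = 0.5)   # keeps only the height-0 (linked) merges

# Keep the earliest observation in each cluster
DatPoints <- data.frame(Long1, Lat1, Time, cluster = clusters)
DatPoints <- DatPoints[order(DatPoints$Time), ]
Earliest  <- DatPoints[!duplicated(DatPoints$cluster), ]
Earliest
```

On the sample data this should give 7 clusters, with observations 5, 7 and 9 together. `Earliest` should then keep observation 9 (time 6049) for that cluster, plus the six singleton observations.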
But I am unable to generalize this approach to the case where a third attribute, time, must be considered alongside longitude and latitude. How can I achieve the goals stated at the beginning of this question?
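For 3 million rows an n × n matrix is out of reach, but the time condition rules out most pairs. After sorting by time, each observation only needs to be compared with the later observations that are less than 100 time units away. The rough base-R sketch below combines that sliding time window (found with `findInterval`), a vectorised haversine distance and union-find. The function name `cluster_spacetime` is hypothetical. The running time depends on how many observations fall inside each 100-unit window. A plain R loop over millions of rows will probably be slow, so in practice the inner loop may need to move to compiled code (for example via Rcpp).

```r
# Hypothetical sketch: link observations with distance < max_km and
# |time difference| < max_dt, chaining links transitively via union-find.
cluster_spacetime <- function(lon, lat, time, max_km = 4, max_dt = 100,
                              r = 6378.137) {
  n <- length(time)
  ord  <- order(time)                    # work in time order
  lon  <- lon[ord] * pi / 180
  lat  <- lat[ord] * pi / 180
  time <- time[ord]
  # hi[i] = last index whose time is strictly less than time[i] + max_dt
  hi <- findInterval(time + max_dt, time, left.open = TRUE)
  parent <- seq_len(n)
  find_root <- function(i) {             # no path compression, for brevity
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (i in seq_len(n - 1)) {
    if (hi[i] <= i) next                 # no later observation in the window
    cand <- (i + 1):hi[i]
    a <- sin((lat[cand] - lat[i]) / 2)^2 +
      cos(lat[i]) * cos(lat[cand]) * sin((lon[cand] - lon[i]) / 2)^2
    d <- 2 * r * asin(sqrt(pmin(a, 1)))  # haversine distance in km
    for (k in cand[d < max_km]) {
      ri <- find_root(i)
      rk <- find_root(k)
      if (ri != rk) parent[max(ri, rk)] <- min(ri, rk)
    }
  }
  roots <- vapply(seq_len(n), find_root, numeric(1))
  cl <- integer(n)
  cl[ord] <- match(roots, unique(roots)) # map labels back to original row order
  cl
}

Long1 <- c(149.1250, 149.0774, 149.1250, 148.5352, 149.0994,
           149.0301, 149.0883, 149.0663, 149.1097)
Lat1  <- c(-21.9876, -21.9171, -21.9876, -22.4645, -21.9206,
           -22.0115, -21.9188, -21.9249, -21.9546)
Time  <- c(1000, 500, 754, 129, 6050, 1908, 6109, 245, 6049)

cl <- cluster_spacetime(Long1, Lat1, Time)
DatPoints <- data.frame(Long1, Lat1, Time, cluster = cl)
DatPoints <- DatPoints[order(DatPoints$Time), ]
Earliest  <- DatPoints[!duplicated(DatPoints$cluster), ]
```

Because the data are sorted by time, each pair is checked at most once, from its earlier member. Pairs further apart in time than `max_dt` are never examined.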

