I sampled an insect population in an area and recorded GPS points. I now want to investigate whether there are subpopulations within the population, based on the distances between points. My dataset has X and Y coordinates in this format: (45.13904444, 6.990686111). The spatial reference system I use is WGS 84 / UTM zone 32N (EPSG:32632). I would like to run a cluster analysis based on the distances between points.

Can anyone give me a script to do this?

  • 2
    What have you tried? What kind of "analysis" do you want? What's the fundamental question you are trying to answer? You can't just say "please give me a script to do cluster analysis" and expect to get what you need. – Spacedman Aug 16 '19 at 09:57
  • 1
    I resolved it with the link @Joseph posted, "Clustering spatial data in R?". I had previously tried a kNN approach, but it didn't work. To clarify: I sampled an insect population in an area and recorded GPS points. I now want to investigate whether there are subpopulations within the population, based on the distances between points. – Viviana Di Pietro Aug 16 '19 at 10:16
  • @VivianaDiPietro - I think it would be good if you edit your question to include the details of what you were investigating and then post an answer showing what you did and the script you used. This could help others in a similar situation :) – Joseph Aug 16 '19 at 10:52
  • What do you mean when you say the kNN approach doesn't work? What did you try? What errors did you get? Is a kNN clustering better at answering the question you have about your data? What is that question? – Spacedman Aug 16 '19 at 13:25

1 Answer

I solved it with this script:

library(sp)         # SpatialPointsDataFrame(), CRS()
library(geosphere)  # distm()
# rgdal is not attached: no function from it is called directly below

# example data from the thread
x <- c(-1.482156, -1.482318, -1.482129, -1.482880, -1.485735, -1.485770, -1.485913, -1.484275, -1.485866)
y <- c(54.90083, 54.90078, 54.90077, 54.90011, 54.89936, 54.89935, 54.89935, 54.89879, 54.89902)

# convert data to a SpatialPointsDataFrame object
xy <- SpatialPointsDataFrame(
      matrix(c(x, y), ncol = 2), data.frame(ID = seq_along(x)),
      proj4string = CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))

# use the distm function to generate a geodesic distance matrix in meters
mdist <- distm(xy)

# cluster all points using a hierarchical clustering approach
hc <- hclust(as.dist(mdist), method="complete")

# define the distance threshold, in this case 40 m
d <- 40

# define clusters based on a tree "height" cutoff "d" and add them to the SpatialPointsDataFrame
xy$clust <- cutree(hc, h=d)
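
For anyone who wants to check the cutoff logic without installing any packages, here is a minimal base-R sketch of the same idea. It is not the script above: it swaps distm() for a hand-written haversine formula on a sphere of radius 6,371,000 m (an assumption for illustration), so the distances can differ slightly from what geosphere returns. The helper name haversine_matrix is made up for this example.

```r
# Example data from the thread: x = longitude, y = latitude, in decimal degrees
x <- c(-1.482156, -1.482318, -1.482129, -1.482880, -1.485735,
       -1.485770, -1.485913, -1.484275, -1.485866)
y <- c(54.90083, 54.90078, 54.90077, 54.90011, 54.89936,
       54.89935, 54.89935, 54.89879, 54.89902)

# Hypothetical helper: pairwise great-circle distances in metres (haversine),
# assuming a spherical Earth of radius r
haversine_matrix <- function(lon, lat, r = 6371000) {
  lon <- lon * pi / 180
  lat <- lat * pi / 180
  dlon <- outer(lon, lon, "-")
  dlat <- outer(lat, lat, "-")
  a <- sin(dlat / 2)^2 + outer(cos(lat), cos(lat)) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * r * asin(pmin(sqrt(a), 1))
}

mdist <- haversine_matrix(x, y)

# Same clustering steps as in the script above
hc <- hclust(as.dist(mdist), method = "complete")
d <- 40                      # distance threshold in metres
clust <- cutree(hc, h = d)

# Number of points in each cluster
print(table(clust))
```

With complete linkage, every pair of points inside a cluster is at most d metres apart, so raising or lowering d merges or splits groups accordingly. If your coordinates are already projected in metres (for example in EPSG:32632), dist() on the coordinate matrix should give metric distances directly, and the geodesic step is not needed.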