0

I have a CSV file (about 1,000,000 georeferenced points) which contains point_id/latitude/longitude. I also have a shapefile which represents a grid cell_id/xmin/xmax/ymin/ymax. I am trying to join the grid attributes (specifically the cell_id) to the points from my CSV file. Basically I want to know in which cell the points fall.

So my output would look like that point_id/latitude/longitude/cell_id.

Here is the code I have been working on based on this post and that post:

library(sp)
points<-read.csv(file = "/Users/input.csv", stringsAsFactors=FALSE)
coordinates(points)<- ~longitude+latitude  
library(sf)
grid <- read_sf("/Users/grid.shp")
a.data <- raster::intersect(points,grid["cell_id",,,])
points$cell_id <- a.data$cell_id
write.csv(points, file = "/Users/points.csv")

However this does not gives me the cell_id but instead returns the point_id twice.

I have also tried using over but it yielded an error message.

Vince
  • 20,017
  • 15
  • 45
  • 64
Marcel Campion
  • 503
  • 3
  • 13

1 Answers1

1

You are mixing and matching different object classes (sp and sf). The raster::intersect function may be able to deal with this but, it is a dangerous assumption and is not good practice. I would use the raster::shapefile function to read grid.shp (resulting in an sp object) and not attempt an index to limit to cell_id. Just let the function work and then deal with the resulting @data data.frame slot. You are trying to be too cleaver and it is causing problems. This really is a fairly simple point in polygon analysis.

First, create some example data (meuse point data) and r (polygon grid).

library(sp)
library(raster)

data(meuse)
  coordinates(meuse) <- ~x+y

r <- raster(extent(meuse), resolution = 100)
  r[] <- 1:ncell(r)
r <- as(r, "SpatialPolygonsDataFrame")
  names(r@data) <- "cell_id"

plot(r)
  plot(meuse,col="red",pch=19,add=TRUE)

Now, we can use the point.in.poly function in the spatialEco package. This will intersect the point and polygon feature geometries and return the associated polygon attributes. By default the returned object will be a SpatialPointsDataFrame however, you can use sp = FALSE to result in an sf class object.

meuse_grid <- spatialEco::point.in.poly(meuse, r)
  names(meuse_grid)
  meuse_grid$cell_id  
Jeffrey Evans
  • 31,750
  • 2
  • 47
  • 94
  • Surely if the grid is regular you can work out the raster I,J for a point by arithmetic - subtract xmin, divide by cell width or similar... – Spacedman Jan 29 '19 at 21:02
  • 1
    @Spacedman, one would think but, I would not take it for granted that the cell_ids are uniform (they are in my example for simplicity sake). Also, one could have a grid with irregular "cell" sizes across the region. This would make it a non-systematic grid but, I have seen this type of data referred to as a grid. – Jeffrey Evans Jan 29 '19 at 21:08
  • 1
    The data function was just missing the closing. I do not see an error regarding plot margins, is this on your data on the example? – Jeffrey Evans Jan 29 '19 at 21:16
  • @JeffreyEvans I figures out the mistake I was doing on the code that I talked about in my previous comment. So I am able to replicate the first part of your code. However when I try to use the second part witch my data I get this message after running "pts_grid <- spatialEco::point.in.poly(pts, grid)" Error in st_geos_binop("intersects", x, y, sparse = sparse, prepared = prepared) : st_crs(x) == st_crs(y) is not TRUE – Marcel Campion Jan 29 '19 at 21:54
  • 2
    @MarcelCampion this error indicates that your data is missing a projection string or are in different projections. Check both data using: proj4string(x) – Jeffrey Evans Jan 29 '19 at 21:56
  • @JeffreyEvans Yes my point data was missing a projection string. Thank's a lot for your help on that. For those of you that want to assign a projection to points from a csv similar to the projection of a shapefile you can simply do that proj4string(meuse) <- proj4string(r) using Jeffrey's semantic. – Marcel Campion Jan 29 '19 at 22:25
  • Many spatial functions test for projection string equivalency and not actual projection space. As such, you could also set all of your data to NA using: proj4string(x) <- NA However, this assumes that your data overlaps in geographic space and are not in two different project spaces. – Jeffrey Evans Jan 29 '19 at 22:47