2

I am trying to extract multiband (3 bands) raster values to a point shapefile. In the following code, centroids_gdf is the variable that stores the point features. src stores a list coming from tifs with a glob function.

Basically, I create centroids from a grid coincident to raster grid and then, assign the raster values to the point geodataframe. The code works but the problem is that it is repeating the point 3 times. I suspect that it is repeating the points per each band. How can I avoid this?

I saw this post, actually, I use the code from it, but it works for one single band. Python - Extract raster values at point locations

I have also seen other questions but I cannot find the proper answer.

import rasterio
import glob
import os
import geopandas as gpd

out_clip = "...\folder to store"

shapefile_areas = r"shapefile_areas.shp"

clips_folder = "...\clip"

the_tifs = glob.iglob(os.path.join(clips_folder,"*_clip.tif"))

tif_list = list(the_tifs) # I create this to iterate through the tifs later

#Read the shapefile into a GeoPandas dataframe. All areas stdAreas = gpd.read_file(shapefile_areas)

names_StAreas = stdAreas["name"].unique()

the_grids_to_centroids_extraction = glob.iglob(os.path.join(out_clip,"grid_*.shp"))

name= 0

for glo in the_grids_to_centroids_extraction:

display(glo)
gdf_glo = gpd.read_file(glo)

#extract the centroids! 
centroids_ = gdf_glo.centroid


#extract coordinates:

centroids_.to_file(filename=os.path.join(out_clip+"\\centroid_values\\"+ names_StAreas[name]+"_centroids.shp"), driver='ESRI Shapefile')

centroids_gdf = gpd.GeoDataFrame(gpd.GeoSeries(centroids_))

#create geometry column
centroids_gdf['geometry'] = None
centroids_gdf['geometry'] = centroids_gdf[0]



#Assign raster values
centroids_gdf.index = range(len(centroids_gdf))
coords = [(x,y) for x, y in zip(centroids_gdf['geometry'].x, centroids_gdf['geometry'].y)]

# Open raster values (multiband)
src = rasterio.open(tif_list[name])

#Create each column for each band
centroids_gdf['b'] = [x[0] for x in src.sample(coords)]
centroids_gdf['g'] = [x[1] for x in src.sample(coords)]
centroids_gdf['r'] = [x[2] for x in src.sample(coords)]

del centroids_gdf[0]


centroids_gdf.to_file(filename=os.path.join(out_clip+"test.shp"),diver = 'ESRI Shapefile')

1 Answers1

2

I found a possible solution. Maybe it is not efficient, as I calculated the "mean" of 1 value, but it is working with several rasters and polygons. I did not have to extract the values to points, it is enough to polygons.

These are the resources that I used, although I did not find the solution but they helped to research about this:

https://stackoverflow.com/questions/50332455/concatenate-columns-of-dataframes-in-a-loop-pandas

https://stackoverflow.com/questions/28669482/appending-pandas-dataframes-generated-in-a-for-loop

Most useful:

https://kodu.ut.ee/~kmoch/geopython2020/L5/raster.html#calculating-zonal-statistics

Basically, I am using the zonal_stats() function from rasterstats package.

from rasterstats import zonal_stats

#get the names of study areas #all areas merged shapefile_areas = "..\Study_areas.shp"

#Read the shapefile into a GeoPandas dataframe. All areas stdAreas = gpd.read_file(shapefile_areas)

clips_folder = "..\clip"

the_tifs = glob.iglob(os.path.join(clips_folder,"*_clip.tif"))

out_clip = "...\extent"

the_grids_extraction = glob.iglob(os.path.join(out_clip,"grid_*.shp"))

list_grids = list(the_grids_extraction)

OUT_TEST = "...\folder"

names_StAreas = stdAreas["name"].unique()

count = 0

for i in the_tifs:

dataset = rasterio.open(i)

n_bands = dataset.count 

print(n_bands)

grids_gdf = gpd.read_file(list_grids[count])

list_names_fields =['r','g','nir']

for zstat in range(n_bands):

    # zonal statistics
    zs = zonal_stats(grids_gdf, i, stats=['mean'],band=zstat+1)

    display("-----")

    name_field = list_names_fields[zstat]

    stats_df = pd.DataFrame(zs)
    stats_df.rename(columns={'mean':list_names_fields[zstat]}, inplace=True)

    grids_gdf = pd.concat([grids_gdf, stats_df], axis=1)



    grids_gdf.to_file(os.path.join(OUT_TEST,names_StAreas[count]+'_rgbValues.shp'),driver='ESRI Shapefile')



count = count +1