
I have a layer with a classification of buildings, and I want to add an "Area" field to its attribute table using PyQQGIS. For another layer with 50 records it works, but this layer is huge (6.8 million records) and it runs far too slowly. For a different problem I used GeoPandas, which was very fast. I will paste my code below; can I do this with GeoPandas instead?

from qgis.core import *
from qgis.utils import *
from qgis.analysis import QgsNativeAlgorithms
from PyQt5.QtCore import QVariant
from qgis.core import QgsApplication, QgsProcessingFeedback, QgsRasterLayer
import os
import sys
import geopandas

sys.path.append('/usr/lib/qgis')
sys.path.append('/usr/share/qgis/python/plugins')
os.environ["QT_QPA_PLATFORM"] = "offscreen"

QgsApplication.setPrefixPath(r'/usr/bin/qgis', True)
qgs = QgsApplication([], False)
qgs.initQgis()

import processing
from processing.core.Processing import Processing

Processing.initialize()
QgsApplication.processingRegistry().addProvider(QgsNativeAlgorithms())
feedback = QgsProcessingFeedback()

# adding the Area field
layer = QgsVectorLayer(r'/home/gis/polskagisencoding.shp', "polskagisencoding", "ogr")
provider = layer.dataProvider()
area_field = QgsField("Area", QVariant.Int)
provider.addAttributes([area_field])
layer.updateFields()

# updating the Area field for each feature
idx = provider.fieldNameIndex('Area')
for feature in layer.getFeatures():
    attrs = {idx: int(feature.geometry().area())}
    layer.dataProvider().changeAttributeValues({feature.id(): attrs})

  • Have you tried, instead of working on the shapefile, copying it into a memory layer? A memory layer is loaded entirely in memory, so it's faster. If you create a GeoDataFrame, it's like a memory layer, and yes, you can update features (a sketch follows these comments). – J. Monticolo Dec 04 '20 at 08:44
  • Examples in https://data.library.virginia.edu/how-to-use-the-field-calculator-in-python-for-qgis-3/ do it in a different way. Also https://gis.stackexchange.com/questions/102119/how-to-change-the-value-of-an-attribute-using-qgsfeature-in-pyqgis updates the layer, not the dataProvider. Perhaps your code does something sub-optimal with the shapefile 6.8 million times, and the bottleneck is not PyQGIS but how it is used. – user30184 Dec 04 '20 at 08:45
  • Why not put the data into a database like PostGIS and do the work there? Safer and almost certainly faster. – Ian Turton Dec 04 '20 at 09:27
  • Maybe yes, but I use Elasticsearch and I will not use another database system :( – datasciencebegginer Dec 04 '20 at 09:29
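
To illustrate the memory-layer suggestion from the comments, here is a minimal PyQGIS sketch (assuming QGIS 3.x and reusing the imports from the question): materialize() copies the features into a memory layer, and all attribute changes are collected first and applied in a single bulk call rather than once per feature.

# copy the shapefile into a memory layer (loaded entirely in RAM)
layer = QgsVectorLayer(r'/home/gis/polskagisencoding.shp', "polskagisencoding", "ogr")
mem_layer = layer.materialize(QgsFeatureRequest())

provider = mem_layer.dataProvider()
provider.addAttributes([QgsField("Area", QVariant.Int)])
mem_layer.updateFields()
idx = mem_layer.fields().indexOf("Area")

# build the full change map first, then apply it in one bulk call
changes = {f.id(): {idx: int(f.geometry().area())} for f in mem_layer.getFeatures()}
provider.changeAttributeValues(changes)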

2 Answers


With GeoPandas:

import geopandas as gpd

gdf = gpd.read_file('/home/gis/polskagisencoding.shp')
gdf["Area"] = gdf.geometry.area  # area in the units of the layer's CRS

But I'm not sure it's faster with a shapefile this large.
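
If the goal is to persist the new column, a natural follow-up (the output path here is an assumption) is to write the frame back out:

# write the result to a new shapefile (output path is hypothetical)
gdf.to_file('/home/gis/polskagisencoding_area.shp')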

– gene

For large datasets like this, you may want to use dask-geopandas. It is still under development (no official release yet), but computing area should work flawlessly.

You should install geopandas, dask and pygeos first, and then dask-geopandas from git.
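
Something like this should cover the prerequisites (plain pip is an assumption; conda would work as well):

pip install geopandas dask pygeos

and then dask-geopandas itself: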

pip install git+git://github.com/jsignell/dask-geopandas.git

Then you can read your file with geopandas and convert it to a dask GeoDataFrame.

import geopandas
import dask_geopandas

# read the shapefile into an ordinary GeoDataFrame
df = geopandas.read_file('/home/gis/polskagisencoding.shp')

# split it into partitions that can be processed in parallel
ddf = dask_geopandas.from_geopandas(df, npartitions=4)

# compute the area of every geometry across all partitions
areas = ddf.geometry.area.compute()

npartitions in this case should be the number of processors you want to use. dask-geopandas then does the computation in parallel.
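
Since from_geopandas preserves the original index, the computed Series aligns with df, so (assuming you want the values back in the original frame and on disk; the output path is hypothetical) you can attach and save them:

# attach the computed areas back onto the original GeoDataFrame
df["Area"] = areas.astype(int)  # int to match the question's QVariant.Int field
df.to_file('/home/gis/polskagisencoding_area.shp')  # hypothetical output path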

See more at https://github.com/jsignell/dask-geopandas.

– martinfleis