I have a shapefile with almost 50,000 features and 2 fields. I want to change all the attribute values of the second field, and I remember that in QGIS 2 there was a way to save to disk ONLY at the end of the iteration.

1st way

layer_provider = layer.dataProvider()
layer.startEditing()
for f in features: 
    id = f.id()
    avalue = f.attributes()[0] /10
    attr_value = {1:avalue}
    layer_provider.changeAttributeValues({id:attr_value})
layer.commitChanges()
layer.updateFields()

2nd way

with edit(layer):
    for feature in layer.getFeatures():
        feature['DN'] = feature['DN10']/10
        layer.updateFeature(feature)

Which of these two ways saves the shapefile only once?

Tried also:

start_time = time.time()
print('Working..........')
with edit(layer):
    for feature in layer.getFeatures():
        if feature['DN10'] > 0:
            feature['DN'] = feature['DN10']/10
            layer.updateFeature(feature)
        else:
            feature['DN'] = 0
            layer.updateFeature(feature)
elapsed_time = (time.time() - start_time) / 60
print('End processing in '+ str(elapsed_time) + ' min')

I remember, however, that there was a way to save to the source file just once, after the end of the iteration. I used it some 8 years ago in a plugin I made, but I can't find it.

New try with:

start_time = time.time()
print('Working..........')
field_idx = layer.fields().indexOf('DN')
with edit(layer):
    for feature in layer.getFeatures():
        if feature['DN10'] > 0:
            new_value = feature['DN10']/10
            layer.changeAttributeValue(feature.id(), field_idx, new_value)
        else:
            new_value = 0
            layer.changeAttributeValue(feature.id(), field_idx, new_value)
elapsed_time = (time.time() - start_time) / 60
print('End processing in '+ str(elapsed_time) + ' min')

No luck: it never finished.

Sotiriadou

1 Answer

You are mixing some concepts here.

  • The layer, [QgsVectorLayer][1], is the representation of a vector data source in the map canvas. A QgsVectorLayer object cannot be modified directly without opening an edit buffer.

  • The provider or data source, [QgsVectorDataProvider][2]. Every layer has a data provider; its responsibility is to hide from the layer how the data is read from the source (Shapefile, PostGIS, ...). The provider of a layer can be obtained with layer.dataProvider(), and it allows you to modify (and persist) the data source directly, without opening an edit buffer.

Your first example is not quite right, because you open an edit buffer but then update the provider directly, so the buffer is never used. It makes more sense to do it this way:

layer.startEditing()
for feature in layer.getFeatures():
    feature['DN'] = feature['DN10'] / 10
    layer.updateFeature(feature)
layer.commitChanges()

The previous code is more or less identical to:

with edit(layer):
    for feature in layer.getFeatures():
        feature['DN'] = feature['DN10']/10
        layer.updateFeature(feature)

Both of them do the following:

  • Open the layer for editing (like pressing the equivalent button in the QGIS UI)
  • Make some modifications to the layer (what you see in the canvas)
  • Persist all the changes at once to the underlying data source (shapefile, PostGIS, ...)

The main difference between the first snippet (1) and the second (2) is that in (1) you must handle errors (and the rollBack() of the changes in the edit buffer) by hand. That is, if an error is raised after startEditing() but before commitChanges(), the layer will remain in edit mode. You must wrap the code in a try-except block or something similar.
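The manual error handling described above could be sketched roughly as follows. This is only an illustration: the function name scale_field_with_rollback is made up, the field names 'DN10' and 'DN' are taken from the question, and it assumes the source field contains no NULL values.

```python
def scale_field_with_rollback(layer, source_name="DN10", target_name="DN"):
    """Buffer every edit, commit once at the end, roll back on any error.

    `layer` is expected to behave like a QgsVectorLayer. The function name
    and the default field names are assumptions made for illustration.
    """
    target_idx = layer.fields().indexOf(target_name)
    if not layer.startEditing():
        raise RuntimeError("could not open the layer for editing")
    try:
        for feature in layer.getFeatures():
            value = feature[source_name]
            # Assumes no NULLs in the source field.
            new_value = value / 10 if value > 0 else 0
            # Goes to the edit buffer only; nothing is written to disk yet.
            layer.changeAttributeValue(feature.id(), target_idx, new_value)
        # Single write to the data source.
        if not layer.commitChanges():
            raise RuntimeError("commit failed")
    except Exception:
        # Discard buffered edits and leave edit mode before re-raising.
        layer.rollBack()
        raise
```

In the QGIS Python console this would be called as scale_field_with_rollback(layer) with the layer from the question.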

Method (2) uses a Python [context manager][3], which ensures that if an exception is raised, the edits are rolled back and the layer leaves edit mode cleanly. So it's the preferred method.

To persist the modifications directly to the source, one by one, this is enough:

layer_provider = layer.dataProvider()
for f in layer.getFeatures():
    fid = f.id()
    avalue = f.attributes()[0] / 10
    attr_value = {1: avalue}  # {field index: new value}
    layer_provider.changeAttributeValues({fid: attr_value})

And AFAIK updateFields() is only needed when you change the columns of the layer, like adding or deleting a column/field. It is not needed when you are only modifying the data.

In summary:

  • In your "1st way" you are persisting the changes for each feature in each iteration.
  • In your "2nd way" you are persisting all the changes at once, at the end of the iteration, and it's the recommended way of doing it.
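As a variation on the "1st way", the provider can also be given all the changes in a single call instead of one call per feature, because changeAttributeValues() accepts a dictionary that maps many feature ids to their {field index: value} changes. The sketch below is an assumption about how that could look for this case, not something tested against the asker's data: the function name is invented, the field names come from the question, and the source field is assumed to contain no NULL values. Note that this path bypasses the edit buffer, so there is no undo and no rollback.

```python
def scale_field_in_one_call(layer, source_name="DN10", target_name="DN"):
    """Collect every change in memory, then hand them to the provider once.

    `layer` is expected to behave like a QgsVectorLayer. The function name
    and the default field names are assumptions made for illustration.
    """
    fields = layer.fields()
    source_idx = fields.indexOf(source_name)
    target_idx = fields.indexOf(target_name)

    changes = {}  # {feature id: {field index: new value}}
    for feature in layer.getFeatures():
        value = feature[source_idx]
        # Assumes no NULLs in the source field.
        changes[feature.id()] = {target_idx: value / 10 if value > 0 else 0}

    # One provider call for all features; returns True on success.
    return layer.dataProvider().changeAttributeValues(changes)
```

Whether this is faster than the edit-buffer approach for a large shapefile would need to be measured on the actual data.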

Also, if performance is a problem and you only need to change one attribute value and not the geometry, then, as stated in the documentation of [updateFeature][4], using [changeAttributeValue][5] will be faster. Using the index of the attribute instead of its name should also be faster.

from qgis.core import edit

# Look up the field indexes once, outside the loop
idx_dn10 = layer.fields().indexOf('DN10')
idx_dn = layer.fields().indexOf('DN')
with edit(layer):
    for feature in layer.getFeatures():
        old_value = feature[idx_dn10]
        new_value = old_value / 10 if old_value > 0 else 0
        layer.changeAttributeValue(feature.id(), idx_dn, new_value)

Francisco Puga
  • Thanks for your thorough answer; however, the question was "how to save to source JUST ONCE, AFTER the end of the iteration". – Sotiriadou Jul 12 '21 at 08:26
  • I edited the answer; I hope it is clear now. – Francisco Puga Jul 12 '21 at 09:11
  • It is very clear, thank you! I omitted a zero: my features number about 500,000, so the 2nd way is the only way for me. – Sotiriadou Jul 12 '21 at 09:24
  • Glad it works for you. Please remember to accept the answer if it solved your problem. https://gis.stackexchange.com/help/someone-answers – Francisco Puga Jul 12 '21 at 13:19
  • Sorry, I tried the code and it didn't finish after 2 hours, with the HDD LED flashing the whole time. Manually I can do it in under 1 min. The code I tried is: – Sotiriadou Jul 12 '21 at 14:41
  • Thanks so much for the time you spent helping me. I'll try to find my old plugin to see if I can remember what I did there. – Sotiriadou Jul 12 '21 at 16:20