Note: While this question has an answer, any further tips for optimizing a cursor process would be greatly appreciated. I will be monitoring for any updates.
Currently, my boss (who works in Avenue) and I (working in Python) are both attempting to solve the same problem. Rather, we have both solved it, but the speeds at which our solutions run are... disjointed, to say the least. What his script processes in 2 hours can take mine up to 6. The only real difference in syntax and logic comes from ArcView 3.x's Bitmaps and ArcGIS 10.x's Cursors. We both:
1) Store values from Table 1.
2) Use those values to query a row in Table 2.
3) Store values from Table 2 for insertion into Table 3 as a new row.
In both scripts, these processes are completed in two nested loops. Before I start digging into the wonderful world of code optimization, is this an expected occurrence when comparing Avenue script performance to Python? This is not the first time his scripts have greatly outperformed mine in terms of operation time, so I would like to know if there is something I should be aware of before I crucify myself for horrid scripting.
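To make the shape of the problem concrete, here is the nested-loop pattern both scripts follow, sketched with plain Python lists standing in for the real ArcGIS tables (the table contents and field names are illustrative, not the actual data):

```python
# Illustrative stand-ins for the three tables; the real data lives in ArcGIS tables.
table1 = [{"ID": 1}, {"ID": 2}]
table2 = [{"ID": 1, "FIELD": "a"}, {"ID": 2, "FIELD": "b"}, {"ID": 1, "FIELD": "c"}]
table3 = []

for t1_row in table1:                       # step 1: store values from Table 1
    key = t1_row["ID"]
    for t2_row in table2:                   # step 2: query matching rows in Table 2
        if t2_row["ID"] == key:
            # step 3: copy the wanted field into Table 3 as a new row
            table3.append({"FIELD": t2_row["FIELD"]})
```

Note that the inner loop scans all of Table 2 for every row of Table 1, so the work grows as (rows in Table 1) x (rows in Table 2), which is why cursor overhead inside the loops hurts so much.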
Here's my script sans extraneous bits:
import arcpy
import time
import sys
import os
def recordfindcopy(inFile, query, outFile):
    findRecord = arcpy.SearchCursor(inFile, query)
    for record in findRecord:
        copyRecord = arcpy.InsertCursor(outData) # <--- D'oh! (See answer)
        field = record.FIELD
        copy = copyRecord.newRow()
        copy.FIELD = field
        copyRecord.insertRow(copy)
StreetsFileList = [r"Path",
                   r"Path"]

for sfile in StreetsFileList:
    inStreets = sfile
    inTable = r"Path"
    outData = r"Path"
    fsaEntry = arcpy.SearchCursor(inTable)
    for row in fsaEntry:
        id = row.ID
        sQuery = "ID = %s" % str(id)
        recordfindcopy(inStreets, sQuery, outData)
EDIT: Given some of the comments so far, I wonder if there might be a better way to do this via joins, although I am dubious given the brobdingnagian (word of the day!) size of the tables. The heart of the processing is to append information from one table to any matching records in a second table and create a third table containing only the important fields. I wanted to try this using SDE, but that does not appear to be an available option. Thoughts? I apologize if my questions are always so involved, but I'm trying to get to the bottom of a long-standing annoyance.
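One join-like alternative that avoids running a fresh query per Table 1 row is to build an in-memory lookup from one table in a single pass, then stream the other table past it. This is a hedged sketch using plain Python dicts rather than arcpy (the table contents are illustrative); whether it is feasible depends on whether the keyed table's relevant fields fit in memory:

```python
from collections import defaultdict

# Illustrative stand-ins for the real ArcGIS tables.
table2 = [{"ID": 1, "FIELD": "a"}, {"ID": 2, "FIELD": "b"}, {"ID": 1, "FIELD": "c"}]
table1_ids = [1, 2]

# One pass over Table 2: group rows by their join key.
by_id = defaultdict(list)
for row in table2:
    by_id[row["ID"]].append(row)

# One pass over Table 1: look up matches in constant time per key.
table3 = []
for key in table1_ids:
    for match in by_id.get(key, []):
        table3.append({"FIELD": match["FIELD"]})
```

This turns the nested-loop cost of (rows in Table 1) x (rows in Table 2) into roughly one pass over each table, at the price of holding the keyed fields of Table 2 in memory.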
Answered: Jakub's simple suggestion alone decreased the processing time from 30 seconds per 500 records to 3 seconds per 500 records. Re-initializing the insert cursor on every insert slowed things down considerably (obviously). While this may not be the most this process can be optimized when put up against ArcView 3.x's speed, it is enough for my purposes at this time. Further suggestions are very welcome!
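For anyone skimming, the fix is simply hoisting the cursor construction out of the loop. The snippet below demonstrates the before/after pattern with a hypothetical stub class standing in for `arcpy.InsertCursor` (so it can run outside ArcGIS); it just counts how many times a cursor is opened:

```python
class InsertCursorStub:
    """Hypothetical stand-in for arcpy.InsertCursor; counts how often it is opened."""
    opens = 0

    def __init__(self, table):
        InsertCursorStub.opens += 1
        self.table = table

    def insertRow(self, value):
        self.table.append(value)

records = ["a", "b", "c"]

# Slow pattern (the original bug): a brand-new cursor for every single insert.
slow_out = []
InsertCursorStub.opens = 0
for record in records:
    cur = InsertCursorStub(slow_out)
    cur.insertRow(record)
slow_opens = InsertCursorStub.opens

# Fast pattern (the fix): open the cursor once, reuse it inside the loop.
fast_out = []
InsertCursorStub.opens = 0
cur = InsertCursorStub(fast_out)
for record in records:
    cur.insertRow(record)
fast_opens = InsertCursorStub.opens
```

With a real insert cursor the construction cost includes acquiring locks and opening the output table, so paying it once per run instead of once per row is where the 30s-to-3s improvement came from.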