5

I need to use the ArcGIS REST API to query my ArcGIS Server services. The where parameter looks like ObjectID in (1, 2, 3,...,2000). When the number of object IDs is larger than 1000, the returned result shows "Unable to complete operation".

Vince
  • 20,017
  • 15
  • 45
  • 64
Liu Guang
  • 63
  • 1
  • 5
  • This is a hideously expensive query mechanism. The "Doctor's Advice" applies to this situation ("Doctor, doctor! It hurts when I do this" "Don't do that.") There's also a limit on the number of features returned because that, too, is inefficient (large and slow). Rather than ask how to go against best practice, you really ought to be asking about the problem to which this use pattern is your solution (aka XY Problem ) – Vince Jan 04 '18 at 12:52

3 Answers

3

Depending on the type of database, there may be a limit on the number of items that can appear inside an IN clause. Oracle, for example, limits an IN list to 1,000 expressions.

Two things you could try are:

  • Instead of using where, use objectids=1,2,3, ... in the REST call
  • Split the list of objectids in groups of 1000: where=objectid in (1,2, ... ,1000) OR objectid in (1001,1002, ... ,2000)
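For example (a minimal sketch, assuming a hypothetical layer URL and the requests library), both suggestions can be combined: chunk the ID list and send each chunk via the objectIds parameter instead of a long WHERE ... IN (...) clause:

import requests

# Hypothetical layer endpoint - replace with your own service URL
url = 'https://myserver/arcgis/rest/services/MyService/MapServer/0/query'

object_ids = list(range(1, 2001))  # the IDs you want to fetch
chunk_size = 1000                  # stay under the database/IN-clause limit

all_features = []
for i in range(0, len(object_ids), chunk_size):
    chunk = object_ids[i:i + chunk_size]
    params = {
        # 'objectIds' avoids building a long WHERE ... IN (...) clause
        'objectIds': ','.join(str(oid) for oid in chunk),
        'outFields': '*',
        'f': 'json',
    }
    response = requests.get(url, params=params).json()
    all_features.extend(response.get('features', []))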

UPDATE

The above solution will fix the error message, but you'll still hit the 1,000-feature limit imposed by ArcGIS Server. 1,000 features is the default in current versions; in older versions it used to be 500. To get more than 1,000 features, you need to change a setting in ArcGIS Server (the service's maxRecordCount) or make multiple requests in batches of 1,000.
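If the service supports pagination (check supportsPagination in the layer's JSON), the batching can also be done with the resultOffset and resultRecordCount query parameters rather than by object ID ranges. A rough sketch, assuming a hypothetical layer URL:

import requests

url = 'https://myserver/arcgis/rest/services/MyService/FeatureServer/0/query'

features = []
offset = 0
while True:
    params = {
        'where': '1=1',
        'outFields': '*',
        'resultOffset': offset,
        'resultRecordCount': 1000,  # at most maxRecordCount per request
        'f': 'json',
    }
    page = requests.get(url, params=params).json()
    features.extend(page.get('features', []))
    # exceededTransferLimit stays true while more pages remain
    if not page.get('exceededTransferLimit', False):
        break
    offset += len(page['features'])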

You should also ask yourself whether you really need to get all the features. There are valid use cases for doing so, but it would help if you told us why you think you need to.

Berend
  • 4,737
  • 13
  • 27
  • 1
    I see this answer is accepted but it just doesn't seem true whatsoever. I don't really want to downvote it as it's a good try, just not correct.

    The 1000-record limit is a generic default applied to all ArcGIS Server services. It has nothing to do with IN clauses or Oracle or the database at all. It's just the default that services are born with. Publishers can change this to whatever they like - I often bump mine to 10,000.

    Note there are solid reasons for publishers to limit their maximum record counts as memory is always dear and converting features to JSON is expensive.

    – pauldzy Jan 05 '18 at 15:45
  • 1
    @pauldzy There is indeed also the AGS 1000-feature limit, but that wouldn't cause the error message. It will just give you 1000 features. This generic error is in most cases caused by the underlying database. I will update my answer shortly. – Berend Jan 08 '18 at 08:42
  • Hmm, alrighty I concede your point. Apologies for the aspersions. It just seemed wrong that AGS is passing the where clause as a literal to the backend but perhaps it does or perhaps it depends on the backend being used. I'll try this with a fgdb backend and see what happens. – pauldzy Jan 09 '18 at 13:07
3

I know this is a somewhat older question, but I just wanted to share how I've been getting around these limitations of ArcGIS services using a Python script and the GeoPandas library.

import numpy as np
import pandas as pd
import geopandas as gpd
import urllib.parse
import requests

def query_arcgis_feature_server(url_feature_server=''):
    '''
    This function downloads all of the features available on a given ArcGIS
    feature server. The function is written to bypass the limitations imposed
    by the online service, such as only returning up to 1,000 or 2,000
    features at a time.

    Parameters
    ----------
    url_feature_server : string
        String containing the URL of the service API you want to query. It should
        end in a forward slash and look something like this:
        'https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties/FeatureServer/0/'

    Returns
    -------
    geodata_final : gpd.GeoDataFrame
        This is a GeoDataFrame that contains all of the features from the
        Feature Server. After calling this function, the `geodata_final` object
        can be used to store the data on disk in several different formats
        including, but not limited to, Shapefile (.shp), GeoJSON (.geojson),
        GeoPackage (.gpkg), or PostGIS.
        See https://geopandas.org/en/stable/docs/user_guide/io.html#writing-spatial-data
        for more details.
    '''
    if url_feature_server == '':
        geodata_final = gpd.GeoDataFrame()
        return geodata_final

    # Fixing the last character in case the URL provided didn't end in a
    # forward slash
    if url_feature_server[-1] != '/':
        url_feature_server = url_feature_server + '/'

    # Getting the layer definition. This contains important info such as the
    # name of the column used as feature_ids/object_ids, among other things.
    layer_def = requests.get(url_feature_server + '?f=pjson').json()

    # The `objectIdField` is the column name used for the
    # feature_ids/object_ids
    fid_colname = layer_def['objectIdField']

    # The `maxRecordCount` tells us the maximum number of records this REST
    # API service can return at once. The code below is written such that we
    # perform multiple calls to the API, each one being short enough never to
    # go beyond this limit.
    record_count_max = layer_def['maxRecordCount']

    # Part of the URL that specifically requests only the object IDs
    url_query_get_ids = (f'query?f=geojson&returnIdsOnly=true'
                         f'&where={fid_colname}+is+not+null')

    url_comb = url_feature_server + url_query_get_ids

    # Getting all the object IDs
    service_request = requests.get(url_comb)
    all_objectids = np.sort(service_request.json()['properties']['objectIds'])

    # This variable will store all the parts of the multiple queries. These
    # parts will, at the end, be concatenated into one large GeoDataFrame.
    geodata_parts = []

    # This part of the query is fixed and never actually changes
    url_query_fixed = ('query?f=geojson&outFields=*&where=')

    # Identifying the largest query size allowed per request. This will
    # dictate how many queries will need to be made. We start the search at
    # the max record count, but that generates errors sometimes - the query
    # might time out because it's too big. If the test query times out, we
    # shrink the query size until the test query goes through without
    # generating a time-out error.
    block_size = min(record_count_max, len(all_objectids))
    worked = False
    while not worked:
        # Moving the "cursors" to their appropriate locations
        id_start = all_objectids[0]
        id_end = all_objectids[block_size-1]

        readable_query_string = (f'{fid_colname}>={id_start} '
                                 f'and {fid_colname}<={id_end}')

        url_query_variable = urllib.parse.quote(readable_query_string)

        url_comb = url_feature_server + url_query_fixed + url_query_variable

        url_get = requests.get(url_comb)

        if 'error' in url_get.json():
            block_size = int(block_size/2)+1
        else:
            geodata_part = gpd.read_file(url_get.text)

            geodata_parts.append(geodata_part.copy())
            worked = True

    # Performing the actual query to the API multiple times. This skips the
    # first few rows/features in the data because those rows were already
    # captured in the query performed in the code chunk above.
    for i in range(block_size, len(all_objectids), block_size):
        # Moving the "cursors" to their appropriate locations and finding the
        # limits of each block
        sub_list = all_objectids[i:i + block_size]
        id_start = sub_list[0]
        id_end = sub_list[-1]

        readable_query_string = (f'{fid_colname}>={id_start} '
                                 f'and {fid_colname}<={id_end}')

        # Encoding from readable text to URL
        url_query_variable = urllib.parse.quote(readable_query_string)

        # Constructing the full request URL
        url_comb = url_feature_server + url_query_fixed + url_query_variable

        # Actually performing the query and storing its results in a
        # GeoDataFrame
        geodata_part = gpd.read_file(url_comb, driver='GeoJSON')

        # Appending the result to `geodata_parts`
        if geodata_part.shape[0] > 0:
            geodata_parts.append(geodata_part)

    # Concatenating all of the query parts into one large GeoDataFrame
    geodata_final = (pd.concat(geodata_parts,
                               ignore_index=True)
                     .sort_values(by=fid_colname)
                     .reset_index(drop=True))

    # Checking if any object ID is missing
    ids_queried = set(geodata_final[fid_colname])
    for this_id in all_objectids:
        if this_id not in ids_queried:
            print('WARNING! The following ObjectID is missing from the final '
                  f'GeoDataFrame: ObjectID={this_id}')

    # Checking if any object ID is included twice
    geodata_temp = geodata_final[[fid_colname]].copy()
    geodata_temp['temp'] = 1
    geodata_temp = (geodata_temp
                    .groupby(fid_colname)
                    .agg({'temp': 'sum'})
                    .reset_index())
    geodata_temp = geodata_temp.loc[geodata_temp['temp'] > 1].copy()
    for i, this_id in enumerate(geodata_temp[fid_colname].values):
        n_times = geodata_temp['temp'].values[i]
        print('WARNING! The following ObjectID is included multiple times in '
              f'the final GeoDataFrame: ObjectID={this_id}\tOccurrences={n_times}')

    return geodata_final

You just need to call the query_arcgis_feature_server function defined above and it will download all of the data on the feature server. It takes care of a couple of pitfalls, such as the maximum feature limit and requests that are too big and time out.

Here's an example of how to download the data and store it into a shapefile on your disk:

url = 'https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Counties/FeatureServer/0/'

usa_counties_gdf = query_arcgis_feature_server(url)

usa_counties_gdf.to_file('usa_counties.shp')

Felipe D.
  • 2,509
  • 2
  • 14
  • 32
  • 1
    Something tells me that this answer is the one most people finding this page will want. – ss7 May 17 '22 at 17:59
1

I was also facing the limit of only being able to request 1,000 ObjectIds and their respective geometries/polygons from a FeatureServer into a GeoDataFrame. This is how I overcame the problem: I first check the count of ObjectIds for a given FeatureServer layer, then I request the ObjectIds/geometries in multiple batches and concatenate them into one single GeoDataFrame. Please note that this solution requires Python 3.12+ because of the itertools.batched function, and that large query results might be a problem, as described here.
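Before the full script, here is a small illustration (not part of the code below) of how itertools.batched drives the batching: it splits the range of record indices into chunks of at most server_limit items, and each chunk becomes one OBJECTID BETWEEN query.

from itertools import batched  # Python 3.12+

geometry_count = 2500  # example count returned by the server
server_limit = 1000

for batch in batched(range(geometry_count), server_limit):
    # OBJECTIDs are assumed to be 1-based and contiguous, hence the +1
    print(f'OBJECTID BETWEEN {min(batch) + 1} AND {max(batch) + 1}')
# OBJECTID BETWEEN 1 AND 1000
# OBJECTID BETWEEN 1001 AND 2000
# OBJECTID BETWEEN 2001 AND 2500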

# Import packages
from datetime import datetime
from io import BytesIO
from itertools import batched
import urllib.parse

import geopandas as gpd
import pandas as pd
import requests

# Settings
requests.packages.urllib3.disable_warnings(
    requests.packages.urllib3.exceptions.InsecureRequestWarning,
)

# Functions

def get_arcgis_token(
    *,
    url_portal,
    username,
    password,
    client='referer',
    expiration=60,
):
    """Generate an ArcGIS Server Token from your account - https://<Your Server Name>/portal/sharing/rest/generateToken"""
    # Query attributes
    headers = {'content-type': 'application/x-www-form-urlencoded'}

    parameters = {
        'username': username,
        'password': password,
        'client': client,
        'referer': url_portal,
        'expiration': expiration,
        'f': 'json',
    }

    url = url_portal + '/sharing/generateToken'

    request = requests.post(
        url=url,
        data=parameters,
        headers=headers,
        verify=False,
    )
    token_arcgis = request.json()['token']

    # Return objects
    return token_arcgis


def get_geometry_count(
    *,
    url_feature_server,
    spatial_reference=4326,
    token_arcgis=None,
    format='pjson',
):
    """Get the count of geometries for a given layer."""
    # Query attributes
    params = {
        'where': '1=1',
        'geometryType': 'esriGeometryEnvelope',
        'inSR': spatial_reference,
        'spatialRel': 'esriSpatialRelIntersects',
        'units': 'esriSRUnit_Meter',
        'returnDistinctValues': 'false',
        'returnGeometry': 'true',
        'returnIdsOnly': 'false',
        'returnCountOnly': 'true',
        'returnTrueCurves': 'false',
        'token': '',
        'f': format,
    }

    if token_arcgis is not None:
        params.update({'token': token_arcgis})

    url = url_feature_server + 'query?' + urllib.parse.urlencode(params)

    response = requests.get(url=url, verify=False)

    # Return objects
    return response.json()['count']


def get_geometry(
    *,
    url_feature_server,
    spatial_reference=4326,
    out_fields='OBJECTID',
    token_arcgis=None,
    format='GeoJSON',
    server_limit=1000,
):
    """Get the geometries for a given peril.

    Note that queries might get large in size and this should be changed in the
    ArcGIS Server Administrator Directory settings as described here:
    https://support.esri.com/en-us/knowledge-base/error-error-performing-query-operation-000011736
    """
    # Create variables
    execution_start = datetime.now()

    # Get the count of geometries for a given layer
    geometry_count = get_geometry_count(
        url_feature_server=url_feature_server,
        spatial_reference=spatial_reference,
        token_arcgis=token_arcgis,
        format='pjson',
    )
    print(f'Geometry count: {geometry_count}')

    # Create empty GeoDataFrame
    df_geo = gpd.GeoDataFrame(
        data=None,
        index=None,
        columns=None,
        dtype=None,
        geometry=None,
        crs=None,
    )

    # Request the geometries in batches given the server limit
    for batch in batched(iterable=range(geometry_count), n=server_limit):
        # Query attributes
        params = {
            'where': 'OBJECTID BETWEEN {} AND {}'.format(
                min(batch) + 1,
                max(batch) + 1,
            ),
            'geometryType': 'esriGeometryEnvelope',
            'inSR': spatial_reference,
            'spatialRel': 'esriSpatialRelIntersects',
            'units': 'esriSRUnit_Meter',
            'outFields': out_fields,
            'returnDistinctValues': 'false',
            'returnGeometry': 'true',
            'returnIdsOnly': 'false',
            'returnCountOnly': 'false',
            'returnTrueCurves': 'false',
            'token': '',
            'f': format,
        }

        if token_arcgis is not None:
            params.update({'token': token_arcgis})

        url = url_feature_server + 'query?' + urllib.parse.urlencode(params)

        response = requests.get(url=url, verify=False)

        df_geo_part = gpd.read_file(
            filename=BytesIO(response.content),
            driver='GeoJSON',
        )
        df_geo_part = df_geo_part.to_crs(f'EPSG:{spatial_reference}')

        # Concatenate to GeoDataFrame
        df_geo = pd.concat(
            objs=[df_geo, df_geo_part],
            axis=0,
            ignore_index=True,
            sort=False,
        )

        # Remove objects
        del params, url, response, df_geo_part

    # Rename columns
    df_geo = df_geo.rename(columns={'OBJECTID': 'object_id'})

    # Execution time
    execution_time = datetime.now() - execution_start
    print(f'Execution time: {execution_time}')

    # Return objects
    return df_geo

To run the functions above, the parameters need to be changed, e.g.:

# Run Functions
token_arcgis = get_arcgis_token(
    url_portal='https://<Your Server Name>/portal',
    username='<Username>',
    password='<Password>',
    client='referer',
    expiration=1440,
)
print(token_arcgis)

df_geo = get_geometry(
    url_feature_server='https://<Your Server Name>/FL_DEF_202001_VIS/FeatureServer/0/',
    spatial_reference=4326,
    out_fields='OBJECTID',
    token_arcgis=token_arcgis,
    format='GeoJSON',
    server_limit=1000,
)
print(df_geo.plot())

roboes
  • 111
  • 2