
I have a relatively large result file from my network analysis in ArcGIS 10.5. The data represents the 100 closest neighbors for ~1.5 million locations. The dataset is stored in a file geodatabase and contains no geometry (I specified the NO_LINES option in the MakeODCostMatrixLayer_na tool), so I treat it as a table.

I would like to move this data out of GIS for further processing in Stata (and/or R). Since Stata is not a GIS, and I also wanted to avoid a shapefile for data of this size, my first thought was to use a TXT or CSV file as an intermediate format.

Learning from previous solutions, I tried five approaches:

  1. Exporting via GUI in ArcCatalog
  2. Exporting via (64-bit) Python script with 'Export Feature Attribute to ASCII' (Spatial Statistics toolbox)
  3. Exporting via Python script with 'Table To Table' (Conversion toolbox)
  4. Exporting via Python script with 'Copy rows' (Data Management toolbox)
  5. Exporting via Python script with a cursor loop (as suggested by Caleb1987 here); a sketch of what I mean is below
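
A minimal sketch of the kind of cursor loop I mean in approach 5, with placeholder paths (the field list is read from the table itself):

import csv
import arcpy

in_table = r'C:\data\results.gdb\od_table'   # placeholder path
out_csv = r'C:\data\od_table.csv'            # placeholder path

fields = [f.name for f in arcpy.ListFields(in_table)]

# 'wb' because ArcGIS 10.5 ships with Python 2.7; use open(out_csv, 'w', newline='') on Python 3
with open(out_csv, 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(fields)                               # header row
    with arcpy.da.SearchCursor(in_table, fields) as cursor:
        for row in cursor:                                # rows are written one at a time
            writer.writerow(row)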

Oddly enough, all of these give me a resulting file with 30 million observations (exactly the same number each time) instead of 150 million.

What could be causing a problem like that? Are there any better solutions for exporting data of this size?

Note: all scripts were tested with smaller datasets and everything worked smoothly, so I am pretty confident there are no mistakes there.


Update: I managed to test one more solution, ogr2ogr, using the command line to connect to the geodatabase and export to CSV:

ogr2ogr -f "CSV" "path\output.csv" "path\database.gdb" "feature_class"

This time the export reaches 47 million records before the tool fails with many error messages similar to:

ERROR 1: Error occurred in filegdbtable.cpp at line 1284
  • Are you getting the first fifth of the records as your results? – danak Jun 27 '17 at 15:17
  • @danak Yes, these seem to be the 'top 20%' records from the result file. – radek Jun 27 '17 at 15:23
  • If you do Select All, does the selection count = record count? I have seen many database tools fail to search or select all the records when the count is extremely high. You may have to break it up into a few batches. – danak Jun 27 '17 at 15:24
  • How many rows are there in total? I believe there is a limit for the output rows, 6553. Try this: https://www.arcgis.com/home/item.html?id=f3d91b8f852042e289e09a7ec8342431 – NULL.Dude Jun 27 '17 at 15:30
  • @danak When in table view in ArcCatalog I get the correct number of features. ogrinfo returns the same value. – radek Jun 27 '17 at 15:38
  • @Joe As described in the question, there are 150 million rows in the table. What is the Excel and CSV Conversion Tools using for reading/writing files? – radek Jun 27 '17 at 15:40
  • @radek I am not sure what it's using (assuming Python), but I read it bypasses the output limit. Give it a try and see if it works on a copy of the data. If not, you're going to have to break your data down into smaller chunks. – NULL.Dude Jun 27 '17 at 15:54
  • Have you tested your theory that there may be a 30 million row limit to this export by dummying up 35 million rows and trying to export that? – PolyGeo Jun 27 '17 at 22:11
  • You could try to read the feature class directly from R, see: https://gis.stackexchange.com/questions/151613/how-to-read-feature-class-in-file-geodatabase-using-r – BERA Jun 28 '17 at 09:32
  • @Joe The tool you link to does not accept feature classes or tables from a file gdb as input. – radek Jun 28 '17 at 10:16
  • @PolyGeo Thanks - this is always an option and I'm trying it now with a simple Python loop. The problem is that next time I may have a few more variables, and maybe it will fail at 20 million instead of 30, so I was trying to understand the reason for the failure at a higher level. In the world of statistical software, 150 million rows is no longer considered a huge dataset that crashes software, so I was a tad surprised that GIS trips on that o_O – radek Jun 28 '17 at 10:18
  • @BERA Thanks for the pointer. I did try ogr2ogr from the command line, so I presume that rules out this solution. – radek Jun 28 '17 at 10:20
  • @radek I'll be surprised if there is a 30 million row limit - that's why I think you should test your assertion that there might be. – PolyGeo Jun 28 '17 at 10:21
  • Are you now suspecting this to be a file geodatabase, ArcGIS Desktop and/or OGR limitation? In any event, I think you may be better off separating your ArcGIS and OGR investigations into separate questions. – PolyGeo Jul 05 '17 at 23:09
  • Thanks @PolyGeo. I ran out of options and am now trying to work with smaller subsets of the data and some loops. Not an optimal solution, but it all works reasonably well so far with ~1/6 of the origins data. I'm not an OGR expert and don't have any other ideas, so I think I'll refrain from starting a new question. – radek Jul 06 '17 at 07:27

3 Answers


According to this thread, the problem might be avoided by using the ArcGIS Excel and CSV Conversion Tools toolbox. It can convert your table to CSV while avoiding external maximum-row limitations.
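
If you try this route, the general pattern for running a downloaded toolbox from Python looks roughly like the following; the toolbox path, alias and tool name here are placeholders only, so check the actual names once you have added the toolbox:

import arcpy

# Hypothetical path, alias ('xlscsv') and tool name - substitute those of the downloaded toolbox
arcpy.ImportToolbox(r'C:\tools\ExcelAndCSVConversionTools.tbx')
arcpy.TableToCSV_xlscsv(r'C:\database.gdb\feature_class', r'C:\folder\outfile.csv')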

NettaB

According to ArcGIS Desktop's File geodatabase size and name limits documentation, you should not be getting anywhere near its row-count limit:

Number of rows in a feature class or table: 2,147,483,647
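
One way to check where the rows go missing is to compare the row count the geodatabase itself reports with the line count of the exported CSV; a rough sketch, with hypothetical paths:

import arcpy

fc = r'C:\database.gdb\feature_class'   # hypothetical paths
out_csv = r'C:\folder\outfile.csv'

# Row count as reported by the geodatabase
gdb_rows = int(arcpy.GetCount_management(fc).getOutput(0))

# Line count of the exported file, minus the header row
with open(out_csv) as f:
    csv_rows = sum(1 for _ in f) - 1

print('gdb rows: {}, csv rows: {}'.format(gdb_rows, csv_rows))

If the first number is already ~30 million, the problem is upstream of the export; if it is ~150 million, the export step itself is truncating.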

PolyGeo
  • Thanks @PolyGeo. Indeed, it doesn't look like 150 million should be hitting a limit. I think this is somehow confirmed by the fact that the scripts creating this large file run without errors. Similarly, ogrinfo returns the expected number of features. I understand the issue really is in the export itself. – radek Jul 06 '17 at 07:30

You can try using pandas, which comes with the ArcMap 10.5 installation:

import arcpy
import pandas as pd

fc = r'C:\database.gdb\feature_class'

# Collect all field names in the table
fields = [f.name for f in arcpy.ListFields(fc)]

# Stream the rows through a search cursor straight into a DataFrame
df = pd.DataFrame.from_records(data=arcpy.da.SearchCursor(fc, fields), columns=fields)

# Write the DataFrame to CSV
df.to_csv(path_or_buf=r'C:\folder\outfile.csv', sep=',')
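
If loading all the rows into a single DataFrame runs out of memory, a chunked variant along these lines (chunk size and paths are arbitrary here) can write the CSV in pieces:

import itertools

import arcpy
import pandas as pd

fc = r'C:\database.gdb\feature_class'
out_csv = r'C:\folder\outfile.csv'
chunk_size = 1000000   # arbitrary; tune to the available RAM

fields = [f.name for f in arcpy.ListFields(fc)]

with arcpy.da.SearchCursor(fc, fields) as cursor:
    first = True
    while True:
        # Pull the next chunk_size rows from the cursor
        rows = list(itertools.islice(cursor, chunk_size))
        if not rows:
            break
        df = pd.DataFrame.from_records(rows, columns=fields)
        # Write the header only for the first chunk, then append
        df.to_csv(out_csv, mode='w' if first else 'a', header=first, index=False)
        first = False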
BERA
  • Thanks @BERA. Have you managed to test this solution with such a large dataset? I currently do not have access to the datasets I struggled with when I asked this question, so I am not able to comment further. – radek Feb 12 '18 at 03:56
  • No, I have not. It should work if you have enough RAM to load all the data into a pandas DataFrame. – BERA Feb 12 '18 at 06:12