
I have a geodatabase that contains a large number of raster datasets - there are two rasters for each country in the world, and they are very large, in some cases up to 3GB per raster dataset.

I need to convert these to NumPy arrays so that I can run an analytical program I wrote on them, but I'm hitting a memory error when the script tries to convert the very first raster dataset. The script builds a list of all the rasters in the geodatabase (rasterList), grabs the country code from the end of each raster dataset name, and populates a new list of those codes. I then delete rasterList and iterate through the country codes, converting the two rasters for each country ('GLUR_'+code and 'Pop00_'+code) to arrays. I delete each raster from memory after it is converted, but the program doesn't even get that far...

The code is:

import arcpy, os
import numpy as np
from arcpy import env
env.workspace = "J:/Data/CISC.gdb"

filepath = "J:/LIVE/"
rasterList = arcpy.ListRasters("G*")
codeList = []

for raster in rasterList:
    codeList.append(raster[5:])

del rasterList

for code in codeList:
    print code
    inputGLUR = arcpy.Raster("J:/Data/CISC.gdb/GLUR_"+code)
    lowerLeft = arcpy.Point(inputGLUR.extent.XMin, inputGLUR.extent.YMin)
    cellSize = inputGLUR.meanCellWidth
    print "Converting GLUR_"+code, "to NumPy array"
    arr = arcpy.RasterToNumPyArray(inputGLUR, nodata_to_value=0)
    os.chdir("J:/LIVE/NumPy_GLUR")
    numpy.save("GLUR_"+code+".npy", arr)
    print "GLUR Converted"
    del inputGLUR
    del arr

    inputPop = arcpy.Raster("J:/Data/CISC.gdb/Pop00_"+code)
    lowerLeft = arcpy.Point(inputPop.extent.XMin, inputPop.extent.YMin)
    cellSize = inputPop.meanCellWidth
    print "Converting Pop00_"+code, "to NumPy array"
    arr = arcpy.RasterToNumPyArray(inputPop, nodata_to_value=0)
    os.chdir("J:/LIVE/NumPy_Pop")
    numpy.save("Pop00_"+code, arr)
    print "Population array converted"
    del inputPop
    del arr

and yields this error:

Traceback (most recent call last):
  File "J:\PYTHON\RasterToNumPy.py", line 21, in <module>
    arr = arcpy.RasterToNumPyArray(inputGLUR, nodata_to_value=0)
  File "C:\Program Files (x86)\ArcGIS\Desktop10.3\ArcPy\arcpy\__init__.py", line 2244, in RasterToNumPyArray
    return _RasterToNumPyArray(*args, **kwargs)
MemoryError

This is on a computer with an i7 processor and 16GB RAM. The data is stored on an external disk. Any ideas on how to handle rasters of this size?

GabeFS
  • You might try to have a look at NumPy memmap, which is a memory-mapped file that behaves like a NumPy array. Have a look at the memmap doc. You will need to load your files in several steps into a NumPy array with the arcpy.RasterToNumPyArray function in order to avoid the out-of-memory error, and each time update your memmap accordingly (a rough sketch of this tiled approach follows these comments). – chkaiser Aug 25 '15 at 17:37
  • Do you have 64bit background geoprocessing installed? Using that will allow you to use 64bit Python, which would allow more memory usage. – Evil Genius Aug 25 '15 at 17:38
  • @chkaiser Thanks, I'll look into memmap and will update the question if I find my answer there! – GabeFS Aug 25 '15 at 18:00
  • @EvilGenius I do have 64bit background geoprocessing installed, and have been running the program in a 64bit python installation - that's where my mind went first, too! – GabeFS Aug 25 '15 at 18:01
  • If not memmap, try using a buffer to only read a manageable amount of rasters at a time (and iterate). – Clay Aug 25 '15 at 18:28
  • Is converting the raster to ASCII an option? It is much slower, but produces a very nice text file that you can read and transform. Another option is playing with the lower-left point and importing parts of the raster at a time. – FelixIP Aug 26 '15 at 21:40
  • Hi @GabeFS, how did you get on with this problem? Did you find a solution? I have a project stuck with the same problem: large rasters. I can tell that memmap (NumPy) is the solution, but I couldn't find the proper way to do it. Defining the windows or pieces of the raster in NumPy is where I'm stuck. Can you give an update on how you solved this? Thanks. – Delonix R. Apr 18 '16 at 18:45
  • Hi @DelonixR. -- I ended up getting through it without using memmap. I exported the rasters to geotiffs in ArcCatalog. Then I used the SciPy image handling to convert them into numpy arrays. Code is below in the answer. – GabeFS Apr 20 '16 at 17:30
  • For an example of how to do this using memmap, take a look at this question: http://gis.stackexchange.com/a/214659/82250 – Devin Cairns Oct 18 '16 at 18:11
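For reference, here is a minimal sketch of the tiled read that chkaiser, Clay and FelixIP describe above. It is not from the original thread; the raster name ("GLUR_ABC"), the output path, the float32 dtype and the 2000-row block size are all assumptions to be adapted.

import arcpy
import numpy as np
from numpy.lib.format import open_memmap

ras = arcpy.Raster("J:/Data/CISC.gdb/GLUR_ABC")
n_cols, n_rows = ras.width, ras.height
cell_h = ras.meanCellHeight
x_min, y_max = ras.extent.XMin, ras.extent.YMax

# Memory-mapped .npy file on disk; filled block by block, never held whole in RAM.
out = open_memmap("J:/LIVE/NumPy_GLUR/GLUR_ABC.npy", mode="w+",
                  dtype=np.float32, shape=(n_rows, n_cols))

block_rows = 2000  # rows per chunk - tune to the RAM you have available
for start_row in range(0, n_rows, block_rows):
    rows = min(block_rows, n_rows - start_row)
    # Lower-left corner of this block, measured down from the raster's top edge
    block_ll = arcpy.Point(x_min, y_max - (start_row + rows) * cell_h)
    block = arcpy.RasterToNumPyArray(ras, block_ll, n_cols, rows,
                                     nodata_to_value=0)
    out[start_row:start_row + rows, :] = block
    del block

out.flush()
del out, ras

The point is that RasterToNumPyArray accepts a lower-left corner plus ncols/nrows, so only one block is ever in memory at a time, while open_memmap writes straight into an on-disk .npy file that can later be read with np.load("GLUR_ABC.npy", mmap_mode="r") without loading the whole array.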

1 Answer


For anyone who finds this and has a similar question, I found a useful answer in this related question.

What worked for me was exporting the large raster datasets from the geodatabase to .tif files, and then using SciPy's image reader to load them as NumPy arrays:

import numpy as np
from scipy import misc

raster = misc.imread("myraster.tif")
np.save("myarray.npy", raster)

Hope that is helpful to anyone stuck on a similar problem!

GabeFS