1

All the articles I'm finding including GDAL or PIL(?) which I cannot use. So I am trying to convert a raster I have to a Numpy Array with Arcpy and Numpy and then calculate some statistics on it but running into an error. Not much documentation on this and not sure what I am doing wrong. I am running ArcGIS 10.3.1 with Python 2.7.8 on a Windows 7 machine with 16gb ram.

Here is my code:

import arcpy
from arcpy.sa import *
from arcpy import env
import numpy as np
import sys, os, string, glob

#Set Environment
arcpy.CheckOutExtension("spatial")
arcpy.env.overwriteOutput = True
env.extent = "MINOF"
Coordsystem = "GEOGCS['GCS_WGS_1984',DATUM['D_WGS_1984',SPHEROID['WGS_1984',6378137.0,298.257223563]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]]"

#Set Input Data
tcorr = arcpy.Raster(r"G:/P38R37/test4/TcorrCon/Tcorr198.tif")
lowerLeft = arcpy.Point(tcorr.extent.XMin,tcorr.extent.YMin)
cellSize = tcorr.meanCellWidth

#Convert Raster to Numpy Array
#Raster is 7951 columns x 6971 rows, Floating point 32 bit, 211MB in size
tcorr_arr = arcpy.RasterToNumPyArray(tcorr,lowerLeft,7951,6971,nodata_to_value=0)

#Calculate the mean and standard deviation of the array
arr_mean = np.mean(tcorr_arr)
arr_std = np.std(tcorr_arr)

#Take the mean and std and do math
double_std = arr_std * 2
cfactor = mean - double_std

print cfactor

The error I am getting says:

Traceback (most recent call last):
File "C:\Python27\ArcGIS10.3\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 326, in RunScript
exec codeObject in __main__.__dict__
File "C:\Users\mschauer\Documents\Py Scripts\scratch\numpytest_tcorr.py", line 20, in <module>
tcorr_arr = arcpy.RasterToNumPyArray(tcorr,lowerLeft,7951,6971,nodata_to_value=0)
File "C:\ArcGIS\Desktop10.3\ArcPy\arcpy\__init__.py", line 2244, in RasterToNumPyArray
return _RasterToNumPyArray(*args, **kwargs)

MemoryError

Any ideas as to what I am doing wrong?

PolyGeo
  • 65,136
  • 29
  • 109
  • 338
MattS
  • 143
  • 2
  • 8
  • 1
    Have you installed the 64bit background geoprocessing patch/installer? If not, you are using the 32bit version of Python and tools, and it won't matter how much ram your machine has, you'll be limited to 2 GB. – Evil Genius Oct 14 '15 at 13:40
  • Its not my machine and I do not have authorization to make modifications to my machine or install patches. And my IT department lives in the dark ages so explaining why I need to get a patch/installer for Python is too much for them.

    So I'm screwed then?

    – MattS Oct 14 '15 at 13:41
  • It's a patch for ArcGIS, from ESRI if that helps...your other option would be breaking the analysis into smaller parts if possible. – Evil Genius Oct 14 '15 at 13:45
  • Also, the answer to this question may be something to try, but if you are stuck with 32bit, you'll still be limited on memory. And I don't know if ArcGIS 10.3 comes with the SciPy module preinstalled... – Evil Genius Oct 14 '15 at 13:48
  • @EvilGenius, 10.3 does not come with scipy. While it's super convenient to just pip install scipy, you can use the module without IT or admin access. – Paul Oct 14 '15 at 15:39
  • @MattS, if you are limited to 32-bit GP, you can use the workflow described in example 2 of the help to chunk your input raster. – Paul Oct 14 '15 at 15:41
  • 1
    If you are limited in memory it is also a good idea to streamline your numpy calculations so they create less arrays which take up memory: print np.mean(tcorr_arr) - np.std(tcorr_arr)*2. – Kersten Oct 14 '15 at 15:49
  • Good news! I already had the 64bit background GP installed in my system and just got IT to change the command line path to point to it so I'm running Python in 64 bit now and I ran my script through it and it worked! Thanks Evil Genius! of course, now I have an entirely separate problem: the nodata_to_value that I set to zero. It is using those zeros to calculate the mean and std and I'm getting an abnormally low value. But, that is a topic for a separate thread perhaps. – MattS Oct 14 '15 at 16:12
  • 1
    @MattS just to warn you, you've got it working now but you may be shooting yourself in the foot in the future as I had issues with arcgis and the different versions of python (32 vs 64bit). So what I am saying is remember that you made this change as you'll probably need to undo it to get other tools working. – Hornbydd Oct 14 '15 at 16:46
  • 1
    Numpy can handle no_data values. If you want to ignore no_data values you need to convert them to np.nan and then use np.nanmean and np.nanstd. – Kersten Oct 15 '15 at 14:25

3 Answers3

4

One solution to avoid your 32-bit memory woes is to read the raster in chunks on to a memory-mapped array, then do your calculations. Something like this:

# Make a dict of data type references
dtyperef = {'U1': 'bool',
            'U2': 'int8',
            'U4': 'int8',
            'U8': 'uint8',
            'S8': 'int8',
            'U16': 'uint16',
            'S16': 'int16',
            'U32': 'uint32',
            'S32': 'int32',
            'F32': 'float32',
            'F64': 'float64'}
# Get some info
shape = (tcorr.height, tcorr.width)
dtype = dtyperef[tcorr.pixelType]
nodata = tcorr.noDataValue
cellsize_x = tcorr.meanCellWidth
cellsize_y = tcorr.meanCellHeight
topLeft = (tcorr.extent.XMin, tcorr.extent.YMax)

# Create output array using temporary file
import tempfile
with tempfile.TemporaryFile() as tf:
    tcorr_arr = np.memmap(tf, mode='w+', shape=shape, dtype=dtype)

    # Populate array using chunks
    ychunks = range(0, shape[0], 128)
    xchunks = range(0, shape[1], 128)
    for ych in zip(ychunks[:-1], ychunks[1:-1] + [shape[0]]):
        for xch in zip(xchunks[:-1], xchunks[1:-1] + [shape[1]]):
            xblock, yblock = xch[1] - xch[0], ych[1] - ych[0]
            blc = arcpy.Point(topleft[0] + (xch[0] * cellsize_x),
                              topLeft[1] - (ych[1] * cellsize_y))
            tcorr_arr[ych[0]:ych[1], xch[0]:xch[1]] = arcpy.RasterToNumPyArray(tcorr, blc, xblock, yblock)

    # Mask the no data value
    tcorr_arr = np.ma.masked_equal(tcorr_arr, nodata)

    # Do some mathy stuff
    cfactor = tcorr_arr.mean() - (tcorr_arr.std() * 2)

print cfactor
Devin Cairns
  • 246
  • 1
  • 6
1

Can you try just (works fine for me when I just want the math):

tcorr_arr = arcpy.RasterToNumPyArray(tcorr,nodata_to_value=0)

Also, not sure if this is causing the errors, but doesn't seems rigth to me:

says:

cfactor = mean - double_std

Shoud say:

cfactor = arr_mean - double_std
Delonix R.
  • 795
  • 6
  • 15
0

Just a workaround if you don't want to store a numpy array in the memory but you just need the mean and the std of a raster

import arcpy

your_raster = arcpy.Raster(r"G:/P38R37/test4/TcorrCon/Tcorr198.tif")
your_mean = your_raster.mean
your_STD = your_raster.standardDeviation

then do your math....

Note that "calculate statistics" could be needed before you ask for the mean and STD, with parameters to exclude some values if needed.

radouxju
  • 49,636
  • 2
  • 71
  • 144