I am trying to use joblib.Memory to cache the loading of a DXF file (a 3D CAD file format), and I am finding that loading the cached object ends in a recursion error that exhausts the Python stack. I've created an MWE that demonstrates the problem.
import numpy as np
from ezdxf import recover
from joblib import Memory

# Directory where joblib stores the cached results
location = './cachedir'
memory = Memory(location, verbose=0)

def _load_dxf_file(filename):
    print('Loading DXF file %s' % filename)
    doc, auditor = recover.readfile(filename)
    print(' DXF file load complete.')
    return doc

if __name__ == '__main__':
    # Wrap the loader so repeated calls with the same filename hit the cache
    load_dxf_file = memory.cache(_load_dxf_file)
    filename = 'cube_mesh_2.dxf'
    doc = load_dxf_file(filename)
The sample DXF file (a simple cube) is available at this gist (too large to paste here).
https://gist.github.com/jrjbertram/87e31b3bb0ce2d3771dce7f50d2d0fba
The object loads the first time without any issues (and the data is fine; plotting is not included in the MWE). Re-running the script so that the cached result is used produces errors like:
WARNING:root:[MemorizedFunc(func=<function _load_dxf_file at 0x7fbaee2a38b0>, location=./cachedir/joblib)]: Exception while loading results for _load_dxf_file('cube_mesh_2.dxf')
Traceback (most recent call last):
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/joblib/memory.py", line 513, in _cached_call
    out = self.store_backend.load_item(
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/joblib/_store_backends.py", line 170, in load_item
    item = numpy_pickle.load(f)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 575, in load
    obj = _unpickle(fobj)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
    obj = unpickler.load()
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/pickle.py", line 1210, in load
    dispatch[key[0]](self)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/joblib/numpy_pickle.py", line 329, in load_build
    Unpickler.load_build(self)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/pickle.py", line 1701, in load_build
    setstate = getattr(inst, "__setstate__", None)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 126, in __getattr__
    attrib_def: Optional[DXFAttr] = self.dxfattribs.get(key)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 300, in dxfattribs
    return self._entity.DXFATTRIBS
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 126, in __getattr__
    attrib_def: Optional[DXFAttr] = self.dxfattribs.get(key)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 300, in dxfattribs
    return self._entity.DXFATTRIBS
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 126, in __getattr__
    attrib_def: Optional[DXFAttr] = self.dxfattribs.get(key)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 300, in dxfattribs
    return self._entity.DXFATTRIBS
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 126, in __getattr__
    attrib_def: Optional[DXFAttr] = self.dxfattribs.get(key)
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 300, in dxfattribs
    return self._entity.DXFATTRIBS
<snip>
  File "/Users/bertrjr1/opt/anaconda3/lib/python3.8/site-packages/ezdxf/entities/dxfns.py", line 126, in __getattr__
    attrib_def: Optional[DXFAttr] = self.dxfattribs.get(key)
RecursionError: maximum recursion depth exceeded
I suspect there are rules about what joblib can successfully cache and restore, and that the objects returned by the ezdxf library violate them. Or perhaps this is a pickle limitation similar to the one described here:
Cannot pickle object: maximum recursion depth exceeded
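To test the pickle theory without ezdxf, here is a minimal sketch that seems to reproduce the same pattern as the traceback. The class names are hypothetical stand-ins, not ezdxf's actual code. I am assuming the trigger is a `__getattr__` that reads another instance attribute (`self._entity`) while pickle probes for `__setstate__` on an instance whose `__dict__` has not been restored yet.

```python
import pickle


class Namespace:
    """Hypothetical stand-in for an attribute proxy; not ezdxf's real class."""

    def __init__(self, entity):
        self._entity = entity

    def __getattr__(self, key):
        # Only called when normal lookup fails. During unpickling the
        # instance __dict__ is still empty, so self._entity re-enters
        # __getattr__ and the lookup never terminates.
        if key in self._entity:
            return self._entity[key]
        raise AttributeError(key)


class GuardedNamespace(Namespace):
    def __getattr__(self, key):
        # Refuse underscore-prefixed names so that pickle's __setstate__
        # probe fails fast with AttributeError instead of recursing.
        if key.startswith('_'):
            raise AttributeError(key)
        return super().__getattr__(key)


def round_trip(obj):
    """Return the unpickled copy, or the exception name if it recursed."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except RecursionError as exc:
        return type(exc).__name__
```

If this sketch matches what happens inside ezdxf, then pickling succeeds and only loading fails, which would explain why the first run works and the cached run does not.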
My current workaround is to load the DXF file (the real one is very large and takes a long time to load), perform the face / vertex / triangulation processing, save the results to numpy arrays, and then use joblib.Memory to cache those arrays. However, I need to rerun the triangulation and other processing often on the same DXF file, so it would be nice to cache the loaded DXF document itself and avoid the parsing penalty (DXF is a very detailed text format, all of which must be processed).
I may just need to walk the loaded DXF document and return a pruned version of it in some other format (lists, dictionaries, etc.) and then cache that instead.
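A rough sketch of what that pruning might look like is below. It assumes the geometry lives in MESH entities in the modelspace and that each one exposes iterable `vertices` and `faces`; I have not checked this against every entity type in my real file. The function names `prune_doc` and `load_pruned_dxf` are made up for illustration.

```python
def prune_doc(doc):
    """Reduce a loaded document to plain lists, dicts and tuples.

    Assumption: only MESH entities matter, and each one exposes
    `vertices` and `faces`; other entity types are skipped.
    """
    meshes = []
    for entity in doc.modelspace():
        if entity.dxftype() != 'MESH':
            continue
        meshes.append({
            # Copy into builtin types so nothing from the library is pickled
            'vertices': [tuple(float(c) for c in v) for v in entity.vertices],
            'faces': [tuple(int(i) for i in f) for f in entity.faces],
        })
    return meshes


def load_pruned_dxf(filename):
    # Imported lazily so prune_doc can be tried without ezdxf installed
    from ezdxf import recover
    doc, auditor = recover.readfile(filename)
    return prune_doc(doc)
```

Wrapping `load_pruned_dxf` with `memory.cache(...)` instead of the original loader would then store only builtin containers, which should pickle and unpickle without touching any `__getattr__` hooks.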
Any suggestions or ideas welcome.
Thank you, Josh.