It seems that I can memory-map the underlying data for a pandas Series by creating a memory-mapped ndarray and using it to initialize the Series:
import numpy as np
import pandas as pd

def assert_readonly(iloc):
    try:
        iloc[0] = 999  # Should be non-editable
        raise Exception("MUST BE READ ONLY (1)")
    except ValueError as e:
        assert "read-only" in str(e)
# Original ndarray
n = 1000
_arr = np.arange(0, n, dtype=float)
# Copy it into a memmap ('filename' is the path of the backing file on disk)
mm = np.memmap(filename, mode='w+', shape=_arr.shape, dtype=_arr.dtype)
mm[:] = _arr[:]
del _arr
mm.flush()
mm.flags['WRITEABLE'] = False # Make immutable!
# Wrap as a series
s = pd.Series(mm, name="a")
assert_readonly(s.iloc)
Success! It seems that s is backed by a read-only memory-mapped ndarray.
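As a further sanity check, np.shares_memory should confirm (if it returns True) that s still references mm's buffer rather than a copy:
# True here means the Series reuses mm's buffer directly (no copy was made)
print(np.shares_memory(s.values, mm))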
Can I do the same for a DataFrame? The following fails:
df = pd.DataFrame(s, copy=False, columns=['a'])
assert_readonly(df["a"].iloc)  # Fails
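A quick check with np.shares_memory suggests why it fails: the constructor appears to copy the data, so the read-only flag is lost.
# False here would mean the DataFrame constructor copied the data,
# which would explain why the read-only flag is gone
print(np.shares_memory(df["a"].values, mm))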
The following succeeds, but only for one column:
df = pd.DataFrame(mm.reshape((len(mm), 1)), columns=['a'], copy=False)
assert_readonly(df["a"].iloc)  # Succeeds
... so I can make a DataFrame without copying. However, this only works for one column, and I want many. The methods I've found for combining one-column DataFrames, such as pd.concat(copy=False) and pd.merge(copy=False), all result in copies.
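For example, here is a minimal sketch of what I mean (the second memmap and its file name are made up for illustration): combine two single-column, memmap-backed DataFrames with pd.concat(copy=False) and check whether the result still references the original buffers.
# Hypothetical second column, built the same way as mm above
mm_b = np.memmap("col_b.dat", mode='w+', shape=mm.shape, dtype=mm.dtype)
mm_b.flags['WRITEABLE'] = False

df_a = pd.DataFrame(mm.reshape((len(mm), 1)), columns=['a'], copy=False)
df_b = pd.DataFrame(mm_b.reshape((len(mm_b), 1)), columns=['b'], copy=False)

combined = pd.concat([df_a, df_b], axis=1, copy=False)

# False here means the combined frame no longer references the memmaps,
# i.e. the same-dtype columns were consolidated into a new (copied) block
print(np.shares_memory(combined['a'].values, mm))
print(np.shares_memory(combined['b'].values, mm_b))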
I have thousands of large columns stored as data files, of which I only ever need a few at a time. I was hoping to place their memory-mapped representations in a DataFrame as above. Is that possible?
The pandas documentation makes it a little difficult to guess what's going on under the hood here, although it does say that a DataFrame "Can be thought of as a dict-like container for Series objects." I'm beginning to think this is no longer the case.
I'd prefer not to need HDF5 to solve this.