Format to cleanly save and restore DataFrame?

Question

I want to save a pandas DataFrame to a file so that I can read it back from that file later. My requirements:

the file format should be decently portable (good library support on Windows/Linux in major languages)
the DataFrame I read should be absolutely identical to the one I saved

According to this post, read_csv and to_csv may work if I provide the index_col=0 argument, but the data types are lost. Automatic type inference isn't guaranteed to give me the same types even for simple types, let alone for Python objects such as lists, which are never inferred.
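To make the type loss concrete, here is a minimal sketch. It assumes a recent pandas, uses an in-memory `io.StringIO` buffer in place of the file, and builds a hypothetical three-column frame. After a `to_csv` / `read_csv(..., index_col=0)` round trip, none of the three columns keeps its original type:

```python
import io

import pandas as pd

# Hypothetical example frame: a date column, zero-padded string codes, and a list column.
df = pd.DataFrame({
    "when": pd.to_datetime(["2016-08-12", "2016-08-13"]),
    "code": ["007", "042"],
    "items": [[1, 2], [3]],
})

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
back = pd.read_csv(buf, index_col=0)

# Dates come back as plain strings, "007" is re-inferred as the integer 7,
# and each list comes back as its string representation, e.g. "[1, 2]".
print(back.dtypes)
print(back.equals(df))
```

Passing `dtype=` or `parse_dates=` to `read_csv` can repair the first two columns, but only if you already know the types, and the list column would still need a custom converter.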

Is there some simple solution that just works for sure, without having to worry about many edge cases?

The only solution I can think of is using to_csv / read_csv and saving the type information somewhere else. Still, I'm afraid there might be more hidden problems (like duplicate column names, etc.).
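The save-the-types-elsewhere idea could look roughly like the sketch below. `save_with_dtypes` and `load_with_dtypes` are hypothetical helpers, not pandas API. They record each column's dtype name in a JSON sidecar and hand it back to `read_csv` through its `dtype=` argument. Buffers stand in for files, and the sketch only covers simple scalar dtypes:

```python
import io
import json

import pandas as pd
from pandas.testing import assert_frame_equal


def save_with_dtypes(df, csv_buf, meta_buf):
    """Hypothetical helper: write the data as CSV and the column dtypes as JSON."""
    if df.columns.duplicated().any():
        # A dtype mapping keyed by column name cannot describe duplicate columns.
        raise ValueError("duplicate column names are not supported")
    df.to_csv(csv_buf)
    json.dump({str(col): str(dtype) for col, dtype in df.dtypes.items()}, meta_buf)


def load_with_dtypes(csv_buf, meta_buf):
    """Hypothetical helper: read the CSV back, forcing the recorded dtypes."""
    dtypes = json.load(meta_buf)
    return pd.read_csv(csv_buf, index_col=0, dtype=dtypes)


df = pd.DataFrame({"code": ["007", "042"], "n": [1, 2], "x": [0.5, 1.25]})
csv_buf, meta_buf = io.StringIO(), io.StringIO()
save_with_dtypes(df, csv_buf, meta_buf)
csv_buf.seek(0)
meta_buf.seek(0)
back = load_with_dtypes(csv_buf, meta_buf)

# Raises AssertionError if any value, dtype, or label differs.
assert_frame_equal(df, back)
```

It refuses duplicate column names because a name-keyed dtype mapping can't describe them. It also does nothing for datetime columns (which `read_csv` expects via `parse_dates=`) or list columns. That illustrates how many edge cases this approach leaves to the caller.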

@tzaman I guess it's related, but that question is focused on speed, and the top/accepted answer is completely inappropriate in my case since I'm looking for portability (pickle files can't easily be read outside of Python). — max, Aug 12 '16 at 22:09
That same answer also mentions `hdf5`. Does that not satisfy? — piRSquared, Aug 12 '16 at 22:21
@piRSquared Yup, just checked and it works (apart from same-name columns, which are not allowed, but that's OK). I didn't see any guarantee in the docs that HDF5 read/write are invertible, but I guess they just happen to be. — max, Aug 12 '16 at 23:14
I use it regularly. It's very fast and portable. The only thing I can't verify is strong support from other languages, but I do see on Wikipedia that it is widely supported. — piRSquared, Aug 12 '16 at 23:15

score -1 · Answer 1 · answered Aug 12 '16 at 22:14

pd.DataFrame.to_pickle / pd.read_pickle preserve the column data types. Let's check it out:

import pandas as pd
# df_in is the DataFrame to save; read it back from the same path
df_in.to_pickle('input_5')
df_out = pd.read_pickle('input_5')
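A self-contained version of the same check, as a sketch. A temporary directory from `tempfile` (an assumption for the example) replaces the bare `'input_5'` path, and the hypothetical frame includes a list column that the CSV route cannot restore:

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame, including a list column that CSV cannot represent.
df_in = pd.DataFrame({
    "code": ["007", "042"],
    "items": [[1, 2], [3]],
    "x": [0.5, 1.25],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input_5.pkl")
    df_in.to_pickle(path)
    df_out = pd.read_pickle(path)

# Compare dtypes and values of the restored frame with the original.
print(df_out.dtypes.equals(df_in.dtypes))
print(df_out.equals(df_in))
```

This only confirms the round trip within a single Python/pandas installation. It says nothing about reading the file from another language or a different pandas version.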

ragesz

Pickle is a bad choice: it is poorly portable, and the format evolves between versions of pandas / Python. – Raphael Jolivet May 04 '22 at 13:01
