I have 7 really large parquet files (df0.parquet through df6.parquet).
import pandas as pd

df0 = pd.read_parquet('df0.parquet')
df1 = pd.read_parquet('df1.parquet')
df2 = pd.read_parquet('df2.parquet')
df3 = pd.read_parquet('df3.parquet')
df4 = pd.read_parquet('df4.parquet')
df5 = pd.read_parquet('df5.parquet')
df6 = pd.read_parquet('df6.parquet')
I am trying to concat them. So I did:
m_df = pd.concat([df0,df1,df2,df3,df4,df5,df6],ignore_index=True)
m_df
I then tried to export everything into one large parquet file, but when I ran m_df.to_parquet('m_df.parquet', index=False), the kernel died and the write never finished. I think the data is simply too large to hold in memory all at once, and we can't upgrade the instance. Is there a more memory-efficient way to do this?
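
I was wondering whether something like the pyarrow sketch below would help, since it streams one row group at a time instead of building the whole DataFrame in memory. I haven't been able to test it on the full data, and it assumes all the files share the same schema:

import pyarrow.parquet as pq

files = [f'df{i}.parquet' for i in range(7)]

# take the schema from the first file; assumes every file has the same schema
schema = pq.ParquetFile(files[0]).schema_arrow

with pq.ParquetWriter('m_df.parquet', schema) as writer:
    for path in files:
        pf = pq.ParquetFile(path)
        # copy one row group at a time instead of loading the whole file
        for i in range(pf.num_row_groups):
            writer.write_table(pf.read_row_group(i))

Would that actually avoid the memory problem, or is there a better approach?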