
I have 7 really large parquet files.

df0 = pd.read_parquet('df0.parquet')
df1 = pd.read_parquet('df1.parquet')
df2 = pd.read_parquet('df2.parquet')
df3 = pd.read_parquet('df3.parquet')
df4 = pd.read_parquet('df4.parquet')
df5 = pd.read_parquet('df5.parquet')
df6 = pd.read_parquet('df6.parquet')

I am trying to concatenate them, so I did:

m_df = pd.concat([df0,df1,df2,df3,df4,df5,df6],ignore_index=True)
m_df

I then tried to export them into one large parquet file, but when I ran m_df.to_parquet('m_df.parquet', index=False), the kernel died before it finished. I think the data is simply too large to hold in memory, and we can't upgrade the instance. Is there a more memory-efficient way to do this?
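One direction I was considering (just a rough sketch, assuming pyarrow is installed and that all seven files share the same schema; the file names are the ones above) is to stream row groups from each file straight into a single ParquetWriter, so the combined DataFrame never has to exist in memory. I'm not sure if this is the right approach:

import pyarrow.parquet as pq

paths = [f'df{i}.parquet' for i in range(7)]

# Take the schema from the first file (assumes every file matches it)
schema = pq.ParquetFile(paths[0]).schema_arrow

with pq.ParquetWriter('m_df.parquet', schema) as writer:
    for path in paths:
        pf = pq.ParquetFile(path)
        # Copy one row group at a time instead of loading the whole file
        for i in range(pf.num_row_groups):
            writer.write_table(pf.read_row_group(i))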
