I have 7 really large parquet files (df0.parquet through df6.parquet).
import pandas as pd

df0 = pd.read_parquet('df0.parquet')
df1 = pd.read_parquet('df1.parquet')
df2 = pd.read_parquet('df2.parquet')
df3 = pd.read_parquet('df3.parquet')
df4 = pd.read_parquet('df4.parquet')
df5 = pd.read_parquet('df5.parquet')
df6 = pd.read_parquet('df6.parquet')
I am trying to concat them. So I did:
m_df = pd.concat([df0,df1,df2,df3,df4,df5,df6],ignore_index=True)
m_df
I then tried to export everything into one large parquet file, but when I ran m_df.to_parquet('m_df.parquet', index=False), the kernel died and the write never finished. I think the data is simply too large to hold in memory all at once, and we can't upgrade the instance. Is there a more memory-efficient way to do this?
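
I was wondering whether something like the pyarrow sketch below would help, since it streams one row group at a time instead of building the whole DataFrame in memory. I haven't been able to test it on the full data, and it assumes all the files share the same schema:

import pyarrow.parquet as pq

files = [f'df{i}.parquet' for i in range(7)]

# take the schema from the first file; assumes every file has the same schema
schema = pq.ParquetFile(files[0]).schema_arrow

with pq.ParquetWriter('m_df.parquet', schema) as writer:
    for path in files:
        pf = pq.ParquetFile(path)
        # copy one row group at a time instead of loading the whole file
        for i in range(pf.num_row_groups):
            writer.write_table(pf.read_row_group(i))

Would that actually avoid the memory problem, or is there a better approach?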