I am reading a Parquet file and converting it into a pandas DataFrame.

from fastparquet import ParquetFile 
pf = ParquetFile('file.parquet') 
df = pf.to_pandas() 

Is there a way to read Parquet data from a variable (one that was populated earlier and now holds the contents of a Parquet file) instead of from a file on disk?

Thanks.

Joe

2 Answers

pandas has a built-in function for reading Parquet files, pandas.read_parquet; see the pandas docs for reference. Something like this:

import pandas as pd 
pd.read_parquet('file.parquet') 

should work. Also, please read this post on engine selection (pandas can use either pyarrow or fastparquet as the engine).

Michał Zaborowski
  • Yes. Can you please elaborate on what you are trying to do? – Michał Zaborowski Mar 08 '19 at 09:45
  • Process A reads a Parquet file and holds its contents in a variable. Process B then reads that variable. I just need to read the Parquet data from the variable, not from a file. – Joe Mar 08 '19 at 13:25

You can also read Parquet data from a variable with pandas.read_parquet by wrapping the bytes in an io.BytesIO buffer, as in the following code. I tested this with the pyarrow backend, but it should also work with the fastparquet backend.

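import io

import pandas as pd

# Read the raw Parquet bytes into a variable.
with open("file.parquet", "rb") as f:
    data = f.read()

# Wrap the bytes in a seekable file-like object and parse it.
buf = io.BytesIO(data)
df = pd.read_parquet(buf)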
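
If the bytes come from another process, a cheap sanity check before parsing can give a clearer error than the reader would. The sketch below is only an illustration: as_parquet_buffer is a hypothetical helper name, and it assumes an unencrypted Parquet file, which starts and ends with the 4-byte marker PAR1. It checks the marker and nothing else, so it does not prove the data is a valid Parquet file.

```python
import io

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def as_parquet_buffer(data: bytes) -> io.BytesIO:
    """Wrap in-memory Parquet bytes in a seekable buffer (hypothetical helper)."""
    # Leading magic (4) + footer length (4) + trailing magic (4) = 12 bytes;
    # anything shorter cannot be a Parquet file.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if not (data.startswith(PARQUET_MAGIC) and data.endswith(PARQUET_MAGIC)):
        raise ValueError("missing PAR1 magic bytes; is this really Parquet data?")
    # BytesIO supports read() and seek(), which a reader needs to reach the footer.
    return io.BytesIO(data)
```

The returned buffer can then be passed on in place of a path, for example pd.read_parquet(as_parquet_buffer(data)).
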
Uwe L. Korn