I can read a parquet file located on GCS thanks to this answer (see the first answer). I used the pd.read_parquet function with the pyarrow engine. I'd now like to access the parquet metadata without downloading the data into a dataframe. Is it possible to do that with pandas?

alcor

1 Answer

I found a solution that uses gcsfs and pyarrow directly, without pandas:

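import gcsfs
import pyarrow.parquet as pq

# myprojectname and myfilepath are placeholders: set them to your
# GCP project ID and the path of the parquet file on GCS.
fs = gcsfs.GCSFileSystem(project=myprojectname)

# ParquetFile parses the file footer; row data is not loaded here.
with fs.open(myfilepath) as f:
    myschema = pq.ParquetFile(f).schema
    print(myschema)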
jamiet
alcor