How to find size (in MB) of dataframe in pyspark?
df=spark.read.json("/Filestore/tables/test.json") I want to find the size of df or of test.json.
In general this is not easy. You can either estimate the in-memory size with org.apache.spark.util.SizeEstimator, or call df.inputFiles() and use another API to get the file size directly (I did so using the Hadoop FileSystem API (How to get file size)). Note that the second approach only works if the dataframe was not filtered/aggregated.
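Here is a minimal sketch of both approaches, assuming a PySpark session (e.g. on Databricks). The attributes spark._jvm, spark._jsc and df._jdf are internal PySpark handles into the JVM via Py4J, so this is not a stable public API; treat the numbers as rough estimates.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Example path from the question; substitute your own file.
df = spark.read.json("/Filestore/tables/test.json")

# Approach 1: estimate the in-memory size via org.apache.spark.util.SizeEstimator,
# reached through the Py4J gateway. This estimates the JVM object graph of the
# Java-side DataFrame, so it is a rough in-memory figure, not the on-disk size.
df.cache().count()  # materialise the dataframe first
jvm = spark._jvm
mem_bytes = jvm.org.apache.spark.util.SizeEstimator.estimate(df._jdf)
print(f"Estimated in-memory size: {mem_bytes / (1024 * 1024):.2f} MB")

# Approach 2: sum the on-disk size of the source files via the Hadoop
# FileSystem API. This reflects the input file size, so it is only meaningful
# if the dataframe has not been filtered/aggregated since it was read.
hadoop_conf = spark._jsc.hadoopConfiguration()
total_bytes = 0
for file_uri in df.inputFiles():
    path = jvm.org.apache.hadoop.fs.Path(file_uri)
    fs = path.getFileSystem(hadoop_conf)
    total_bytes += fs.getFileStatus(path).getLen()
print(f"Total input file size: {total_bytes / (1024 * 1024):.2f} MB")
```

The two numbers will usually differ: JSON is verbose on disk, while the in-memory representation depends on caching and on the JVM object overhead that SizeEstimator measures.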