My development environment is a single-user workstation with 4 cores; it runs neither Spark nor HDFS. I have a CSV file that is too big to fit in memory. I want to convert it to a Parquet file and analyze it locally with existing tools, while keeping the option to move it to a Spark cluster later and analyze it there with Spark.
Is there any way to do this conversion row-by-row (or chunk-by-chunk), without first moving the file over to the Spark cluster?
I'm looking for a pure-Python solution that does not involve Spark.
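For concreteness, here's a rough sketch of the kind of chunked conversion I have in mind, using pandas and pyarrow (the file paths and chunk size are placeholders). I'm not sure this is the right approach, which is essentially my question:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

csv_path = "input.csv"          # placeholder path
parquet_path = "output.parquet" # placeholder path
chunksize = 100_000             # rows per chunk; tune to available memory

writer = None
for chunk in pd.read_csv(csv_path, chunksize=chunksize):
    table = pa.Table.from_pandas(chunk, preserve_index=False)
    if writer is None:
        # Create the writer lazily so its schema comes from the first chunk.
        writer = pq.ParquetWriter(parquet_path, table.schema)
    writer.write_table(table)

if writer is not None:
    writer.close()
```

One thing I'm unsure about: pandas infers dtypes per chunk, so a column that looks integer-valued in the first chunk but contains missing values later could be inferred as float in a later chunk and fail the schema check. Would passing an explicit `dtype=` mapping to `read_csv` be the right way to guard against that?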