    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col, max, min}
    val spark = SparkSession.builder().getOrCreate()
    
    val data = spark.read.option("header","true").format("csv").load("cpid_data.csv")
    
    val equi = data.withColumn("value", col("value").cast("double")).groupBy("id").agg(avg("value"), max("value"), min("value")).show()

With the above code, when I try to write the output to a CSV file like this,

    equi.write.option("header", true).csv("cpido.csv")

it throws an error: value write is not a member of Unit.

Could anyone help me with this? How do I write the output to a CSV file?

Vikram

1 Answer


The error message gives you a strong indication: write is not a member of Unit means there is no method called write on the type Unit.

Said differently: equi is of type Unit, which is probably not what you wanted. That is because show() only prints the DataFrame to the console and returns Unit, so your assignment captures Unit rather than the aggregated DataFrame.

Just remove the call to .show() and it'll work fine:

    val equi = data
      .withColumn("value", col("value").cast("double"))
      .groupBy("id")
      .agg(avg("value"), max("value"), min("value"))

    equi.write.option("header", true).csv("cpido.csv")
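
If you still want to preview the result in the console, call show separately after defining equi; since show only prints and returns Unit, it should not be part of the assignment:

    // show() returns Unit, so call it on its own line
    // rather than inside the val definition
    equi.show()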
Gaël J
  • That works, but the output is written as an individual CSV file for each row of the output. Why is that? – Vikram Dec 04 '21 at 15:09
  • @Vikram see https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv – Gaël J Dec 04 '21 at 15:33
  • repartition and coalesce are not working in my case. Do we need to import anything in the Scala code? – Vikram Dec 04 '21 at 15:54
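
For reference, a minimal sketch of the single-file approach suggested in the linked question, assuming equi is the aggregated DataFrame defined above (coalesce is a method on the DataFrame itself, so no extra import is needed):

    // coalesce(1) collapses the result into a single partition,
    // so Spark writes one part file instead of one per partition.
    // Note: "cpido.csv" is still a directory; the data ends up in
    // a single part-*.csv file inside it.
    equi.coalesce(1)
      .write
      .option("header", true)
      .csv("cpido.csv")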