
I am able to save RDD output to HDFS with the saveAsTextFile method, but this method throws an exception if the output path already exists.

I have a use case where I need to save RDDs to a path that already exists in HDFS. Is there a way to simply append the new RDD data to the data already stored at that path?
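A minimal reproduction of the failure described above may help. This sketch assumes Spark 2.x or later (for SparkSession) and uses a local temporary directory in place of an HDFS path. The object name SaveTwiceRepro and the path are hypothetical. It writes once with saveAsTextFile and then tries to write to the same path again.

```scala
import java.nio.file.Files
import scala.util.Try
import org.apache.spark.sql.SparkSession

object SaveTwiceRepro {
  // Returns true if a second saveAsTextFile call on the same path fails.
  def run(): Boolean = {
    // Assumption: Spark 2.x+ (SparkSession); a local temp directory stands in for HDFS.
    val spark = SparkSession.builder().master("local[*]").appName("save-twice").getOrCreate()
    try {
      val sc = spark.sparkContext
      // Hypothetical output path: "data" does not exist yet inside the fresh temp dir.
      val out = Files.createTempDirectory("rdd-out").resolve("data").toString
      sc.parallelize(Seq("a", "b")).saveAsTextFile(out) // first write creates the directory
      // Writing to the same path again is rejected because the directory already exists.
      Try(sc.parallelize(Seq("c")).saveAsTextFile(out)).isFailure
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(s"second write failed: ${run()}")
}
```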

Asked by yAsH

1 Answer


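One possible solution, available since Spark 1.6, is to convert the RDD to a DataFrame and write it with the text format in append mode. Append mode adds new part files to the existing directory instead of failing because the path already exists: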

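import sqlContext.implicits._  // needed for .toDF (Spark 1.6; on Spark 2.x+ use spark.implicits._)

val outputPath: String = ???  // placeholder: the existing HDFS directory

// Each call adds new part files under outputPath
rdd.map(_.toString).toDF.write.mode("append").text(outputPath)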
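An end-to-end sketch of the same approach, assuming Spark 2.x or later (SparkSession, spark.implicits._, spark.read.textFile) and a local temporary directory standing in for HDFS. The object name AppendTextDemo and the sample data are hypothetical. It appends two batches to one directory and reads everything back.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object AppendTextDemo {
  // Appends two batches to the same directory and returns the total number of lines read back.
  def run(): Long = {
    // Assumption: Spark 2.x+; a local temp directory stands in for the HDFS path.
    val spark = SparkSession.builder().master("local[*]").appName("append-text").getOrCreate()
    import spark.implicits._ // provides .toDF on RDD[String]
    try {
      // Hypothetical output path; it does not exist before the first write.
      val outputPath = Files.createTempDirectory("append-out").resolve("data").toString
      val batch1 = spark.sparkContext.parallelize(Seq(1, 2)).map(_.toString)
      val batch2 = spark.sparkContext.parallelize(Seq(3)).map(_.toString)
      // First write creates the directory; the second adds new part files next to the old ones.
      batch1.toDF().write.mode(SaveMode.Append).text(outputPath)
      batch2.toDF().write.mode(SaveMode.Append).text(outputPath)
      // Reading the directory picks up the part files from both writes.
      spark.read.textFile(outputPath).count()
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(s"lines after two appends: ${run()}")
}
```

Because each write produces its own part files, appending many small batches can leave a large number of small files in the directory over time.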

Answered by zero323