
I know that I should primarily use Spark Datasets, but I am wondering whether there are good situations where I should use RDDs instead of Datasets.

SCouto
jk1

1 Answer


In a typical Spark application you should go for the Dataset/DataFrame. Spark optimizes those structures internally, and they provide you with high-level APIs to manipulate the data. However, there are situations where RDDs are handy:

  • When manipulating graphs using GraphX
  • When integrating with third-party libraries that only know how to handle RDDs
  • When you want to use the low-level API to get finer control over your workflow (e.g. reduceByKey, aggregateByKey)
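The low-level control in the last point is easiest to see with reduceByKey: Spark combines values per partition before shuffling, so only partial aggregates cross the network. Below is a plain-Python sketch of those semantics (no Spark needed; reduce_by_key and the sample partitions are illustrative, not Spark API):

```python
def reduce_by_key(partitions, func):
    """Toy model of RDD.reduceByKey: combine values per key within
    each partition first, then merge the per-partition results.
    func must be associative (and commutative, since merge order
    is not guaranteed in a real cluster)."""
    # Map-side combine inside each partition
    partial = []
    for part in partitions:
        acc = {}
        for k, v in part:
            acc[k] = func(acc[k], v) if k in acc else v
        partial.append(acc)
    # Merge the per-partition results (the "shuffle" step)
    merged = {}
    for acc in partial:
        for k, v in acc.items():
            merged[k] = func(merged[k], v) if k in merged else v
    return merged

# Two "partitions" of (key, value) pairs
parts = [[("a", 1), ("b", 2), ("a", 3)],
         [("b", 4), ("a", 5)]]
print(reduce_by_key(parts, lambda x, y: x + y))  # {'a': 9, 'b': 6}
```

With a Dataset you would express the same aggregation declaratively (groupBy plus agg) and let the optimizer decide the physical plan; the RDD API is useful precisely when you want to dictate that plan yourself.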
dumitru