0

I am using spark-sql for a data migration project. How should I implement a staging area in Spark? When should I use Spark SQL cache versus persist? Are there any real-world use cases?

~Sha

BdEngineer

1 Answer

-1

As with RDDs (What is the difference between cache and persist?), the only difference between cache and persist is that persist lets you set a non-default storage level.

There is one important difference, though. Unlike the RDD API, where cache uses MEMORY_ONLY, the Dataset counterpart uses MEMORY_AND_DISK.