I am tackling a problem similar to the one in "How to read the multi nested JSON data in Spark".

The main reason I am posting is that I do not understand this statement in the solution: "Consecutive use of withColumn is not recommended for huge dataset as it might give random output. The reason is that withColumn is distributed and order of execution is not proved to be followed in serial manner."

What exactly does this mean in practice? I prefer withColumn, since with select I have to keep repeating the columns I want to keep each time I explode a nested array.

sandbar

0 Answers