What does spark.sql.shuffle.partitions exactly refer to?

Question

What exactly does spark.sql.shuffle.partitions refer to? Is it the number of partitions that results from a wide transformation, or something that happens in between, such as an intermediate partitioning step before the result partitions of the wide transformation are produced?

As I understand it, a wide transformation looks like this:

Parent RDDs -> shuffle files -> Child RDDs

Which of these does the spark.sql.shuffle.partitions parameter refer to: the shuffle files, the child RDDs, or something else that I have overlooked?

score 1 · Accepted Answer · edited Feb 07 '21 at 19:59

This is already explained in the official docs:

spark.sql.shuffle.partitions (default: 200): Configures the number of partitions to use when shuffling data for joins or aggregations.

In other words, it is the number of partitions of the child Dataset produced by the shuffle.
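To make the "Parent RDDs -> shuffle files -> Child RDDs" picture concrete, here is a toy model of a shuffle in plain Python. It is only a sketch and not Spark's actual implementation: the `shuffle` function, the `num_shuffle_partitions` argument (standing in for spark.sql.shuffle.partitions) and the sample data are all made up for illustration. The point it shows is that the setting fixes how many buckets each map-side task writes, and therefore how many child partitions exist, regardless of how many parent partitions there were.

```python
# Toy model of a shuffle in plain Python (NOT Spark's real implementation).
# num_shuffle_partitions plays the role of spark.sql.shuffle.partitions.

def shuffle(parent_partitions, num_shuffle_partitions):
    """Redistribute (key, value) records into num_shuffle_partitions child partitions."""
    # Map side: every parent partition splits its records into one bucket per
    # child partition. These buckets stand in for the shuffle files.
    map_outputs = []
    for records in parent_partitions:
        buckets = [[] for _ in range(num_shuffle_partitions)]
        for key, value in records:
            buckets[hash(key) % num_shuffle_partitions].append((key, value))
        map_outputs.append(buckets)

    # Reduce side: child partition i reads bucket i from every map output,
    # so all records with the same key end up in the same child partition.
    child_partitions = []
    for i in range(num_shuffle_partitions):
        child = []
        for buckets in map_outputs:
            child.extend(buckets[i])
        child_partitions.append(child)
    return child_partitions


# Three parent partitions with integer keys (hypothetical sample data).
parents = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (3, "d")],
    [(2, "e"), (4, "f"), (5, "g")],
]
children = shuffle(parents, num_shuffle_partitions=4)
print(len(parents), "parent partitions ->", len(children), "child partitions")
# prints: 3 parent partitions -> 4 child partitions
```

In Spark itself, the equivalent check would be to call `.rdd.getNumPartitions()` on the result of a join or aggregation. Note that with adaptive query execution enabled, Spark may coalesce post-shuffle partitions, so the observed count can be lower than the configured value.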

edited Feb 07 '21 at 19:59

vinsce

answered Sep 24 '18 at 09:27

user10407081

I was not sure about making the "in other words" inference myself; I just wanted a confirmation. Thank you – MaatDeamon Sep 24 '18 at 09:56
