Apache Spark's Dataset.map(func) method deserializes each element, runs func on it, and serializes the result. When func never inspects the element and simply returns it unchanged, that deserialization/serialization round trip is wasted work. Is there a way to avoid it?

(The specific case I have in mind is counting the elements as they are written out by using an accumulator.)
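
To make the case concrete, here is a minimal sketch of the pattern in question. The object name, accumulator name, sample data, and output path are placeholders invented for illustration, and it assumes a local Spark session with the Parquet writer available:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object CountWhileWriting {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("count-while-writing") // hypothetical app name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Placeholder data standing in for the real Dataset.
    val ds: Dataset[(Long, String)] =
      (1 to 1000).map(i => (i.toLong, s"row-$i")).toDS()

    // Accumulator that counts elements as they flow through the map.
    val counter = spark.sparkContext.longAccumulator("rowsWritten")

    // func ignores the element's contents and returns it unchanged,
    // yet map still deserializes and re-serializes every element.
    val counted = ds.map { row =>
      counter.add(1)
      row
    }

    // The physical plan shows the DeserializeToObject and
    // SerializeFromObject steps wrapped around the map.
    counted.explain()

    // Hypothetical output path.
    counted.write.mode("overwrite").parquet("/tmp/count-while-writing")

    // Accumulator updates made inside transformations can be applied
    // more than once if tasks are retried, so treat this as approximate.
    println(s"Rows counted: ${counter.value}")

    spark.stop()
  }
}
```

The identity map here exists only for its side effect on the accumulator, which is why the object conversion on either side of it looks avoidable.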

Daniel Darabos

0 Answers