1

In SO 33655920 I come across the below, fine.

rdd = sc.parallelize([1, 2, 3, 4], 2)
def f(iterator): yield sum(iterator)
rdd.mapPartitions(f).collect()

In Scala, I cannot seem to get the the def in the same shorthand way. The equivalent is? I have searched and tried but to no avail.

Thanks in advance.

Seth Tisue
  • 28,814
  • 11
  • 81
  • 148
thebluephantom
  • 14,410
  • 8
  • 36
  • 67

2 Answers2

2

yield sum(iterator) in Python sums the elements of the iterator. The similar way of doing this in Scala would be:

val rdd = sc.parallelize(Array(1, 2, 3, 4), 2)
rdd.mapPartitions(it => Iterator(it.sum)).collect()
Jiri Kremser
  • 11,859
  • 5
  • 42
  • 70
1

If you want to sum values in the partition you can write something like

val rdd = sc.parallelize(1 to 4, 2)
def f(i: Iterator[Int]) = Iterator(i.sum)
rdd.mapPartitions(f).collect()
addmeaning
  • 1,288
  • 1
  • 12
  • 35