
I am trying to understand how the foreach method works. In my Jupyter notebook, I tried:

def f(x): print(x)
a = sc.parallelize([1, 2, 3, 4, 5])
b = a.foreach(f)
print(type(b))
<class 'NoneType'>

It runs without any error, but the only output I get is from print(type(b)). foreach doesn't return anything, just None. I don't know what foreach is supposed to do or how to use it. Can you explain what it is used for?

desertnaut
Steven

1 Answer


foreach is an action, and it does not return anything (its return value is None). It exists for its side effects, so assigning its result to a variable, as in b = a.foreach(f), gives you nothing useful. From Learning Spark, pp. 41-42:

[Images: excerpts from Learning Spark, pp. 41-42]

Here is the simple example from the docs, adapted and run in a PySpark shell (the elements are printed in no particular order, because they are processed in parallel):

>>> def f(x): print(x)
>>> a = sc.parallelize([1, 2, 3, 4, 5])
>>> a.foreach(f)
5
4
3
1
2

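(NOTE: the function passed to foreach runs on the executors, not on the driver, so print writes to the executors' standard output rather than to your notebook cell. That is why the above code produces no printed results in a Databricks notebook, and most likely why you see nothing in Jupyter either.)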

You may also find the answers in this thread helpful.

desertnaut