In PySpark, when I print a list, all the elements are printed on the same line:

>>> wordslist = words.collect()
>>> wordslist
[(u'crazy', 1), (u'fox', 1), (u'jumped', 1)]

Is there any way to get the output printed one item per line, like this:

>>> wordslist
[
(u'crazy', 1),
(u'fox', 1),
(u'jumped', 1)
]
Bajji

2 Answers

This is basic Python. When you collect a result from an RDD, you obtain a list that you can iterate over, printing each element in whatever format you wish.

The question of how to print a list has been answered many times on SO.

Here is one example:

>>> mylist = myrdd.collect()
>>> for elem in mylist:
...     print(elem)

You may also want to check the PySpark documentation.
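If you want the bracketed, one-item-per-line layout shown in the question, a small helper can build that string from the collected list. This is only a sketch: `format_one_per_line` is a hypothetical name, not part of PySpark, and the plain list below stands in for the result of `words.collect()`.

```python
def format_one_per_line(items):
    """Return a string with each element on its own line, wrapped in brackets.

    Hypothetical helper for illustration; not a PySpark function.
    """
    # repr() gives the same text the interactive prompt would show per element.
    body = ",\n".join(repr(item) for item in items)
    return "[\n" + body + "\n]"


# Stand-in for words.collect(); assumed to be a list of (word, count) tuples.
wordslist = [("crazy", 1), ("fox", 1), ("jumped", 1)]
print(format_one_per_line(wordslist))
```

Because `collect()` returns an ordinary Python list, the helper needs no Spark-specific handling. For a large RDD, collecting everything to the driver may be expensive, so printing a limited sample may be preferable.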

eliasah

The same can be achieved using foreach in Scala:

mylist.foreach(println)
Andrew Kane
satya