
Using PySpark in a Jupyter notebook, the output of Spark's DataFrame.show is low-tech compared to how Pandas DataFrames are displayed. I thought "Well, it does the job", until I got this:

[Screenshot: output of `df.show()` with lines wrapping across the notebook width]

The output is not adjusted to the width of the notebook, so the lines wrap in an ugly way. Is there a way to customize this? Even better, is there a way to get Pandas-style output (without converting to a pandas.DataFrame, obviously)?

clstaudt
  • you could just convert the first 5 rows to pandas df – mtoto May 25 '18 at 07:57
  • 3
    `df.limit(5).toPandas()` – phi May 25 '18 at 08:06
  • 3
    Two workarounds: Maybe you could try to expand your Jupyter Notebook cell like the accepted answer at https://stackoverflow.com/questions/21971449/how-do-i-increase-the-cell-width-of-the-jupyter-ipython-notebook-in-my-browser or to use `df.show(vertical=True)` as you can see in the example at `def show(self, n=20, truncate=True, vertical=False)` in the source code https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py – titiro89 May 25 '18 at 13:03

5 Answers


This is now possible natively as of Spark 2.4.0 by setting `spark.sql.repl.eagerEval.enabled` to `True`:

[Screenshot: the eagerly evaluated DataFrame rendered as an HTML table in the notebook]
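Since the screenshot does not reproduce here, a minimal sketch of the setting (the example DataFrame and its column names are illustrative; requires Spark >= 2.4.0):

```python
from pyspark.sql import SparkSession

# The property can be set once at session creation time...
spark = (
    SparkSession.builder
    .config("spark.sql.repl.eagerEval.enabled", True)
    .getOrCreate()
)

# ...or toggled on an existing session:
spark.conf.set("spark.sql.repl.eagerEval.enabled", True)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# In a notebook, ending a cell with the DataFrame itself (no .show())
# now renders it as an HTML table instead of the plain-text grid.
df.limit(5)
```

The related settings `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` control how many rows are rendered and where long cell values are cut off.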

Kyle Barron
  • This does not appear to work for me on my own dataset which has a lot of columns. `spark.conf.set("spark.sql.repl.eagerEval.enabled",True)` followed by `df.limit(10)` – Reddspark Apr 02 '19 at 22:22
  • This would be good if it worked, which it does not on `2.4.3`, apparently. – ijoseph May 06 '20 at 22:53
  • This will load the entire dataset into your driver, which may not be desired. – Luis Meraz Aug 18 '20 at 19:33

After playing around with my table, which has a lot of columns, I decided the best way to get a feel for the data is:

df.show(n=5, truncate=False, vertical=True)

This displays it vertically without truncation and is the cleanest view I could come up with.

Reddspark

You can use the `%%html` magic command. Check that the CSS selector is correct by inspecting the output cell, then edit the snippet below accordingly and run it in a cell.

%%html
<style>
div.output_area pre {
    white-space: pre;
}
</style>
Luis Meraz

Adding to the answers given above by @karan-singla and @vijay-jangir in "pyspark show dataframe as table with horizontal scroll in ipython notebook", here is a handy one-liner to comment out the `white-space: pre-wrap` styling:

$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css

This translates to: use awk to update, in place, lines that contain pre-wrap so they are surrounded by `/* ... */` (i.e. commented out), in the style.min.css file found in your working Python environment.

In theory, this can then be wrapped in an alias if you work with multiple environments, say with Anaconda.
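For example, a sketch of such a helper for `~/.bashrc` (the function name is hypothetical; GNU awk with `-i inplace` support is assumed). Because the CSS path is resolved from whichever `python` is on your PATH, it follows the currently active conda or virtualenv environment:

```shell
# Hypothetical helper for ~/.bashrc; assumes GNU awk (gawk) with -i inplace.
# Resolves style.min.css relative to the active environment's `python`, then
# comments out the pre-wrap rule in place, as in the one-liner above.
unwrap_notebook_css() {
    local css
    css="$(dirname "$(python -c 'import notebook as nb; print(nb.__file__)')")/static/style/style.min.css"
    awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' "$css"
}
```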


tallamjr

Just try this:

df.show(truncate=False)
Talha Tayyab
  • Isn't this effectively the same answer as [the top-voted answer from three years ago](https://stackoverflow.com/a/55484426/3025856), but with less explanation? The previous answer uses a couple of additional arguments, but the core guidance is the same. – Jeremy Caney Oct 11 '21 at 19:43