
Using PySpark in a Jupyter notebook, the output of Spark's DataFrame.show is low-tech compared to how Pandas DataFrames are displayed. I thought "Well, it does the job", until I got this:

[Screenshot: output of `df.show()` with lines wrapping across the notebook width]

The output is not adjusted to the width of the notebook, so the lines wrap in an ugly way. Is there a way to customize this? Even better, is there a way to get Pandas-style output (without converting to a pandas.DataFrame, obviously)?

clstaudt
  • you could just convert the first 5 rows to pandas df – mtoto May 25 '18 at 07:57
  • 3
    `df.limit(5).toPandas()` – phi May 25 '18 at 08:06
  • 3
    Two workarounds: Maybe you could try to expand your Jupyter Notebook cell like the accepted answer at https://stackoverflow.com/questions/21971449/how-do-i-increase-the-cell-width-of-the-jupyter-ipython-notebook-in-my-browser or to use `df.show(vertical=True)` as you can see in the example at `def show(self, n=20, truncate=True, vertical=False)` in the source code https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py – titiro89 May 25 '18 at 13:03

5 Answers


This is now possible natively as of Spark 2.4.0 by setting `spark.sql.repl.eagerEval.enabled` to `True`:

[Screenshot: the eagerly evaluated DataFrame rendered as an HTML table in the notebook]
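Since the screenshot does not reproduce here, a minimal sketch of the setting (the example DataFrame and its column names are illustrative; requires Spark >= 2.4.0):

```python
from pyspark.sql import SparkSession

# The property can be set once at session creation time...
spark = (
    SparkSession.builder
    .config("spark.sql.repl.eagerEval.enabled", True)
    .getOrCreate()
)

# ...or toggled on an existing session:
spark.conf.set("spark.sql.repl.eagerEval.enabled", True)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# In a notebook, ending a cell with the DataFrame itself (no .show())
# now renders it as an HTML table instead of the plain-text grid.
df.limit(5)
```

The related settings `spark.sql.repl.eagerEval.maxNumRows` and `spark.sql.repl.eagerEval.truncate` control how many rows are rendered and where long cell values are cut off.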

Kyle Barron
  • This does not appear to work for me on my own dataset which has a lot of columns. `spark.conf.set("spark.sql.repl.eagerEval.enabled",True)` followed by `df.limit(10)` – Reddspark Apr 02 '19 at 22:22
  • This would be good if it worked, which it does not on `2.4.3`, apparently. – ijoseph May 06 '20 at 22:53
  • This will load the entire dataset into your driver, which may not be desired. – Luis Meraz Aug 18 '20 at 19:33

After playing around with my table, which has a lot of columns, I decided the best way to get a feel for the data is:

df.show(n=5, truncate=False, vertical=True)

This displays it vertically without truncation and is the cleanest view I could come up with.

Reddspark

You can use the `%%html` magic command. Check that the CSS selector is correct by inspecting the output cell, then edit the snippet below accordingly and run it in a cell.

%%html
<style>
div.output_area pre {
    white-space: pre;
}
</style>
Luis Meraz

Adding to the answers given above by @karan-singla and @vijay-jangir in "pyspark show dataframe as table with horizontal scroll in ipython notebook", here is a handy one-liner to comment out the `white-space: pre-wrap` styling:

$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css

This translates to: use awk to update, in place, lines that contain pre-wrap so they are surrounded by `/* ... */` (i.e. commented out), in the style.min.css file found in your working Python environment.

In theory, this can then be wrapped in an alias if you work with multiple environments, say with Anaconda.
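For example, a sketch of such a helper for `~/.bashrc` (the function name is hypothetical; GNU awk with `-i inplace` support is assumed). Because the CSS path is resolved from whichever `python` is on your PATH, it follows the currently active conda or virtualenv environment:

```shell
# Hypothetical helper for ~/.bashrc; assumes GNU awk (gawk) with -i inplace.
# Resolves style.min.css relative to the active environment's `python`, then
# comments out the pre-wrap rule in place, as in the one-liner above.
unwrap_notebook_css() {
    local css
    css="$(dirname "$(python -c 'import notebook as nb; print(nb.__file__)')")/static/style/style.min.css"
    awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' "$css"
}
```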


tallamjr

Just try this:

df.show(truncate=False)
Talha Tayyab
  • Isn't this effectively the same answer as [the top-voted answer from three years ago](https://stackoverflow.com/a/55484426/3025856), but with less explanation? The previous answer uses a couple of additional arguments, but the core guidance is the same. – Jeremy Caney Oct 11 '21 at 19:43