
I have a set of notebooks for teaching Spark's Structured Streaming API. I don't mind that so many logging messages appear in the Jupyter Lab output cell, since I can always configure log4j. However, when I use the "console" sink, the output also shows up in the notebook, and Jupyter does not auto-scroll to track the tail end of the output, which makes it annoying to work with, especially for fast streams.
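
For context, here is a minimal sketch of the kind of notebook cell involved, using the built-in rate source as a stand-in for my teaching notebooks (the app name, row rate, and trigger interval are illustrative, not my actual code):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("console-sink-demo").getOrCreate()

# Built-in "rate" source: generates rows continuously, a stand-in for the
# fast streams mentioned above.
stream = (
    spark.readStream.format("rate")
    .option("rowsPerSecond", 100)
    .load()
)

# Console sink: when started from a notebook, each micro-batch is printed
# into the output cell rather than to a terminal.
query = (
    stream.writeStream
    .format("console")
    .outputMode("append")
    .trigger(processingTime="1 second")
    .start()
)
```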

Related solutions I have found:

  • There is an auto-scroll extension for Jupyter Lab, but it does not work with my version (v3.4.2). This would be ideal if it worked, since what I am really after is tracking the tail end of the console sink output.
  • There is a compromise solution on Stack Overflow: write to a memory sink and use IPython's display functions to repeatedly query the in-memory result table, manually refreshing the output cell (a sketch of this approach follows this list). The problem is that there is no easy way to show only the latest N rows of the result without adding an extra id column for ordering; by default, only the first N=20 rows are shown.
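
For reference, this is roughly what that compromise looks like, assuming a rate source and a memory-sink table named "rates" (both are placeholders, not my actual stream):

```python
import time

from IPython.display import clear_output, display
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("memory-sink-refresh").getOrCreate()

stream = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

# Memory sink: each micro-batch accumulates in an in-memory table that can
# be queried with SQL under the name given to queryName().
query = (
    stream.writeStream
    .format("memory")
    .queryName("rates")
    .outputMode("append")
    .start()
)

# Manually refresh the cell output on a timer. Without an extra ordering/id
# column there is no reliable way to select only the latest N rows, which is
# the limitation described above.
while query.isActive:
    clear_output(wait=True)
    display(spark.sql("SELECT * FROM rates").limit(20).toPandas())
    time.sleep(2)
```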

My question: Is it possible to configure the console sink so that its output goes to the controlling terminal from which I ran the jupyter lab command?

My software versions (jupyterlab, pyspark via pip):

  • Python 3.10.4
  • jupyterlab 3.4.2
  • pyspark 3.2.1
