Merge pull request #1562 from romainx/fix-1423
Turn off ipython low-level output capture and forward
mathbunnyru authored Jan 9, 2022
2 parents f6f6a65 + c81a0ea commit 43882c4
Showing 3 changed files with 38 additions and 0 deletions.
21 changes: 21 additions & 0 deletions docs/using/specifics.md
@@ -12,6 +12,27 @@ This page provides details about features specific to one or more images.
Note that every new Spark context is put onto an incrementing port (i.e. 4040, 4041, 4042, etc.), so it might be necessary to open multiple ports.
For example: `docker run -d -p 8888:8888 -p 4040:4040 -p 4041:4041 jupyter/pyspark-notebook`.

#### IPython low-level output capture and forward

Spark images (`pyspark-notebook` and `all-spark-notebook`) have been configured to disable IPython low-level output capture and forward system-wide.
The rationale behind this choice is that Spark logs can be verbose, especially at startup when Ivy is used to load additional jars.
Those logs are still available but only in the container's logs.
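The reason a dedicated low-level capture mechanism exists is that Python-level stream redirection only sees writes that go through `sys.stdout`; output written directly to file descriptor 1 (as Spark's JVM does) bypasses it entirely. A minimal stdlib sketch of that gap:

```python
import contextlib
import io

# Python-level capture (what contextlib.redirect_stdout does) only sees writes
# that go through sys.stdout. Output written straight to file descriptor 1,
# e.g. by a JVM or a compiled library, bypasses it -- hence IPython's separate
# low-level fd capture, which these images disable.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("python-level write")         # captured by the redirect
    # os.write(1, b"fd-level write\n")  # would bypass the redirect entirely

assert buf.getvalue() == "python-level write\n"
```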

If you want them to appear in the notebook, you can overwrite the configuration in a user-level IPython kernel profile.
To do that, uncomment (and set to `True`) the following line in your `~/.ipython/profile_default/ipython_kernel_config.py` and restart the kernel.

```python
c.IPKernelApp.capture_fd_output = True
```

If you have no IPython profile yet, you can initialize a fresh one by running the following command.

```bash
ipython profile create
# [ProfileCreate] Generating default config file: '/home/jovyan/.ipython/profile_default/ipython_config.py'
# [ProfileCreate] Generating default config file: '/home/jovyan/.ipython/profile_default/ipython_kernel_config.py'
```
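The two steps above can also be scripted. A sketch, assuming the default profile location under `~/.ipython` (adjust the path if you use a named profile):

```bash
# Assumed default profile path; `ipython profile create` would generate the
# same directory populated with commented-out defaults.
CONFIG="${HOME}/.ipython/profile_default/ipython_kernel_config.py"
mkdir -p "$(dirname "${CONFIG}")"
echo 'c.IPKernelApp.capture_fd_output = True' >> "${CONFIG}"
grep 'capture_fd_output' "${CONFIG}"
```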

### Build an Image with a Different Version of Spark

You can build a `pyspark-notebook` image (and also the downstream `all-spark-notebook` image) with a different version of Spark by overriding the default value of the following arguments at build time.
4 changes: 4 additions & 0 deletions pyspark-notebook/Dockerfile
@@ -53,6 +53,10 @@ RUN cp -p "${SPARK_HOME}/conf/spark-defaults.conf.template" "${SPARK_HOME}/conf/
echo 'spark.driver.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true' >> "${SPARK_HOME}/conf/spark-defaults.conf" && \
echo 'spark.executor.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true' >> "${SPARK_HOME}/conf/spark-defaults.conf"

# Configure IPython system-wide
COPY ipython_kernel_config.py "/etc/ipython/"
RUN fix-permissions "/etc/ipython/"

USER ${NB_UID}

# Install pyarrow
13 changes: 13 additions & 0 deletions pyspark-notebook/ipython_kernel_config.py
@@ -0,0 +1,13 @@
# Configuration file for ipython-kernel.
# See <https://ipython.readthedocs.io/en/stable/config/options/kernel.html>

# With IPython >= 6.0.0, all output to stdout/stderr is captured.
# This includes output from subprocesses and from compiled libraries like Spark.
# Captured logs end up both in the container logs and in notebook outputs.
# Spark logs are particularly verbose, which is why we turn this capture off with the flag below.
# <https://github.com/jupyter/docker-stacks/issues/1423>

# Attempt to capture and forward low-level output, e.g. produced by Extension
# libraries.
# Default: True
c.IPKernelApp.capture_fd_output = False # noqa: F821
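For illustration, IPython executes a config file like this one with a configuration object bound to the name `c`. A rough stdlib-only sketch of that mechanism (IPython actually injects a traitlets `Config` object, not a `SimpleNamespace`):

```python
from types import SimpleNamespace

# Stand-in for the traitlets Config object that IPython injects as `c`
# when it executes ipython_kernel_config.py.
c = SimpleNamespace(IPKernelApp=SimpleNamespace())

config_source = "c.IPKernelApp.capture_fd_output = False"
exec(config_source, {"c": c})

print(c.IPKernelApp.capture_fd_output)  # -> False
```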
