Turn off ipython low-level output capture and forward #1562

Merged
merged 3 commits into from
Jan 9, 2022
21 changes: 21 additions & 0 deletions docs/using/specifics.md
@@ -12,6 +12,27 @@ This page provides details about features specific to one or more images.
Note that every new Spark context is bound to an incrementing port (i.e., 4040, 4041, 4042, etc.), so it might be necessary to open multiple ports.
For example: `docker run -d -p 8888:8888 -p 4040:4040 -p 4041:4041 jupyter/pyspark-notebook`.

#### IPython low-level output capture and forward

Spark images (`pyspark-notebook` and `all-spark-notebook`) are configured system-wide to disable IPython's low-level output capture and forwarding.
The rationale behind this choice is that Spark logs can be verbose, especially at startup when Ivy is used to load additional JARs.
Those logs are still available, but only in the container's logs.

If you want them to appear in the notebook, you can override the configuration in a user-level IPython kernel profile.
To do that, uncomment the following line in your `~/.ipython/profile_default/ipython_kernel_config.py` and restart the kernel.

```Python
c.IPKernelApp.capture_fd_output = True
```

If you have no IPython profile, you can create a fresh one by running the following command.

```bash
ipython profile create
# [ProfileCreate] Generating default config file: '/home/jovyan/.ipython/profile_default/ipython_config.py'
# [ProfileCreate] Generating default config file: '/home/jovyan/.ipython/profile_default/ipython_kernel_config.py'
```
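
The manual edit described above can also be scripted. The following is a minimal sketch, not part of this change: the helper names `enable_capture` and `update_config_file` are hypothetical, and it assumes the generated profile contains the setting as a commented-out line such as `# c.IPKernelApp.capture_fd_output = True`. If no such line is found, the setting is appended instead.

```python
import re
from pathlib import Path

# Setting name taken from the documentation above.
SETTING = "c.IPKernelApp.capture_fd_output"

# Matches the setting whether it is commented out or already active.
_PATTERN = re.compile(
    rf"^[ \t]*#?[ \t]*{re.escape(SETTING)}[ \t]*=.*$", re.MULTILINE
)


def enable_capture(config_text: str) -> str:
    """Return config_text with the capture setting active and set to True."""
    replacement = f"{SETTING} = True"
    if _PATTERN.search(config_text):
        return _PATTERN.sub(replacement, config_text)
    # Setting not present at all: append it on its own line.
    separator = "" if not config_text or config_text.endswith("\n") else "\n"
    return f"{config_text}{separator}{replacement}\n"


def update_config_file(path: Path) -> None:
    """Rewrite an existing kernel config file in place (hypothetical helper)."""
    path.write_text(enable_capture(path.read_text()))
```

For example, `update_config_file(Path.home() / ".ipython/profile_default/ipython_kernel_config.py")` would apply the change to the default profile, assuming that file already exists. The kernel still has to be restarted afterwards.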

### Build an Image with a Different Version of Spark

You can build a `pyspark-notebook` image (and also the downstream `all-spark-notebook` image) with a different version of Spark by overriding the default value of the following arguments at build time.
4 changes: 4 additions & 0 deletions pyspark-notebook/Dockerfile
@@ -53,6 +53,10 @@ RUN cp -p "${SPARK_HOME}/conf/spark-defaults.conf.template" "${SPARK_HOME}/conf/
echo 'spark.driver.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true' >> "${SPARK_HOME}/conf/spark-defaults.conf" && \
echo 'spark.executor.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true' >> "${SPARK_HOME}/conf/spark-defaults.conf"

# Configure IPython system-wide
COPY ipython_kernel_config.py "/etc/ipython/"
RUN fix-permissions "/etc/ipython/"

USER ${NB_UID}

# Install pyarrow
13 changes: 13 additions & 0 deletions pyspark-notebook/ipython_kernel_config.py
@@ -0,0 +1,13 @@
# Configuration file for ipython-kernel.
# See <https://ipython.readthedocs.io/en/stable/config/options/kernel.html>

# With ipykernel >= 6.0.0, all output written to stdout/stderr is captured.
# This includes output from subprocesses and from compiled libraries, such as Spark.
# As a result, those logs go both to the notebook server logs and to the notebook cell outputs.
# Spark logs are particularly verbose, which is why we turn this capture off with the flag below.
# <https://github.com/jupyter/docker-stacks/issues/1423>

# Attempt to capture and forward low-level output, e.g. produced by Extension
# libraries.
# Default: True
c.IPKernelApp.capture_fd_output = False # noqa: F821
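
To illustrate what "low-level" means here, the sketch below contrasts Python-level output (written through `sys.stdout`) with output written directly to file descriptor 1, which is the path subprocesses and compiled libraries use. It is a standalone, standard-library illustration with a hypothetical function name (`demo_fd_vs_python_capture`), not how ipykernel itself implements the capture.

```python
import contextlib
import io
import os
import sys
import tempfile


def demo_fd_vs_python_capture():
    """Return (text seen at the Python level, text seen at the fd level)."""
    py_buffer = io.StringIO()
    with tempfile.TemporaryFile() as fd_sink:
        sys.stdout.flush()
        saved_fd = os.dup(1)              # remember the real stdout descriptor
        os.dup2(fd_sink.fileno(), 1)      # point fd 1 at a temporary file
        try:
            with contextlib.redirect_stdout(py_buffer):
                print("python-level")          # goes through sys.stdout
                os.write(1, b"fd-level\n")     # bypasses sys.stdout entirely
        finally:
            os.dup2(saved_fd, 1)          # restore the real stdout
            os.close(saved_fd)
        fd_sink.seek(0)
        fd_text = fd_sink.read().decode()
    return py_buffer.getvalue(), fd_text
```

Replacing `sys.stdout` only intercepts the `print` call; the direct write to descriptor 1 is visible only where the descriptor itself is redirected. The `capture_fd_output` flag controls whether the kernel performs that kind of descriptor-level capture.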