
Unexpected console message from IVY and spark #1423

Closed
mbalduini opened this issue Aug 9, 2021 · 11 comments · Fixed by #1562
Labels
tag:Upstream: A problem with one of the upstream packages installed in the docker images
type:Bug: A problem with the definition of one of the docker images maintained here

Comments

@mbalduini

Description

  • used docker image: jupyter/pyspark-notebook:b9f6ce795cfc
  • Started up via docker-compose:
version: '3'
services:
  qc-platform:
    image: jupyter/pyspark-notebook:b9f6ce795cfc
    ports:
      - 8888:8888
    environment:
      - GRANT_SUDO=yes 
      - JUPYTER_ENABLE_LAB=yes
      - JUPYTER_TOKEN=test
    user: root
    restart: unless-stopped
  • Additional Information: Downgraded from Java 11 to Java 8

When running a simple cell to create a Spark session (see code below), annoying messages from Ivy (related to the package configuration) and from the Spark startup are printed to the console.

(Screenshot, 2021-08-09 at 13:03:56: the Ivy and Spark console messages in the notebook output)

I tried to change the log4j configuration for Spark and used the logging lib to set the global logging level, with no results.

Any help in completely removing the messages shown in the screenshot?

mbalduini added the type:Bug label on Aug 9, 2021
@mathbunnyru
Member

I think the issue is that now, when something is printed to stderr, Jupyter shows it in a red box.

What you can do is:

import os
import sys

# send all Python-level stderr writes to /dev/null
f = open(os.devnull, 'w')
sys.stderr = f

Note that this sets stderr to /dev/null, so if you want to see it later, you need to save and restore the previous value.
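
A minimal sketch of that save/restore dance (the placeholder comment marks where the noisy Spark setup would go):

import os
import sys

# minimal sketch: save the current stderr, silence it, then restore it
saved_stderr = sys.stderr
sys.stderr = open(os.devnull, 'w')
# ... run the noisy code here ...
sys.stderr.close()
sys.stderr = saved_stderr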

@mbalduini
Author

Thank you for the suggestion @mathbunnyru, but the proposed solution doesn't seem to work. No change in the output.

Any chance to specify the redirection only for a specific source?
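
(For reference, Python can scope such a redirection to a single block with contextlib.redirect_stderr. A minimal sketch follows; noisy_spark_setup is a hypothetical placeholder, and note that this only swaps the Python-level sys.stderr, so it would not catch output written directly to file descriptor 2 by the JVM subprocess.)

import os
from contextlib import redirect_stderr

# minimal sketch: silence stderr only inside this block;
# only Python-level writes are affected, not the JVM's own fd 2 output
with open(os.devnull, 'w') as devnull:
    with redirect_stderr(devnull):
        noisy_spark_setup()  # hypothetical placeholder for the Spark session code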

@mathbunnyru
Member

mathbunnyru commented Aug 9, 2021

Could you please make your question reproducible by other people?
No one wants to copy-paste the code from the screenshot.
Also, please tell us why you downgraded Java and how you did it.

@mbalduini
Author

Got it, you are right.

  • I downgraded to Java 8 in order to avoid the additional warnings below:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/spark-3.1.2-bin-hadoop3.2/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
  • I downgraded by installing the OpenJDK 8 package (sudo apt-get install openjdk-8-jre) and then selecting that version via the sudo update-alternatives --config java command

  • Here is the code I used to create the Spark session with additional packages:

from pyspark.sql import SparkSession

spark_jars = ",".join([
    "org.apache.hadoop:hadoop-aws:3.2.0",
    "org.postgresql:postgresql:42.2.18",
    "org.apache.spark:spark-avro_2.12:3.0.1",
    "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.5",
    "org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1",
    "org.apache.kafka:kafka-clients:2.6.0",
    "com.databricks:spark-xml_2.12:0.12.0",
])

spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("test-edu")
    .config("spark.jars.packages", spark_jars)
    .getOrCreate()
)

spark
  • Here is the Ivy output:
Ivy Default Cache set to: /home/jovyan/.ivy2/cache
The jars for the packages stored in: /home/jovyan/.ivy2/jars
org.apache.hadoop#hadoop-aws added as a dependency
org.postgresql#postgresql added as a dependency
org.apache.spark#spark-avro_2.12 added as a dependency
org.apache.spark#spark-streaming-kafka-0-10_2.11 added as a dependency
org.apache.spark#spark-sql-kafka-0-10_2.12 added as a dependency
org.apache.kafka#kafka-clients added as a dependency
com.databricks#spark-xml_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4;1.0
	confs: [default]
	found org.apache.hadoop#hadoop-aws;3.2.0 in central
	found com.amazonaws#aws-java-sdk-bundle;1.11.375 in central
	found org.postgresql#postgresql;42.2.18 in central
	found org.checkerframework#checker-qual;3.5.0 in central
	found org.apache.spark#spark-avro_2.12;3.0.1 in central
	found org.spark-project.spark#unused;1.0.0 in central
	found org.apache.spark#spark-streaming-kafka-0-10_2.11;2.4.5 in central
	found org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.1 in central
	found org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.1 in central
	found org.apache.commons#commons-pool2;2.6.2 in central
	found org.apache.kafka#kafka-clients;2.6.0 in central
	found com.github.luben#zstd-jni;1.4.4-7 in central
	found org.lz4#lz4-java;1.7.1 in central
	found org.xerial.snappy#snappy-java;1.1.7.3 in central
	found org.slf4j#slf4j-api;1.7.30 in central
	found com.databricks#spark-xml_2.12;0.12.0 in central
	found commons-io#commons-io;2.8.0 in central
	found org.glassfish.jaxb#txw2;2.3.3 in central
	found org.apache.ws.xmlschema#xmlschema-core;2.2.5 in central
:: resolution report :: resolve 491ms :: artifacts dl 13ms
	:: modules in use:
	com.amazonaws#aws-java-sdk-bundle;1.11.375 from central in [default]
	com.databricks#spark-xml_2.12;0.12.0 from central in [default]
	com.github.luben#zstd-jni;1.4.4-7 from central in [default]
	commons-io#commons-io;2.8.0 from central in [default]
	org.apache.commons#commons-pool2;2.6.2 from central in [default]
	org.apache.hadoop#hadoop-aws;3.2.0 from central in [default]
	org.apache.kafka#kafka-clients;2.6.0 from central in [default]
	org.apache.spark#spark-avro_2.12;3.0.1 from central in [default]
	org.apache.spark#spark-sql-kafka-0-10_2.12;3.0.1 from central in [default]
	org.apache.spark#spark-streaming-kafka-0-10_2.11;2.4.5 from central in [default]
	org.apache.spark#spark-token-provider-kafka-0-10_2.12;3.0.1 from central in [default]
	org.apache.ws.xmlschema#xmlschema-core;2.2.5 from central in [default]
	org.checkerframework#checker-qual;3.5.0 from central in [default]
	org.glassfish.jaxb#txw2;2.3.3 from central in [default]
	org.lz4#lz4-java;1.7.1 from central in [default]
	org.postgresql#postgresql;42.2.18 from central in [default]
	org.slf4j#slf4j-api;1.7.30 from central in [default]
	org.spark-project.spark#unused;1.0.0 from central in [default]
	org.xerial.snappy#snappy-java;1.1.7.3 from central in [default]
	:: evicted modules:
	org.apache.kafka#kafka-clients;2.0.0 by [org.apache.kafka#kafka-clients;2.6.0] in [default]
	org.apache.kafka#kafka-clients;2.4.1 by [org.apache.kafka#kafka-clients;2.6.0] in [default]
	---------------------------------------------------------------------
	|                  |            modules            ||   artifacts   |
	|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
	---------------------------------------------------------------------
	|      default     |   21  |   0   |   0   |   2   ||   19  |   0   |
	---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-b38a7466-c91b-4810-b1f8-64ad6781a4d4
	confs: [default]
	0 artifacts copied, 19 already retrieved (0kB/26ms)
21/08/09 12:19:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).

The console output related to Ivy operations appears with both Java versions.

@mbalduini
Author

Any update on this issue?

I tested the code with the most recent version and the behaviour persists.

@mathbunnyru
Member

@mbalduini I've tried several solutions, but I didn't find anything that works. I think you need to somehow configure the pyspark.sql logger (I haven't used pyspark, which is why I can't help you further).

@mathbunnyru
Member

Another option would be to somehow configure JupyterLab / the JupyterLab cell not to show stderr (or something like this).
I don't know if it's easily possible.
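
For a per-cell approach, IPython's %%capture cell magic may be worth trying (a minimal sketch; like the sys.stderr redirection above, it replaces the Python-level streams, so it may not intercept output arriving at the file-descriptor level):

%%capture spark_startup_io
# minimal sketch: capture this cell's Python-level stdout/stderr into
# spark_startup_io (a hypothetical variable name for the captured output)
spark = SparkSession.builder.getOrCreate()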

@mbalduini
Author

Hi @mathbunnyru, thank you for your effort.
Unfortunately, I tried several options too, but with no success yet, even with the latest release.

Do you have any further information or suggestions to cope with this problem?

@romainx
Collaborator

romainx commented Jan 6, 2022

Hello @mbalduini and @mathbunnyru,

I have looked into this problem in more depth. The modification of the notebook output comes from one of the changes made in release 6.0.0 of ipykernel.

All outputs to stdout/stderr should now be captured, including subprocesses and output of compiled libraries (blas, lapack....). In the notebook server, some outputs that would previously go to the notebook logs will now go both to the notebook logs and to the notebook outputs.

A subsequent fix provides a way to restore the previous behavior. The fix consists in disabling the new capture behavior by turning it off through the capture_fd_output flag; see the following comment for more detail -> ipython/ipykernel#795 (comment).

You can configure it by turning it off in your ipython profile.

# create a default profile
ipython profile create

Edit the file ~/.ipython/profile_default/ipython_kernel_config.py and add the following line.

c.IPKernelApp.capture_fd_output = False
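
If you prefer to script the change instead of editing the file by hand, a minimal sketch in Python (assuming the profile has already been created at the default location):

from pathlib import Path

# minimal sketch: append the setting to the default kernel config file
config = Path.home() / '.ipython' / 'profile_default' / 'ipython_kernel_config.py'
with config.open('a') as f:
    f.write('\nc.IPKernelApp.capture_fd_output = False\n')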

That's it! All the output from Java, Spark, and Ivy will no longer be displayed in the notebook, only in the logs.
We have to check if we could / should do something here to provide this configuration by default.
@mathbunnyru what is your opinion?

romainx added the tag:Upstream label on Jan 6, 2022
@mathbunnyru
Member

@romainx nice!

I think we can try to add this file to the pyspark image (it will also be included in all-spark).
I think these logs are noisy: everyone using Spark sees them, and they don't convey much useful information when everything goes right.

@romainx
Collaborator

romainx commented Jan 7, 2022

@mathbunnyru 👍
And in fact they still appear in the container logs even after this change.
I will draft a PR for that (this will be the opportunity to start my contributions here again 😄).
