Crash on some VM Linux systems #192

Closed
Horcrux7 opened this issue Mar 21, 2022 · 12 comments

Comments

@Horcrux7
Contributor

On some combinations of Linux and Java VM, the JVM crashes when using imageio-openjpeg, for example on Ubuntu 21 with Java 11 or on Alpine/Docker with Java 17. Any idea what the cause might be?

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  com.kenai.jffi.Foreign.invokeN3O1(JJJJJLjava/lang/Object;III)J+0
j  com.kenai.jffi.Invoker.invokeN3(Lcom/kenai/jffi/CallContext;JJJJILjava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;Ljava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;Ljava/lang/Object;Lcom/kenai/jffi/ObjectParameterStrategy;Lcom/kenai/jffi/ObjectParameterInfo;)J+126
j  de.digitalcollections.openjpeg.lib.libopenjp2$jnr$ffi$0.opj_read_header(Ljnr/ffi/Pointer;Ljnr/ffi/Pointer;Ljnr/ffi/byref/PointerByReference;)Z+190
j  de.digitalcollections.openjpeg.OpenJpeg.getImage(Ljnr/ffi/Pointer;Ljnr/ffi/Pointer;)Lde/digitalcollections/openjpeg/lib/structs/opj_image;+32
j  de.digitalcollections.openjpeg.OpenJpeg.getInfo(Ljnr/ffi/Pointer;)Lde/digitalcollections/openjpeg/Info;+13
j  de.digitalcollections.openjpeg.OpenJpeg.getInfo(Lde/digitalcollections/openjpeg/InStreamWrapper;)Lde/digitalcollections/openjpeg/Info;+5
j  de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.getInfo()Lde/digitalcollections/openjpeg/Info;+16
j  de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.checkIndex(I)V+2
j  de.digitalcollections.openjpeg.imageio.OpenJp2ImageReader.read(ILjavax/imageio/ImageReadParam;)Ljava/awt/image/BufferedImage;+2
j  javax.imageio.ImageIO.read(Ljavax/imageio/stream/ImageInputStream;)Ljava/awt/image/BufferedImage;+56 [email protected]
j  javax.imageio.ImageIO.read(Ljava/io/InputStream;)Ljava/awt/image/BufferedImage;+35 [email protected]
j  com.inet.jpeg2000.Jpeg2000ServerPlugin.a(Lcom/inet/plugin/ServerPluginManager;)V+366

hs_err_pid165.log

@jbaiter
Member

jbaiter commented Mar 22, 2022

Can you get your hands on a core dump and extract a backtrace for the parts beyond the FFI where the crash happens?

@Horcrux7
Contributor Author

No, there is no apport tool in the Docker container, so we do not have a core dump yet.

@gamma

gamma commented Mar 23, 2022

We have created a sample repository, also containing a sample core dump:

https://github.com/gamma/temurin-jvm-docker-crash-sample/blob/main/core-dump.tar.gz

@jbaiter
Member

jbaiter commented Mar 23, 2022

Brilliant, thank you, I'll investigate!

@jbaiter
Member

jbaiter commented Mar 23, 2022

OK, so I could not reproduce the crash on my machine (Debian unstable, so pretty close to Ubuntu 21, OpenJDK 17.0.2 2022-01-18, x86_64). I also tried with OpenJDK 11.0.14 2022-01-18, also no crash.
Can you provide more details on your two environments, or maybe even a minimal Dockerfile for either of the two in the reproduction repo?

@gamma

gamma commented Mar 23, 2022

You can use the stock eclipse-temurin:17-sdk-alpine image to reproduce the issue.

@gamma

gamma commented Mar 23, 2022

We also added an issue with the JDK here: adoptium/adoptium-support#477

Maybe there is additional valuable information for you.

@jbaiter
Member

jbaiter commented Mar 23, 2022

So I was able to get a bit further with this:

Here's the backtrace from gdb that shows the genesis:
[image: screenshot of the gdb backtrace]

Maybe this has something to do with the musl libc? Can you provide more information on your Ubuntu setup so I can try to reproduce it there?

@gamma

gamma commented Mar 23, 2022

Thanks for the info so far. We'll check that tomorrow and get back to you. The binary libopenjp was not specifically linked with musl afaik.

The original Linux one works with the adoptopenjdk 12 alpine stock image afaik. I had already considered that, and we tried a custom glibc build, which did not work. I can also look into compiling libopenjp2 against musl tomorrow (or maybe there is already one in the Alpine package repos).

@jbaiter
Member

jbaiter commented Mar 24, 2022

Bingo: I just ran it inside the Docker container mentioned above, using the libopenjp2 from the Alpine repository, which is linked against musl, and the test runs without a problem.
I think the issue is that this libopenjp2 was built against glibc and is then resolved against musl at load time. ldd reports relocation errors as well:

/app # ldd /app/openjpeg/linux/libopenjp2.so.7
        /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
        libm.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
        libpthread.so.0 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
        libc.so.6 => /lib/ld-musl-x86_64.so.1 (0x7faa6ecb4000)
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __vsnprintf_chk: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __pow_finite: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __fprintf_chk: symbol not found
Error relocating /app/openjpeg/linux/libopenjp2.so.7: __sprintf_chk: symbol not found
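
As a rough illustration, the missing symbols can be pulled out of captured ldd output like the one above with a small Java helper. This is only a sketch: the class and method names are made up for this example, and it assumes the "Error relocating <library>: <symbol>: symbol not found" line format shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of imageio-openjpeg.
final class LddRelocationErrors {
    // Assumed line format, taken from the ldd output in this thread:
    // "Error relocating <library>: <symbol>: symbol not found"
    private static final Pattern MISSING =
            Pattern.compile("^Error relocating (.+): (\\S+): symbol not found$");

    /** Returns the symbol names reported as missing, in order of appearance. */
    static List<String> missingSymbols(String lddOutput) {
        List<String> symbols = new ArrayList<>();
        for (String line : lddOutput.split("\\R")) {
            Matcher m = MISSING.matcher(line.trim());
            if (m.matches()) {
                symbols.add(m.group(2)); // group 2 is the symbol name
            }
        }
        return symbols;
    }
}
```

A non-empty result for a native library is a hint that it was built for a different libc than the one the container provides.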

---- edit:

Yep, that seems to be the case:

/app # nm -D openjpeg/linux/libopenjp2.so.7 |grep GLIBC
                 w __cxa_finalize@GLIBC_2.2.5
                 U __fprintf_chk@GLIBC_2.3.4
                 U __pow_finite@GLIBC_2.15
                 U __sprintf_chk@GLIBC_2.3.4
                 U __stack_chk_fail@GLIBC_2.4
                 U __vsnprintf_chk@GLIBC_2.3.4
                 U calloc@GLIBC_2.2.5
                 U fclose@GLIBC_2.2.5
                 U fopen@GLIBC_2.2.5
                 U fputc@GLIBC_2.2.5
                 U fread@GLIBC_2.2.5
                 U free@GLIBC_2.2.5
                 U fseeko@GLIBC_2.2.5
                 U ftello@GLIBC_2.2.5
                 U fwrite@GLIBC_2.2.5
                 U getenv@GLIBC_2.2.5
                 U getrusage@GLIBC_2.2.5
                 U malloc@GLIBC_2.2.5
                 U memcpy@GLIBC_2.14
                 U memmove@GLIBC_2.2.5
                 U memset@GLIBC_2.2.5
                 U posix_memalign@GLIBC_2.2.5
                 U pthread_attr_init@GLIBC_2.2.5
                 U pthread_attr_setdetachstate@GLIBC_2.2.5
                 U pthread_cond_destroy@GLIBC_2.3.2
                 U pthread_cond_init@GLIBC_2.3.2
                 U pthread_cond_signal@GLIBC_2.3.2
                 U pthread_cond_wait@GLIBC_2.3.2
                 U pthread_create@GLIBC_2.2.5
                 U pthread_join@GLIBC_2.2.5
                 U pthread_mutex_destroy@GLIBC_2.2.5
                 U pthread_mutex_init@GLIBC_2.2.5
                 U pthread_mutex_lock@GLIBC_2.2.5
                 U pthread_mutex_unlock@GLIBC_2.2.5
                 U realloc@GLIBC_2.2.5
                 U stdout@GLIBC_2.2.5
                 U strcpy@GLIBC_2.2.5
                 U strlen@GLIBC_2.2.5
                 U strtol@GLIBC_2.2.5
                 U sysconf@GLIBC_2.2.5

---- edit:

The plot thickens: recall the ldd errors about failing to relocate various __*printf_chk symbols? Guess what opj_event_msg calls:

    if ((fmt != 00) && (p_event_mgr != 00)) {
        va_list arg;
        char message[OPJ_MSG_SIZE];
        memset(message, 0, OPJ_MSG_SIZE);
        /* initialize the optional parameter list */
        va_start(arg, fmt);
        /* parse the format string and put the result in 'message' */
        vsnprintf(message, OPJ_MSG_SIZE, fmt, arg);  // 💣💣💣
        /* force zero termination for Windows _vsnprintf() of old MSVC */
        message[OPJ_MSG_SIZE - 1] = '\0';
        /* deinitialize the optional parameter list */
        va_end(arg);

        /* output the message to the user program */
        msg_handler(message, l_data);
    }

https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/event.c#L128-L129

@gamma

gamma commented Mar 24, 2022

Sweet. Good catch. I did not check the libopenjp2 dependencies - just some others. That effectively means that we have to use a different openjp2 lib (will check that right away) or some obscure way to have glibc present - which is possible afaik.

@jbaiter
Member

jbaiter commented Mar 24, 2022

Yes, I think the easiest way would be to rely on the distro-provided libopenjp2, or if that is not possible/desired, to ship an x86_64-unknown-linux-musl build in your JAR.
I'll close this issue since it's not a problem with the library itself.
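
For anyone who goes the route of shipping both builds, here is a minimal Java sketch of picking a variant at runtime. It assumes that a musl system can be recognised by a loader file such as `/lib/ld-musl-x86_64.so.1` (the path that appears in the ldd output earlier in this thread). The class name, method names and the `openjpeg/linux-musl/` resource path are hypothetical; only `openjpeg/linux/libopenjp2.so.7` mirrors a path seen above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of imageio-openjpeg.
final class NativeVariantChooser {
    /** True if libDir contains a musl dynamic loader such as ld-musl-x86_64.so.1. */
    static boolean hasMuslLoader(Path libDir) {
        if (!Files.isDirectory(libDir)) {
            return false;
        }
        try (DirectoryStream<Path> loaders =
                     Files.newDirectoryStream(libDir, "ld-musl-*.so.1")) {
            return loaders.iterator().hasNext();
        } catch (IOException e) {
            return false; // treat an unreadable directory as "not musl"
        }
    }

    /** Hypothetical resource layout: one directory per libc flavour. */
    static String resourcePathFor(boolean musl) {
        return musl
                ? "openjpeg/linux-musl/libopenjp2.so.7"
                : "openjpeg/linux/libopenjp2.so.7";
    }

    /** Picks the resource path for the machine this code runs on (assumes /lib). */
    static String forThisSystem() {
        return resourcePathFor(hasMuslLoader(Paths.get("/lib")));
    }
}
```

Checking for the loader file is a cheap heuristic and would be fooled by a glibc system that has musl installed alongside, so relying on the distro-provided libopenjp2 remains the simpler option where it is available.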
