Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClassNotFoundException can be thrown for classes in nested jars when under GC pressure #31853

Closed
ayashkov opened this issue Jul 24, 2022 · 13 comments
Assignees
Labels
type: regression A regression from a previous release
Milestone

Comments

@ayashkov
Copy link

ayashkov commented Jul 24, 2022

Overview

A very simple Spring Boot based application locks up on the very first HTTP request with java.lang.NoClassDefFoundError present in the logs. Killing the application also causes java.lang.NoClassDefFoundError in the logs.

The matrix below outlines when the locking happens by the version and the execution environment.

Spring Boot Linux Windows OpenShift
2.5.14 Works Works Works
2.6.8 Works Works Works
2.6.9+ Works Works Locks
2.7.0 Works Works Works
2.7.1+ Works Works Locks

Normal execution

Here is how a normal execution looks.

> oc run image-test --rm -it --image=registry/angular-starter:1.0.0-SNAPSHOT --image-pull-policy="Always" --command -- /bin/sh
If you don't see a command prompt, try pressing enter.
sh-4.4$ java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar &
[1] 7
  .   ____          _            __ _ _
/\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot ::                (v2.6.8)
 
2022-07-24 17:11:13.850  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Starting WebApp using Java 11.0.15 on image-test with PID 7 (/app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar started by ? in /home/app)
2022-07-24 17:11:13.853  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : No active profile set, falling back to 1 default profile: "default"
2022-07-24 17:11:19.854  INFO 7 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port 8080
2022-07-24 17:11:19.863  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Started WebApp in 7.61 seconds (JVM running for 9.215)
 
sh-4.4$ curl -v http://localhost:8080/api/status
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /api/status HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: application/hal+json
< Content-Length: 86
< 
* Connection #0 to host localhost left intact
{"_links":{"self":{"href":"/api/status"}},"apiVersion":"1.0.0-SNAPSHOT","status":"UP"}
sh-4.4$ kill %1
[1]+  Done(143)               java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar

Stuck execution

Here is how the stuck request looks. I had to press ^C to abort the curl run.

> oc run image-test --rm -it --image=registry/angular-starter:1.0.0-SNAPSHOT --image-pull-policy="Always" --command -- /bin/sh
If you don't see a command prompt, try pressing enter.
sh-4.4$ java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar &
[1] 7
  .   ____          _            __ _ _
/\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot ::                (v2.6.9)
 
2022-07-24 16:53:02.884  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Starting WebApp using Java 11.0.15 on image-test with PID 7 (/app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar started by ? in /home/app)
2022-07-24 16:53:02.887  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : No active profile set, falling back to 1 default profile: "default"
2022-07-24 16:53:09.586  INFO 7 --- [           main] o.s.b.web.embedded.netty.NettyWebServer  : Netty started on port 8080
2022-07-24 16:53:09.679  INFO 7 --- [           main] com.baml.starter.angular.web.WebApp      : Started WebApp in 8.394 seconds (JVM running for 9.951)
 
sh-4.4$ curl -v http://localhost:8080/api/status
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /api/status HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.61.1
> Accept: */*
> 
Exception in thread "reactor-http-epoll-2" java.lang.NoClassDefFoundError: io/netty/buffer/PoolArena$1
        at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:248)
        at io.netty.buffer.PoolThreadCache$MemoryRegionCache.freeEntry(PoolThreadCache.java:432)
        at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:396)
        at io.netty.buffer.PoolThreadCache$MemoryRegionCache.free(PoolThreadCache.java:388)
        at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:254)
        at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:245)
        at io.netty.buffer.PoolThreadCache.free(PoolThreadCache.java:218)
        at io.netty.buffer.PooledByteBufAllocator$PoolThreadLocalCache.onRemoval(PooledByteBufAllocator.java:542)
        at io.netty.buffer.PooledByteBufAllocator$PoolThreadLocalCache.onRemoval(PooledByteBufAllocator.java:503)
        at io.netty.util.concurrent.FastThreadLocal.remove(FastThreadLocal.java:257)
        at io.netty.util.concurrent.FastThreadLocal.removeAll(FastThreadLocal.java:67)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:1050)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: io.netty.buffer.PoolArena$1
        at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
        at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:151)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
        ... 15 more
^C
sh-4.4$ kill %1
Exception in thread "SpringApplicationShutdownHook" java.lang.NoClassDefFoundError: ch/qos/logback/classic/spi/ThrowableProxy
        at ch.qos.logback.classic.spi.LoggingEvent.<init>(LoggingEvent.java:119)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:419)
        at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:383)
        at ch.qos.logback.classic.Logger.log(Logger.java:765)
        at org.apache.commons.logging.LogAdapter$Slf4jLocationAwareLog.warn(LogAdapter.java:447)
        at org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:1070)
        at org.springframework.boot.web.reactive.context.ReactiveWebServerApplicationContext.doClose(ReactiveWebServerApplicationContext.java:147)
        at org.springframework.context.support.AbstractApplicationContext.close(AbstractApplicationContext.java:1021)
        at org.springframework.boot.SpringApplicationShutdownHook.closeAndWait(SpringApplicationShutdownHook.java:145)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at org.springframework.boot.SpringApplicationShutdownHook.run(SpringApplicationShutdownHook.java:114)
        at java.base/java.lang.Thread.run(Thread.java:829)
 
[1]+  Done(143)               java -jar /app/lib/starter-angular-app-1.0.0-SNAPSHOT-exec.jar

This particular application uses WebFlux to serve the request. I tried to convert it to WebMVC. It also gets stuck with java.lang.NoClassDefFoundError albeit in a different class.

I did a quick diff between 2.6.8 and 2.6.9. I see some changes in org.springframework.boot.loader.jar package that may be relevant here. I'm referring to this commit. For what it's worth.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jul 24, 2022
@wilkinsona
Copy link
Member

wilkinsona commented Jul 25, 2022

Thanks for the report. Unfortunately, I think we'll need to be able to reproduce the problem, ideally without running on Openshift, to diagnose it. Can you please provide a complete yet minimal sample that does so? Unless this is a problem that's specific to OpenShift, it should be possible to reproduce the problem by running the same image locally.

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Jul 25, 2022
@ayashkov
Copy link
Author

I'll see what I can do. This may take a few days due to the fact that in my workplace the source code can only move in one direction: in.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 25, 2022
@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 25, 2022
@MrWong99
Copy link

Had the same issue after upgrading from 2.6.x to 2.7.2. After I downgraded to 2.7.0 it works fine again.
I am however not running OpenShift but instead only on Docker (20.10.17).

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 27, 2022
@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 27, 2022
@snicoll
Copy link
Member

snicoll commented Jul 27, 2022

@MrWong99 if you can share a sample that we can run that would be much appreciated.

@MrWong99
Copy link

MrWong99 commented Jul 27, 2022

I tried to extract the crucial parts without leaking any company code. It is however not tested in that format.
Our app has a REST interface, includes a static website and has a small websocket endpoint.

pom.xml

...
    <properties>
        <springboot.version>2.7.2</springboot.version>
    </properties>

    <dependencyManagement>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-dependencies</artifactId>
                <version>${springboot.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-vault-config</artifactId>
            <version>3.1.1</version>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-websocket</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <version>${springboot.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>repackage</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <mainClass>my.main.Application</mainClass>
                    <layout>ZIP</layout>
                </configuration>
            </plugin>
        </plugins>
    </build>

Dockerfile

# Note: we don't use this image but a similar one
FROM adoptopenjdk:11.0.11_9-jdk-hotspot

ADD my-distribution-containing-app.tar.gz .
EXPOSE 8080

CMD [ "java", "-Xmx500m", "-Dloader.path=//custom/plugins/", "-jar", "app.jar", "--spring.config.import=application.yml" ]

The application.yml has nothing special but some settings for spring.cloud.vault and active profiles.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 27, 2022
@bclozel
Copy link
Member

bclozel commented Jul 27, 2022

@MrWong99 We're going to need an actual sample for this. I don't see anything that could cause that behavior in the code snippets you've shared.

Starting from a sample application created from start.spring.io and adding relevant bits until you reproduce the problem should give us the information we need. Thanks!

@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 27, 2022
@ayashkov
Copy link
Author

I'm doing essentially this: https://github.com/ayashkov/spring-boot-31853. Several things though:

  1. I could not make this specific code fail. Still trying to modify it to trigger the failure.
  2. The whole thing seems to be extremely sensitive to the composition of the final JAR. E.g. when I try to make certain changes to the code that fails, e.g. consolidating or renaming modules, removing HAL processing, it stops failing.

I will keep working on it for a few days more, hopefully with better outcome.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 27, 2022
@bclozel bclozel added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 27, 2022
@ayashkov
Copy link
Author

Here is an additional piece of data that may be relevant here. I was able to stop the breakage of the original code by removing <layout>ZIP</layout> in the following spring-boot-maven-plugin plugin configuration:

<configuration>
    <mainClass>com.baml.starter.angular.web.WebApp</mainClass>
    <layout>ZIP</layout>
    <classifier>exec</classifier>
</configuration>

In other words, the following works just fine:

<configuration>
    <mainClass>com.baml.starter.angular.web.WebApp</mainClass>
    <classifier>exec</classifier>
</configuration>

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 27, 2022
@wilkinsona wilkinsona added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jul 27, 2022
@ayashkov
Copy link
Author

The issue is reproducible in Docker. Please see https://github.com/ayashkov/spring-boot-31853.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jul 27, 2022
@snicoll snicoll changed the title Application locks up with java.lang.NoClassDefFoundError in container environment (OpenShift 4.x) Application locks up with java.lang.NoClassDefFoundError with PropertiesLauncher Jul 29, 2022
@snicoll snicoll added type: regression A regression from a previous release and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Jul 29, 2022
@snicoll snicoll added this to the 2.6.11 milestone Jul 29, 2022
@snicoll snicoll modified the milestones: 2.6.11, 2.6.x Aug 10, 2022
@philwebb philwebb self-assigned this Aug 15, 2022
@philwebb
Copy link
Member

Digging into this a bit today and the fix for #29356 has caused the regression. The NoClassDefFoundError is actually caused by JarFile.ensureOpen() throwing an IllegalStateException.

It looks like java.util.zip.ZipFile$CleanableResource$FinalizableResource is actually closing the Jar which makes things quite hard to debug. I'm wondering if calling the close() on all the nestedJars is actually correct. It seems likely to me that an outer JarFile might be GC'd before the nested one is finished with.

@philwebb philwebb changed the title Application locks up with java.lang.NoClassDefFoundError with PropertiesLauncher ClassNotFoundException can be thrown for classes in nested jars when under GC pressure Aug 16, 2022
@philwebb
Copy link
Member

@ayashkov Thanks so much for putting the sample application together, it was really helpful!

@philwebb
Copy link
Member

I've pushed something that I hope will fix this but I'm not totally confident. If it doesn't work we may well need to revert the fix for #29356.

We're planning a release on Thursday so for anyone watching this issue, once CI build 1110 finishes, please try the latest SNAPSHOT to see if you still have the same problem.

@philwebb philwebb modified the milestones: 2.6.x, 2.6.11 Aug 16, 2022
@philwebb philwebb reopened this Aug 17, 2022
@philwebb
Copy link
Member

Unfortunately we're now facing file locking issues on Windows again. I'm going to revert the fix for #29356

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: regression A regression from a previous release
Projects
None yet
Development

No branches or pull requests

7 participants