Jetty 12.0.6: NPE in error handling leading to 100% cpu usage by Jetty threads #11431

konstantin-mikheev · 2024-02-21T15:11:39Z

Jetty version(s)
12.0.6, 12.0.5

Jetty Environment
ee10

Java version/vendor (use: java -version)
OpenJDK Runtime Environment Corretto-21.0.0.35.1 (build 21+35-LTS)

OS type/version
Rocky Linux release 8.8 (Green Obsidian)

Description
We are using embedded Jetty in Spring Boot 3.2.2

After migrating to Jetty 12 we got this error leading to the infinite stream of messages, full gc and node being not able to process traffic. Happened in a few DCs from one day to one week after deploy. Seems similar in appearance to #11326

Initially we got it on 12.0.5 and after reading 12.0.6 release notes forced 12.0.6 but still got the same issue.

Stack trace from 12.0.6

/status 
java.lang.NullPointerException: Cannot invoke "org.eclipse.jetty.http.HttpField.getHeader()" because "f" is null
	at org.eclipse.jetty.http.HttpFields$Mutable.remove(HttpFields.java:1418) ~[jetty-http-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.ServletContextResponse.resetContent(ServletContextResponse.java:385) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:509) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1204) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1197) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.ServletChannelState.runInContext(ServletChannelState.java:1254) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.ServletChannelState.asyncError(ServletChannelState.java:818) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.accept(ContextHandler.java:1185) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.handler.ContextRequest.lambda$addFailureListener$1(ContextRequest.java:53) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.internal.HttpChannelState.lambda$onFailure$2(HttpChannelState.java:445) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.SerializedInvoker$Link.run(SerializedInvoker.java:191) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2.lambda$onFailure$2(HttpStreamOverHTTP2.java:590) ~[jetty-http2-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:410) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156) ~[jetty-util-12.0.6.jar:12.0.6]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

Judging by logs it was usually happening in one thread (named like qtp1047556034-2613) and looks like some while(true) kind of log. We got a few gigabytes of those messages within couple of minutes.

What we also noticed was the appearance of additional exceptions after migration not always leading to any consequences but those precursor exceptions always has happened just before this NPE

Serialized invocation error
java.lang.IllegalStateException: s=IDLE rs=COMPLETED os=ABORTED is=IDLE awp=false se=false i=false al=2
	at org.eclipse.jetty.ee10.servlet.ServletChannelState.completed(ServletChannelState.java:1052) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.Callback$3.failed(Callback.java:168) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.HttpOutput.onWriteComplete(HttpOutput.java:252) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.HttpOutput$WriteCompleteCB.failed(HttpOutput.java:1760) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.ee10.servlet.HttpOutput$CompleteWriteCompleteCB.failed(HttpOutput.java:1777) ~[jetty-ee10-servlet-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.accept(ContextHandler.java:1185) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.handler.ContextResponse$1.failed(ContextResponse.java:47) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.Callback$6.failed(Callback.java:277) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.server.internal.HttpChannelState$ChannelResponse.lambda$lockedFailWrite$0(HttpChannelState.java:1111) ~[jetty-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.SerializedInvoker$Link.run(SerializedInvoker.java:191) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.Invocable$1.run(Invocable.java:230) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2.lambda$onFailure$2(HttpStreamOverHTTP2.java:590) ~[jetty-http2-server-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201) ~[jetty-util-12.0.6.jar:12.0.6]
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156) ~[jetty-util-12.0.6.jar:12.0.6]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]

We have not got those in Jetty 11 at all and we got much less on 12.0.6 compared to 12.0.5. On 12.0.5 it was occurring almost always just after launch but on 12.0.6 you could wait couple of hour for it to arrive.

How to reproduce?
Unfortunately we have not managed to reproduce it yet. It was not happening on test environment and on DCs with good network connectivity

Usually it coincides with packet loss towards the DC where it happens and it happens on all instances within DC simultaneously. At the time usually the only requests handled by jetty were healthchecks. Basically there are a lot (hundreds) of tiny requests disturbed by packet loss.

We needed to wait from one day to about a week after deploy for the issue to happen.

The text was updated successfully, but these errors were encountered:

gregw · 2024-02-21T22:27:23Z

We believe we have a fix for #11326 that will be in the 12.0.7 release at the end of the month.

The ISE is also something that we have been addressing, but it has mostly been a harmless second call to callback.failed() after a callback has already been succeeded or failed.
The NPE is also a problem in the error handling, which also looks like something has already cleared the headers as we are trying to write out an error.

So it may be that 12.0.7 will have fixed these, but I'll have a bit more of a look at this area before the release and at the very least put in a bit better handling of errors whilst handling errors. For both of those exceptions it would be good to see the actual error that was being handled.

joakime · 2024-02-21T22:38:58Z

@gregw what about the NPE in org.eclipse.jetty.http.HttpField.getHeader() during a ServletContextResponse.resetContent call?

gregw · 2024-02-22T07:44:01Z

@joakime yeah, that NPE should not happen with normal handling of a Mutable HttpFields class. So it is plausible that this is again double handling of an error, so two threads are modifying the HttpFields at once. The null kind of suggests one thread is adding a field, whilst the other is trying to clear it.

So I do suspect the error handling that we have already cleaned up somewhat, but I can't yet say for sure this is fixed.

@konstantin-mikheev can you tell us more about the actual failure. I.e. details about what you mean with "infinite stream of messages". Plus next time it happens, can you take a full thread dump.

gregw · 2024-02-22T07:49:52Z

Ugh! The NPE is an exception thrown whilst handling an exception thrown whilst handling an exception! It is turtles all the way down!

konstantin-mikheev · 2024-02-22T15:53:33Z

"infinite stream of messages" meant we got 100k identical NPE messages in logs per minute for the next few minutes. Node had died and was restarted so no thread dump unfortunately.

During this timeframe besides those 100k per minute NPEs we got 6 IllegalStateExceptions from org.eclipse.jetty.ee10.servlet.ServletChannelState. Besides those there are only 34 other messages all of those not really interesting: couple of timeouts towards other locations and retries while trying to update some settings. Usually those logs mean packetloss.

I have looked into ploss statistics by location and found out at the last case there was no changes in ploss rate when incident started. Out of about 30 locations there was one with about 20% ploss which was already going for approximately 50 minutes. There were couple of ploss spikes to other locations within 30 minutes before NPEs happened but they were not happening at the time when NPEs started to appear.

For now I have ported project back to Jetty 11. I will probably left one instance with Jetty 12 to try to obtain thread dump but could not promise anything unfortunately.

What also could I add: I have looked into cpu usage distribution and there is an increase from 5 to 20% of the "kernel mode" cpu usage for couple of minutes right when those issues started. I will investigate more what could it be.

konstantin-mikheev added the Bug For general bugs on Jetty side label Feb 21, 2024

joakime added this to Jetty 12.0.7 - FROZEN Feb 21, 2024

gregw self-assigned this Feb 21, 2024

joakime moved this to 🏗 In progress in Jetty 12.0.7 - FROZEN Feb 21, 2024

gregw mentioned this issue Feb 22, 2024

Add suppressed failures in Callback failed #11435

Merged

joakime linked a pull request Feb 23, 2024 that will close this issue

Add suppressed failures in Callback failed #11435

Merged

sbordet closed this as completed in #11435 Feb 29, 2024

github-project-automation bot moved this from 🏗 In progress to ✅ Done in Jetty 12.0.7 - FROZEN Feb 29, 2024

konstantin-mikheev mentioned this issue Apr 8, 2024

NPE in error handling leading to 100% CPU #11631

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Jetty 12.0.6: NPE in error handling leading to 100% cpu usage by Jetty threads #11431

Jetty 12.0.6: NPE in error handling leading to 100% cpu usage by Jetty threads #11431

konstantin-mikheev commented Feb 21, 2024 •

edited

Loading

gregw commented Feb 21, 2024

joakime commented Feb 21, 2024

gregw commented Feb 22, 2024

gregw commented Feb 22, 2024

konstantin-mikheev commented Feb 22, 2024 •

edited

Loading

Jetty 12.0.6: NPE in error handling leading to 100% cpu usage by Jetty threads #11431

Jetty 12.0.6: NPE in error handling leading to 100% cpu usage by Jetty threads #11431

Comments

konstantin-mikheev commented Feb 21, 2024 • edited Loading

gregw commented Feb 21, 2024

joakime commented Feb 21, 2024

gregw commented Feb 22, 2024

gregw commented Feb 22, 2024

konstantin-mikheev commented Feb 22, 2024 • edited Loading

konstantin-mikheev commented Feb 21, 2024 •

edited

Loading

konstantin-mikheev commented Feb 22, 2024 •

edited

Loading