Jetty 12.0.6: NPE in error handling leading to 100% cpu usage by Jetty threads #11431
Comments
We believe we have a fix for #11326 that will be in the 12.0.7 release at the end of the month. The ISE is also something we have been addressing, but it has mostly been a harmless second call. So it may be that 12.0.7 will have fixed these, but I'll have a bit more of a look at this area before the release and at the very least put in better handling of errors whilst handling errors. For both of those exceptions it would be good to see the actual error that was being handled.
@gregw what about the NPE in org.eclipse.jetty.http.HttpField.getHeader() during a ServletContextResponse.resetContent call?
@joakime yeah, that NPE should not happen with normal handling of a Mutable HttpFields class. So it is plausible that this is again double handling of an error, i.e. two threads modifying the HttpFields at once. The null kind of suggests one thread is adding a field whilst the other is trying to clear it. So I do suspect the error handling that we have already cleaned up somewhat, but I can't yet say for sure this is fixed. @konstantin-mikheev, can you tell us more about the actual failure, i.e. details about what you mean by "infinite stream of messages"? Also, next time it happens, can you take a full thread dump?
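[Editor's note: not asserting this is what happened here, but as an illustration of the kind of race being described, here is a minimal sketch in which one thread keeps adding to a shared HttpFields.Mutable while another clears and reads it with no synchronization. The class name, header names, and loop counts are made up; unsynchronized access like this is undefined and can surface as NPEs or index errors rather than a clean failure.]

```java
import org.eclipse.jetty.http.HttpField;
import org.eclipse.jetty.http.HttpFields;

public class HttpFieldsRaceSketch
{
    public static void main(String[] args) throws Exception
    {
        // HttpFields.Mutable is not thread-safe; sharing one instance between
        // threads without synchronization is exactly the "two threads modifying
        // the HttpFields at once" situation described above.
        HttpFields.Mutable fields = HttpFields.build();

        Thread adder = new Thread(() ->
        {
            for (int i = 0; i < 1_000_000; i++)
                fields.add("X-Demo-" + (i % 8), "value-" + i);
        });

        Thread clearer = new Thread(() ->
        {
            for (int i = 0; i < 1_000_000; i++)
            {
                try
                {
                    // Roughly what a response reset does: wipe the headers,
                    // then read whatever is there.
                    fields.clear();
                    for (HttpField field : fields)
                        field.getHeader();
                }
                catch (RuntimeException e)
                {
                    // Unsynchronized access can fail in arbitrary ways
                    // (NPE, index out of bounds, ...). That is the point.
                    System.err.println(e);
                }
            }
        });

        adder.start();
        clearer.start();
        adder.join();
        clearer.join();
    }
}
```

With a single thread, or with the error path properly serialized, none of this can happen, which is why the double error handling is the prime suspect.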
Ugh! The NPE is an exception thrown whilst handling an exception thrown whilst handling an exception! It is turtles all the way down!
"infinite stream of messages" meant we got 100k identical NPE messages in logs per minute for the next few minutes. Node had died and was restarted so no thread dump unfortunately. During this timeframe besides those 100k per minute NPEs we got 6 IllegalStateExceptions from I have looked into ploss statistics by location and found out at the last case there was no changes in ploss rate when incident started. Out of about 30 locations there was one with about 20% ploss which was already going for approximately 50 minutes. There were couple of ploss spikes to other locations within 30 minutes before NPEs happened but they were not happening at the time when NPEs started to appear. For now I have ported project back to Jetty 11. I will probably left one instance with Jetty 12 to try to obtain thread dump but could not promise anything unfortunately. What also could I add: I have looked into cpu usage distribution and there is an increase from 5 to 20% of the "kernel mode" cpu usage for couple of minutes right when those issues started. I will investigate more what could it be. |
Jetty version(s)
12.0.6, 12.0.5
Jetty Environment
ee10
Java version/vendor
(use: java -version)
OpenJDK Runtime Environment Corretto-21.0.0.35.1 (build 21+35-LTS)
OS type/version
Rocky Linux release 8.8 (Green Obsidian)
Description
We are using embedded Jetty in Spring Boot 3.2.2.
After migrating to Jetty 12 we got this error, leading to an infinite stream of log messages, full GCs, and the node being unable to process traffic. It happened in a few DCs, from one day to one week after deploy. It seems similar in appearance to #11326.
Initially we got it on 12.0.5; after reading the 12.0.6 release notes we forced 12.0.6, but still got the same issue.
Stack trace from 12.0.6
Judging by the logs it was usually happening in one thread (named like qtp1047556034-2613) and looks like some while(true) kind of logging loop. We got a few gigabytes of those messages within a couple of minutes. What we also noticed was the appearance of additional exceptions after the migration, not always leading to any consequences, but those precursor exceptions have always happened just before this NPE.
We did not get those in Jetty 11 at all, and we got far fewer on 12.0.6 compared to 12.0.5. On 12.0.5 it was occurring almost always just after launch, but on 12.0.6 you could wait a couple of hours for it to arrive.
How to reproduce?
Unfortunately, we have not managed to reproduce it yet. It was not happening on the test environment or in DCs with good network connectivity.
Usually it coincides with packet loss towards the DC where it happens, and it happens on all instances within that DC simultaneously. At the time, usually the only requests handled by Jetty were healthchecks; basically there are a lot (hundreds) of tiny requests disrupted by packet loss.
We needed to wait from one day to about a week after deploy for the issue to happen.