
Bug Report: Application deployment and AS startup takes more RAM (ever growing) and too long (more than twice the time it took on previous releases) #6554

Closed
elcodedocle opened this issue Feb 5, 2024 · 13 comments · Fixed by #6677 or #6713
Labels
Status: Open

Comments


elcodedocle commented Feb 5, 2024

Brief Summary

Since Payara release 6.2023.9 we have observed that application deployment takes at least twice as long as it did in 6.2023.8 and earlier versions. This includes the latest 6.2024.1 release.

We have tried to trace this performance regression, but there seems to be no obvious cause:

  • We see no extra errors or warnings in the logs before or while deploying the application EAR.
  • Increasing the available resources does not help (it reduces the application deployment time, but it is still nowhere near that of previous releases).
  • We could not identify a specific commit added after 2023.8 as the cause of this issue.
  • A simple reproducer does not reproduce the issue.

We need a mitigation or fix to prevent the upgrade to the latest Payara release from severely impacting our instance deployment times.

We want to know:

  • Is anyone else experiencing this issue?
  • Is there a documented way to optimise deployment times that we may have missed, which would help us get 2023.8-level deployment times for the affected application on the latest Payara release?

Expected Outcome

Application deployment time remains approximately the same after upgrading from Payara 6.2023.8 to the latest release.

Current Outcome

Application deployment time more than doubles after upgrading from Payara 6.2023.8 to the latest release. The logs give no indication of the cause, and we do not know which component is responsible for the delay.

Reproducer

I wrote a reproducer deploying pretty much every Jakarta component we use (JAX-RS, JPA, @Startup EJBs) packed into a single EAR, but it only helped to rule out those components, at least at a basic level: https://github.com/elcodedocle/payara-sscce/tree/latest

We'd appreciate any pointers on how to extend it so that it reproduces the issue (perhaps running a profiler during the startup of the affected application would be more helpful, for example?).
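As an illustration of the kind of startup checkpoint that could help narrow this down, here is a minimal sketch (the class name and log message are made up, not part of the linked reproducer): a @Startup singleton that logs the JVM uptime when its @PostConstruct runs, so the time spent getting through EAR initialisation can be compared across Payara versions.

```java
import java.lang.management.ManagementFactory;
import java.util.logging.Logger;

import jakarta.annotation.PostConstruct;
import jakarta.ejb.Singleton;
import jakarta.ejb.Startup;

// Hypothetical startup checkpoint: logs how long the JVM has been running
// when this @Startup bean is initialised during EAR deployment.
@Singleton
@Startup
public class DeploymentCheckpoint {

    private static final Logger LOG = Logger.getLogger(DeploymentCheckpoint.class.getName());

    @PostConstruct
    void logCheckpoint() {
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        LOG.info(() -> "Startup checkpoint reached after " + uptimeMs + " ms of JVM uptime");
    }
}
```

Comparing the logged uptime between 6.2023.8 and later releases would at least show whether the extra time is spent before or during EAR initialisation.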

Operating System

Debian bookworm

JDK Version

Debian openjdk 17 JRE (17.0.10+7-1deb12u1) / application built with target 11 on Debian openJDK 11.0.22+7-1deb11u1

Payara Distribution

Payara Server Full Profile

@elcodedocle added the "Status: Open" and "Type: Bug" labels on Feb 5, 2024

felixif commented Feb 13, 2024

Hello @elcodedocle,

Thank you very much for submitting a bug report! Unfortunately, we are unable to assist in this particular case, for two reasons. First, I am unable to run the provided reproducer: running the mvn clean verify command fails with the Maven error "Error assembling EJB: META-INF/ejb-jar.xml is required for ejbVersion 2.x.". Second, performance issues are usually covered under the Payara Enterprise subscription.

We are still open to working on this bug report if you can provide a clear reproducer that showcases the performance decay between Payara Community 6.2023.8 and the latest Community release.

Thank you,
Felix Ifrim

@felixif added the "Status: Pending" label and removed the "Status: Open" label on Feb 13, 2024

elcodedocle commented Feb 13, 2024

Hello @felixif

First of all, thank you for having a look at this.

Regarding the error you get from the reproducer, I expect it is related to the Maven version. I tested it with Maven 3.9.5, which uses maven-ejb-plugin 3.2.1 by default; that plugin version should by default build an EJB 3.0 module rather than the 2.x your execution seems to be attempting. To make this requirement explicit, I have added the EJB version 3.0 specification to the plugin configuration:
[two screenshots attached]
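For context, the ejbVersion 2.x packaging mode is what requires a META-INF/ejb-jar.xml descriptor; from EJB 3.0 onwards the metadata can live entirely in annotations, so the module builds without one. A minimal annotation-only bean (illustrative, not taken from the reproducer):

```java
import jakarta.ejb.Stateless;

// Illustrative EJB 3.x bean: all metadata comes from the annotation,
// so no META-INF/ejb-jar.xml deployment descriptor is needed.
@Stateless
public class GreetingService {

    public String greet(String name) {
        return "Hello, " + name + "!";
    }
}
```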
As for the reproducer not showcasing the issue, unfortunately it is all I have right now :-( My team is going to profile the affected app to isolate the root cause of the delay; hopefully I'll be able to use those results to update the reproducer soon and report back here.

As for performance issues only being covered under the Payara Enterprise scope, that is reasonable and we were expecting it. Still, we are grateful for any feedback from you, the organisation, and any Payara community members who may encounter this issue in their deployments and would like to contribute to its resolution. Thanks again :-)

The github-actions bot added the "Status: Abandoned" label and removed the "Type: Bug" and "Status: Pending" labels on Feb 19, 2024
github-actions bot commented:

Greetings,
It's been more than 5 days since we requested more information or an update from you on the details of this issue. Could you provide an update soon, please?
We're afraid that if we do not receive an update, we'll have to close this issue due to inactivity.

github-actions bot commented:

Greetings,
It's been more than 5 days since this issue was identified as abandoned.
We have closed this issue due to inactivity, please feel free to re-open it if you have more information to share.

The github-actions bot removed the "Status: Abandoned" label on Feb 25, 2024
elcodedocle (Author) commented:

It's been a while, but we got around to running some profiling, and our conclusion is that something related to the admin modules must be taking that loading time... and memory too!

This is the AS memory usage on startup on 6.2023.10:
[screenshot: AS memory usage on startup, 6.2023.10]

This is larger and ever growing on 6.2023.11 and above:

[screenshot: AS memory usage on startup, 6.2023.11 and above]

This also affects heap memory usage on EAR deployment.

Heap usage during EAR deployment on 6.2023.10:
[screenshot: heap usage during EAR deployment, 6.2023.10]

Noticeably larger heap usage during much slower EAR deployment (same EAR) on 6.2023.11 and later (showing latest 2024.3):
[screenshot: heap usage during EAR deployment, 6.2024.3]

Also notice this ever-growing pattern of memory allocation by some admin module, present only in 2023.11 and above. It eventually crashes our test instances in the cloud, no matter how much memory we give them, by leaving no free memory for some scheduled background processes. This allocated memory does not show up in the heap or metaspace; it must be leaked by some native code:
[screenshot (2024-03-16): growing memory allocation graph]

@elcodedocle changed the title from "Bug Report: Application deployment takes too long (more than twice the time it took on previous releases)" to "Bug Report: Application deployment and AS startup takes more RAM and too long (more than twice the time it took on previous releases)" on Mar 21, 2024
@elcodedocle changed the title to "Bug Report: Application deployment and AS startup takes more RAM (ever growing) and too long (more than twice the time it took on previous releases)" on Mar 21, 2024

fabide commented Apr 3, 2024

We are also facing this issue in 6.2024.1
@elcodedocle did you find any solutions/workarounds?

elcodedocle (Author) commented:

> We are also facing this issue in 6.2024.1 @elcodedocle did you find any solutions/workarounds?

Sorry @fabide, no progress yet on our side. But we haven't forgotten about it: we are implementing better startup checkpoints in our app; hopefully they will give some insight into this matter. I will keep this ticket updated with our findings, if any.


elcodedocle commented Apr 10, 2024

Actually, one finding I almost forgot: whatever is causing this slowdown and memory hogging in our app deployment is also present in the latest Payara Enterprise version we got for evaluation.


lprimak commented Apr 30, 2024

@felixif Please reopen this. The leak is present in the current Payara 6 master tree.


elcodedocle commented May 2, 2024

My team was able to verify this issue is caused by this change: 34153c8


RInverid commented May 7, 2024

I can confirm that reverting commit 34153c8 on both the 6.2023.11 release and the latest 6.2024.4 release resolves this issue.

elcodedocle pushed a commit to elcodedocle/payara-sscce that referenced this issue May 13, 2024
elcodedocle pushed a commit to elcodedocle/payara-sscce that referenced this issue May 14, 2024
@fturizo added the "Status: Open" label on May 28, 2024
@fturizo reopened this on May 28, 2024
RInverid commented:

Copied from #6731, relevant here:

I did do a git bisect, which resulted in finding the offending commit: 34153c8. I further narrowed it down to this change in Jar entry loading:

- JarEntry entry = zip.getJarEntry(entryName);
+ JarFile jar = new JarFile(res.file, false, JarFile.OPEN_READ,
+                           Runtime.Version.parse(System.getProperty("java.version")));
+ JarEntry entry = jar.getJarEntry(entryName);

Adding this change to the 2023.10 release reproduces my issue perfectly.
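To illustrate why this change could matter (a simplified sketch, not the actual Payara code): the old path reuses an already-open JarFile (zip in the diff), while the new path constructs a fresh, runtime-versioned JarFile for every lookup. Each JarFile instance holds a native zip handle until it is closed or eventually reclaimed, so per-lookup instances that are not closed promptly would add both work and native memory, which would match the growing non-heap usage reported above.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Simplified illustration of the two lookup patterns; not Payara code.
class JarLookupPatterns {

    // Old pattern: reuse an already-open JarFile for every entry lookup.
    static JarEntry lookupWithExistingJar(JarFile openJar, String entryName) {
        return openJar.getJarEntry(entryName);
    }

    // New pattern (as in the quoted change): build a fresh, runtime-versioned
    // JarFile per lookup. Each instance re-reads the jar's central directory
    // and keeps a native handle alive until close() is called or the object
    // is reclaimed, so unclosed per-lookup instances accumulate native memory.
    static JarEntry lookupWithFreshJar(File jarOnDisk, String entryName) throws IOException {
        JarFile jar = new JarFile(jarOnDisk, false, JarFile.OPEN_READ,
                Runtime.Version.parse(System.getProperty("java.version")));
        return jar.getJarEntry(entryName); // never closed in this sketch: the leak-prone shape
    }
}
```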

I experimented a lot with this change, including using the new way of loading but with the verify flag set to true and the version hardcoded to 8:

JarFile jar = new JarFile(res.file, true, JarFile.OPEN_READ, Runtime.Version.parse("8"));
JarEntry entry = jar.getJarEntry(entryName);

Before this change, for some libraries such as BouncyCastle, which are multi-release and signed, the resulting classData byte[] would differ between the old and new ways of loading. After this change, the resulting classData would be exactly the same, but the issue would still be present.
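For anyone who wants to repeat that byte comparison, a minimal standalone sketch (the jar path and entry name arguments are placeholders, not from the actual investigation):

```java
import java.io.File;
import java.io.InputStream;
import java.util.Arrays;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Compares the bytes returned for one entry of a (multi-release, signed) jar
// when read through a plain JarFile versus a runtime-versioned, unverified one.
public class ClassDataComparison {

    public static void main(String[] args) throws Exception {
        File jarOnDisk = new File(args[0]); // placeholder: path to the jar under test
        String entryName = args[1];         // placeholder: a .class entry inside it

        byte[] plain;
        try (JarFile jar = new JarFile(jarOnDisk)) { // verifying, not multi-release aware
            plain = readEntry(jar, entryName);
        }

        byte[] versioned;
        try (JarFile jar = new JarFile(jarOnDisk, false, JarFile.OPEN_READ,
                Runtime.Version.parse(System.getProperty("java.version")))) {
            versioned = readEntry(jar, entryName);
        }

        System.out.println("identical classData: " + Arrays.equals(plain, versioned));
    }

    private static byte[] readEntry(JarFile jar, String entryName) throws Exception {
        JarEntry entry = jar.getJarEntry(entryName);
        try (InputStream in = jar.getInputStream(entry)) {
            return in.readAllBytes();
        }
    }
}
```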

I could find no other differences in the code, so I looked at the differences in a JFR recording of the application. The main difference was a significant increase in the number of com.sun.enterprise.loader.ASURLClassLoader.loadClassData0(ASURLClassLoader$URLEntry, String) samples (29 in 2023.10, 1026 in 2023.10 with the offending Jar loading change) on the thread admin-thread-pool::admin-listener.

The back trace of this call (in two screenshots, due to diverging paths):
[screenshot: back trace, part 1]

[screenshot: back trace, part 2]

I am not able to investigate further, as continuing requires intimate knowledge of Payara internals. I would appreciate some assistance.


felixif commented May 29, 2024

Hello @RInverid @lprimak @elcodedocle and @fabide,

First of all, apologies for not replying earlier; this issue slipped through the cracks and was closed due to inactivity. Yesterday I discussed it with my team. Currently there are two internal issues (FISH-8688 and FISH-8672) addressing problems that might cause the performance drop in recent Payara builds. They focus on optimising MR-jar loading and removing classloader leaks, both community contributions by @lprimak that are currently being reviewed by our Engineering team. They still need to be tested and backported, so I am unable to say precisely when the fixes will be released.

This issue will remain open, and thank you for your reports and help in getting this bug fixed.

Best regards,
Felix
