jdk-client produces high CPU usage in Jenkins agent on IBM Mainframe (z/OS) #801

Open
Crosswind opened this issue Aug 1, 2022 · 5 comments

@Crosswind

Hi Jan et al.,

as suggested on the mailing list I am opening an issue here. I'm sure there will be more information required. Just let me know whenever you need something.

Description of the problem:
We are running Jenkins agents that use the Tyrus WebSocket implementation. I am not a developer of Jenkins but just a user. The agents run on the Mainframe operating system z/OS with a J9 JVM.

java version "1.8.0_321"
Java(TM) SE Runtime Environment (build 8.0.7.6 - pmz6480sr7fp6-20220330_01(SR7 FP6))
IBM J9 VM (build 2.9, JRE 1.8.0 z/OS s390x-64-Bit Compressed References 20220303_23850 (JIT enabled, AOT enabled)
OpenJ9   - 6890476
OMR      - 1e70123
IBM      - 1b0232b)
JCL - 20220120_01 based on Oracle jdk8u321-b07

The agent uses significantly more CPU than it does without WebSockets. The thread consuming the most CPU shows that jdk-client is in use. We assume this is a problem either in the IBM JVM implementation or in the Tyrus library. We have a case open with IBM to analyze the issue. However, they would like to talk to someone from Tyrus, so I am going to point them to this issue as well.
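
For anyone who wants to confirm which thread is busy without external tooling, here is a minimal sketch that ranks the threads of a JVM by CPU time consumed over a short window, using only the standard `java.lang.management` API. `ThreadCpuSampler` and `Usage` are hypothetical names, not part of Tyrus or Jenkins, and the code has to run inside the JVM being examined. It also assumes the JVM supports per-thread CPU timing, which I have not verified for J9 on z/OS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Tyrus or Jenkins. */
final class ThreadCpuSampler {

    /** CPU time one thread consumed during the sampling window. */
    static final class Usage {
        final String threadName;
        final long cpuNanos;

        Usage(String threadName, long cpuNanos) {
            this.threadName = threadName;
            this.cpuNanos = cpuNanos;
        }
    }

    /** Returns up to {@code limit} threads, busiest first, measured over {@code windowMillis}. */
    static List<Usage> sample(long windowMillis, int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Usage> result = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return result; // this JVM cannot report per-thread CPU time
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }

        // First snapshot: CPU time per thread id at the start of the window.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }

        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, use a shorter window
        }

        // Second snapshot: record the difference for every thread that is still alive.
        for (long id : mx.getAllThreadIds()) {
            long after = mx.getThreadCpuTime(id);
            ThreadInfo info = mx.getThreadInfo(id);
            if (after < 0 || info == null) {
                continue; // thread ended in the meantime
            }
            Long start = before.get(id);
            long base = (start == null || start < 0) ? 0 : start; // thread started inside the window
            result.add(new Usage(info.getThreadName(), after - base));
        }

        result.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return new ArrayList<>(result.subList(0, Math.min(limit, result.size())));
    }

    public static void main(String[] args) {
        for (Usage u : sample(5000, 5)) {
            System.out.printf("%-40s %8.1f ms%n", u.threadName, u.cpuNanos / 1e6);
        }
    }
}
```

Sampling a difference over a window, rather than reading the total CPU time once, keeps threads that were busy only during startup from dominating the list.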

J9 vs. OpenJDK:
OpenJDK doesn't exist on z/OS, so we have to use J9. IBM is the only vendor that provides a JVM for z/OS. We are not seeing high CPU consumption when running Jenkins with a non-J9 JVM on x86. If that were a problem, it would probably have drawn a lot more attention. I haven't been able to test whether J9 on x86 has the same problem as it does on z/OS.

jdk-client vs. grizzly-client:
I have put some time into trying to build the Jenkins agent with the grizzly-client instead of the jdk-client. I have to admit that I am not too familiar with the code of the agent. I got it to build, but it then failed to find some Jakarta/WebSocket classes at runtime. I'll put in some more time to see if I can get around that. Since I don't know much about the Jenkins source code, I have no idea whether the grizzly-client will just work or whether the agent code would actually have to be changed.

Simple reproducer:
Not so easy. If I had a simple Java client/server pair in which the client process runs continuously and waits for requests, we could see whether that produces the same results. It would take me some time to code something like this. By any chance, is there a sample client/server that I could build and try to run?

Tyrus version:
Jenkins bundles version 1.18 and very recently upgraded to 2.1.0. I tried running both agent versions but could not see any improvement in CPU consumption.

Cheers - David

@jansupol
Contributor

jansupol commented Aug 2, 2022

The Tyrus project contains a couple of sample WebSocket applications that are using the Grizzly Client.

I think we would need to test J9 on a PC and compare its consumption with that of another JVM. Ideally this would be done outside of Jenkins; from the description, it is not quite clear what Jenkins uses the WebSocket functionality for. I assume Jenkins sets some thread-related options for the client. It won't be possible to use the Grizzly client for Jenkins.

@Crosswind
Author

I'll check out the samples and see what I can come up with. From my point of view it has to be a long-running task that communicates back and forth for a period of time.

I will work on the J9 test. Potentially, I can shed some light on this: Jenkins bases its server/agent communication on the jenkinsci/remoting project. Normally, the two JVMs would communicate over TCP and exchange the information they have. Since this is very firewall-unfriendly, it was decided at some point to add support for WebSockets, which bundles the TCP traffic and sends it over a WebSocket connection. This way it can be handled by proxies, go over TCP/443 and so on. I am not sure yet what my test result is, but it seems that it is actually possible to include the Grizzly client in the Jenkins agent.
Question for my understanding: If we see an improvement with the Grizzly client, does that imply that there is something wrong with the JDK client? Or does it mean that the Grizzly client should be used instead? I don't know how Jenkins handles such changes, but I believe we cannot just open a PR against Jenkins and include the Grizzly client.

@jansupol
Contributor

jansupol commented Aug 4, 2022

Yes, the JDK client is the one that should be used. The idea of using the Grizzly Client was a workaround for your project, but it is Jenkins that uses the client, so the Grizzly Client does not help in this case.

The JDK client can be selected explicitly as follows. Instead of the generic WebSocket API

      WebSocketContainer clientContainer = ContainerProvider.getWebSocketContainer();
      session = clientContainer.connectToServer(/* endpoint, URI */);

the Tyrus code can be used:

     ClientManager client = ClientManager.createClient(JdkClientContainer.class.getName());
     session = client.connectToServer(/* endpoint, URI */);

@Crosswind
Author

I was finally able to make some progress. From IBM's website I downloaded the version of J9 that matches the one we run on z/OS. I ran the Jenkins agent on my MacBook and on a Windows computer. Both were using the WebSocket connection, but I couldn't see any increased CPU usage. The process was using around 0.4% of CPU (I only monitored this visually, but it was definitely not using a full core).

Regarding the grizzly-client: It took me some time but I was able to replace the jdk-client with the grizzly-client in jenkinsci/remoting/pom.xml and successfully build it:

<dependency>
    <groupId>org.glassfish.tyrus</groupId>
    <artifactId>tyrus-container-grizzly-client</artifactId>
    <version>1.18</version>
    <optional>true</optional>
</dependency>

The interesting part was: When I ran this agent on z/OS, it reduced CPU usage considerably. Usage basically went back to normal (comparable to running plain TCP without WebSockets). What does that tell us? Since you mentioned that jdk-client is the preferred client and I doubt Jenkins will switch to grizzly-client at all, I don't know if I should be too excited about this. Do you need any further information to understand why grizzly-client brings an improvement?

The IBM case made a bit of progress, and we now know that the operating system (z/OS) is not at fault. The loop is being driven from code generated by the JIT compiler, which makes it a bit more difficult to analyze. I have no idea yet how they will narrow down the culprit. My feeling is that this has to be inside the JVM and not coming from further up (Java code), but we don't have that certainty yet.

@jansupol
Contributor

@Crosswind The jdk-client is likely the preferred client for Jenkins. The jdk-client is used in the WebLogic Server and currently in the latest Helidon, whereas the Grizzly Client is used in the Glassfish/Payara Server. I would not say either of the clients is preferred from the Tyrus perspective.

I am happy to hear the Grizzly Client helped in your case. Still, I am not sure I can do anything from the Tyrus side. Were you able to make any progress reproducing the issue on a PC / any environment accessible to us?
