Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-51307] Unable to collect GC data with NotificationEmitter in native build #8237

Open
viniciusxyz opened this issue Jan 24, 2024 · 26 comments
Assignees

Comments

@viniciusxyz
Copy link

viniciusxyz commented Jan 24, 2024

A favor I ask of the team

This is actually the second time I have opened this issue, the first was #7803 opened on November 11, 2023,
but it was closed maybe because I couldn't explain it clearly, I ask that you please be aware that this issue does not refer to compilation problems and yes, the lack of data in the NotificationEmitter at runtime to collect GC metrics, this occurs in both G1GC and SerialGC in both GraalVM CE and Oracle GraalVM

Describe the problem

I'm developing applications and I'm missing some statistics when using Micrometer, being more specific the details and duration of GC pauses, I started validating the lib code and noticed that the problem is actually that it seems that NotificationEmitter is not working launching the notification events and this only happens in the native image, these metrics are very important for us to be able to put the applications into production, so I would like help to solve this problem.

Steps to reproduce the issue

  1. git clone --depth 1 https://github.com/viniciusxyz/graalvm-issue-7803-notification-emmiter.git
  2. Build app mvn clean package -Pnative
  3. Run app ./target/main-notification-emitter

When running the native image, only the notifier addition log will be displayed, and when running with hotspot, when System.gc() is called, a log is displayed from the emission of the NotificationEmitter event

Describe GraalVM and its environment:

  • GraalVM version: GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
  • JDK major version: 21
  • OS: Windows 11
  • Architecture: AMD64

More details

Print execution with hotspot:

image

Print from execution with native compilation:

image

All compilation configuration is in the pom file of the project passed in the example

To prove that the problem does not depend on whether it is graalvm ce or oracle graalvm, the print follows with the same behavior:

image

Currently, as far as I know, there are two main ways to expose information about the garbage collector runtimes while the application is running so that we can view it continuously when we are in Kubernetes, the first is through a javaagent that exports this information to any provider such as Prometheus and another is by adding some lib that sends these metrics to one of these providers, but as far as I've seen both forms depend on notificationEmmiter for updates related to these metrics, without this improvement several applications that monitor the GC via Prometheus + micrometer for example will be left without the data for monitoring related to GC times.

Demo of information visualization in grafana

Without native compilation

image

With native compilation

image


@wirthi @kassifar I reopened the issue, if any details were not clear please let me know

@wirthi
Copy link
Member

wirthi commented Jan 24, 2024

Hi @viniciusxyz

as already mentioned in the previous ticket,

there are some open issues around the management interfaces where we still need some work.

We are aware of that, but this is currently no priority for us to fix.

@fniephaus
Copy link
Member

Maybe this is something @roberttoyonaga likes to look into?

@TheTaha-Alamine TheTaha-Alamine self-assigned this Jan 24, 2024
@wirthi wirthi changed the title Unable to collect GC data with NotificationEmitter in native build [GR-51307] Unable to collect GC data with NotificationEmitter in native build Jan 25, 2024
@roberttoyonaga
Copy link
Collaborator

Maybe this is something @roberttoyonaga likes to look into?

Hi @fniephaus. I'm happy to look into this eventually, if nobody else picks this up. However, it probably wont be on my to-do list for some time.

@fniephaus
Copy link
Member

I'm happy to look into this eventually, if nobody else picks this up

Cool, thanks!

However, it probably wont be on my to-do list for some time.

Typo? 😆 We're not going to complain if it's done asap

@roberttoyonaga
Copy link
Collaborator

Typo? 😆 We're not going to complain if it's done asap

oops 😆 I mean it will probably remain on my to-do list for a while

@viniciusxyz
Copy link
Author

Hi @viniciusxyz

as already mentioned in the previous ticket,

there are some open issues around the management interfaces where we still need some work.

We are aware of that, but this is currently no priority for us to fix.

I understand this point perfectly, I just found it strange to mark it as complete, even though it is a backlog item I believe it should be maintained because problems with GC happen quite frequently and losing traceability on this is a current negative point of native compilation, but I completely understand not be an immediate priority

@codesimplicity
Copy link

codesimplicity commented May 17, 2024

🙋 we're also interested in having that added.

In the meantime, did anybody already reach out to Micrometer?
It looks like this can be detected: the current GraalVM NotificationEmitter does not support any notification types, especially not the one Micrometer filters for. They could detect and log that as they already do if ManagementFactory.getMemoryPoolMXBeans() returns an empty list.

Update: I've now created a PR to detect and log that: micrometer-metrics/micrometer#5149

@viniciusxyz
Copy link
Author

🙋 we're also interested in having that added.

In the meantime, did anybody already reach out to Micrometer? It looks like this can be detected: the current GraalVM NotificationEmitter does not support any notification types, especially not the one Micrometer filters for. They could detect and log that as they already do if ManagementFactory.getMemoryPoolMXBeans() returns an empty list.

Very interesting ! I hadn't paid attention to this detail in Micrometer, I'm going to download the code and do some tests, but for my part I didn't know there was this other way so I didn't open an issue there

@codesimplicity
Copy link

codesimplicity commented May 17, 2024

Very interesting ! I hadn't paid attention to this detail in Micrometer, I'm going to download the code and do some tests, but for my part I didn't know there was this other way so I didn't open an issue there

Well, it will not really help with the issue that there will be no metrics. But you'd at least get a log message why.
I was specifically referring to this code in JvmGcMetrics.

@viniciusxyz
Copy link
Author

viniciusxyz commented May 17, 2024

Very interesting ! I hadn't paid attention to this detail in Micrometer, I'm going to download the code and do some tests, but for my part I didn't know there was this other way so I didn't open an issue there

Well, it will not really help with the issue that there will be no metrics. But you'd at least get a log message why. I was specifically referring to this code in JvmGcMetrics.

I understood. I thought there was some fallback, but if there really isn't, then calling the micrometer team probably won't help much since the problem is the lack of notifications in the native compilation.

It's quite frustrating to have to make monitoring worse to take advantage of the benefits of native images, but this is a necessary choice at the moment as far as I understand.

@roberttoyonaga
Copy link
Collaborator

roberttoyonaga commented Oct 2, 2024

Hi @viniciusxyz and @codesimplicity ! I've submitted a PR adding support for GC notifications (only for serial GC for now). If you have time, please let me know if this solution works for your use case. Thanks!
#9799

@viniciusxyz
Copy link
Author

@roberttoyonaga As soon as possible I will try to understand how the process of compiling graalvm works locally to test its implementation and I will report back. I really appreciate the work.

@roberttoyonaga
Copy link
Collaborator

@roberttoyonaga As soon as possible I will try to understand how the process of compiling graalvm works locally to test its implementation and I will report back. I really appreciate the work.

No problem! Here are some brief steps to get you started with building GraalVM:

You'll need mx. And the latest labsjdk release.

Put mx on the path and set java home to labsjdk:
export PATH=/path/to/mx:$PATH
export JAVA_HOME=/path/to/labsjdk
Navigate to the vm/README.md directory for instructions on how to use mx build command to build GraalVM with the substratevm component.

Once GraalVM is built, you can use mx native-image to create native image executables or just find the native-image binary in the build directory. You'll also be able to find a link to the latest build in graal/vm/latest_graalvm_home

The first few minutes of this video does a good job of explaining this as well: https://youtu.be/3Gh0cz3vjG8?feature=shared&t=202

@viniciusxyz
Copy link
Author

@roberttoyonaga Unfortunately I had some problems getting this to work on Windows, I'll try again tomorrow using Linux

@viniciusxyz
Copy link
Author

viniciusxyz commented Oct 3, 2024

@roberttoyonaga I'm having some problems getting the test done, I can compile the graal using mx --env ce build, but from what I understand from the documentation it doesn't have the native-image tool and the mx native-image apparently does not exist, I appealed and tried to run a mx build --all, but I still have the same problem, can you please guide me on how to create the native-image binary?

image

@roberttoyonaga
Copy link
Collaborator

roberttoyonaga commented Oct 3, 2024

@viniciusxyz maybe try mx --dynamicimports /substratevm build from the vm directory. And then try mx build from the substratevm directory.

@viniciusxyz
Copy link
Author

@roberttoyonaga By following the steps I managed to compile it, but the result was the same as opening the issue, I'm using the right branch as you can see below:

image

But the result of the compilation is still using scavenger, probably I must still be doing something wrong, but I don't know what...

image

The steps I did were:

  1. Adjust the mx and labsjdk envs as I downloaded them with the commands below:
export PATH=/home/vvsantos/mx:$PATH
export JAVA_HOME=/home/vvsantos/labsjdk-ce-24-jvmci-b01-debug
  1. Enter the correct graal/vm directory and compile with the command below:
mx --dynamicimports /substratevm build
  1. Go to the substratevm directory and compile with the command below
cd ../substratevm 
mx build
  1. Copy my class to this directory and compile it
javac MainNotificationEmitter.java
mx native-image MainNotificationEmitter
  1. Execute
./mainnotificationemitter

The class used for testing was this

import com.sun.management.GarbageCollectionNotificationInfo;

import javax.management.NotificationEmitter;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class MainNotificationEmitter {

    public static void main(String[] args) throws InterruptedException {
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(gcBean instanceof NotificationEmitter notificationEmitter)) {
                continue;
            }
            System.out.println("NotificationEmitter " + gcBean.getName());
            notificationEmitter.addNotificationListener((notification, handback) -> {

                var type = notification.getType();

                System.out.printf("Notification  issued -> %s\n", type);
            }, notification -> notification.getType()
                    .equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION), null);
        }
        System.out.println("Notifier added");
        System.gc();
        Thread.sleep(3_000);
        System.out.println("End");
    }
}

Can you identify if I'm doing something wrong in this flow?

@roberttoyonaga
Copy link
Collaborator

roberttoyonaga commented Oct 3, 2024

Hi @viniciusxyz you need to use the build options shown in the PR description when you execute the Native Image build.

In short, you need to use mx native-image --enable-monitoring=gcnotif MainNotificationEmitter. Everything else you are doing looks correct.

As a side note: To use any of the monitoring/serviceability features in Native Image, you need to pass --enable-monitoring at build time. For example, --enable-monitoring=jfr,nmt,heapdump,gcnotif or --enable-monitoring==all.

If you have any other questions, please don't hesitate to ask!

@viniciusxyz
Copy link
Author

@roberttoyonaga I read this yesterday and I had already forgotten about it today :) I managed to test it, I just found it strange that there was an extra notification after the main code had finished executing

image

When we build with mx, is the native-image binary available in the labsjdk bin? I ask because I would like to do a complete test by compiling an application using maven and reachability-metadata (I need it to compile some things) and uploading it to Kubernetes to validate if the micrometer metrics are working correctly with this implementation

@roberttoyonaga
Copy link
Collaborator

roberttoyonaga commented Oct 3, 2024

I just found it strange that there was an extra notification after the main code had finished executing

Oh yes, that's a small bug. When the system gets teared-down in emits another notification. I'll fix that and push an update.

When we build with mx, is the native-image binary available in the labsjdk bin?

No its not in labsjdk. You can find the latest build in the graal repository here: graal/vm/latest_graalvm_home
The latest native-image tool is here: graal/vm/latest_graalvm_home/bin/native-image

@viniciusxyz
Copy link
Author

@roberttoyonaga I started the integrated tests, basically I have an application that uses micronaut in conjunction with micrometer that uses jvm metrics to launch metrics for prometheus, initially notificationEmmiter seems to be working well since now I can see the pauses associated with the GC, strangely the metrics related to HEAP memory broke, but perhaps it has nothing to do with the change you made, another point that I found strange was the GC pauses that only increased more and more, I will start the application with the GC monitoring flags and compare the data to validate whether they are correct, I will give you a more complete feedback as soon as possible. Below is the image with the GC time metric that is super important for us and that now exists with its change :)

image

@roberttoyonaga
Copy link
Collaborator

Hi @viniciusxyz

strangely the metrics related to HEAP memory broke

What do you mean by this? Did the notifications stop containing the memory usage info? Or did the memory usages reported look strange?

I made a very simple test app here: https://gist.github.com/roberttoyonaga/44bfe8cbaa809c102fa5d9cc959f2997
Using it, it looks like the memory usages reported are normal. Also I don't see that the GC durations are increasing.

@viniciusxyz
Copy link
Author

viniciusxyz commented Oct 4, 2024

@roberttoyonaga I don't know if it was your change, but there seems to have been a change in behavior in the memory metrics related to MemoryPoolMXBean, the screenshot below shows the behavior in the jdk 21 lts that I generally use:

image

Jdk 24 with your PR:

image

Code for test:

https://gist.github.com/viniciusxyz/c0883ec5a499351225bebf9a8ccbc5ea

What I'm doing for now is using memory metrics that are provided by micrometer so I'm testing the functioning of the resources necessary for these 3 classes to work:

JvmHeapPressureMetrics
JvmGcMetrics
JvmMemoryMetrics

As for measuring GC times, everything really seems to be ok, I made a comparison using -XX:+PrintGC and there doesn't seem to be anything abnormal

@viniciusxyz
Copy link
Author

@roberttoyonaga I did a new test and it seems that this metric has disappeared and has no relation to your commit, checking out the previous commit and redoing the build I noticed the same problem, now why does this work in Java 21, but in Java 24 it is like this I don't know. As for the GC metrics themselves, everything seems to be correct.

Test with commit a8efbf7c2981

image

@roberttoyonaga
Copy link
Collaborator

Hi @viniciusxyz this is actually expected. Please see the comment here.

The max sizes are always -1 in JDK 24 becuase of this PR #6930
Essentially, the GC only begins collections after the max size has been exceeded. So it's not safe to rely on the max sizes never being exceeded. That's why the max sizes are now always -1.

@viniciusxyz
Copy link
Author

@roberttoyonaga I understand, there are some graphs we use that use this maximum to be able to demonstrate in percentage terms the amount of heap used at the moment, but this can be removed without any problems, thank you for the clarification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants