ImageMetricsITCase.verifyImageMetrics failures in main #38919

Closed
zakkak opened this issue Feb 21, 2024 · 15 comments · Fixed by #39026
Comments

@zakkak
Contributor

zakkak commented Feb 21, 2024

Describe the bug

jpa-postgresql-withxml fails in the ImageMetricsITCase.verifyImageMetrics test

See https://github.com/quarkusio/quarkus/actions/runs/7985178432/job/21803563937#step:16:776

The regression was introduced in #38820

Note: as the changes are in vertx-http, this probably affects other tests as well, but the impact does not appear to be big enough to trigger a test failure for them.

Expected behavior

The test should pass.

Actual behavior

The test fails with:

org.opentest4j.AssertionFailedError: Expected analysis_results.fields.reflection to be within range [163 +- 3%] but was 168 ==> expected: <true> but was: <false>
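For context, the check compares values from the native build's stats JSON against a recorded baseline with a percentage tolerance. A minimal sketch of that kind of assertion (the names below are illustrative, not the actual ImageMetricsITCase code):

import java.util.List;

class ToleranceCheckSketch {
    // Hypothetical helper mirroring the message format above, not the real test code.
    static void assertWithinTolerance(String key, long expected, double tolerancePct, long actual) {
        double delta = expected * tolerancePct / 100.0;
        if (actual < expected - delta || actual > expected + delta) {
            throw new AssertionError("Expected " + key + " to be within range ["
                    + expected + " +- " + (int) tolerancePct + "%] but was " + actual);
        }
    }

    public static void main(String[] args) {
        // 163 * 1.03 ≈ 167.9, so an actual value of 168 is out of range and the check fails.
        assertWithinTolerance("analysis_results.fields.reflection", 163, 3, 168);
    }
}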

How to Reproduce?

./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml -Dnative.surefire.skip -Dformat.skip -Dno-descriptor-tests clean verify -Dtest-containers -Dstart-containers

Output of uname -a or ver

No response

Output of java -version

No response

Mandrel or GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (i.e. output of mvnw --version or gradlew --version)

No response

Additional information

After #38820 I see the following differences:

diff --git a/old.json b/new.json
index c60df879106..b1c92fa16f2 100644
--- a/old.json
+++ b/new.json
@@ -2,30 +2,30 @@
   "resource_usage": {
     "memory": {
       "system_total": 66506358784,
-      "peak_rss_bytes": 6114426880
+      "peak_rss_bytes": 8380002304
     },
     "garbage_collection": {
-      "count": 119,
-      "total_secs": 3.413,
-      "max_heap": 8176271360
+      "count": 66,
+      "total_secs": 2.625,
+      "max_heap": 28445245440
     },
     "cpu": {
-      "load": 12.15073613200237,
+      "load": 12.75164113785558,
       "parallelism": 32,
       "total_cores": 32
     },
-    "total_secs": 43.448367164000004
+    "total_secs": 43.512096230000004
   },
   "image_details": {
     "code_area": {
-      "bytes": 44577920,
-      "compilation_units": 66680
+      "bytes": 44601456,
+      "compilation_units": 66734
     },
-    "total_bytes": 93915968,
+    "total_bytes": 93952832,
     "image_heap": {
-      "bytes": 48922624,
+      "bytes": 48934912,
       "objects": {
-        "count": 485976
+        "count": 486273
       },
       "resources": {
         "bytes": 1616088,
@@ -47,29 +47,29 @@
   },
   "analysis_results": {
     "types": {
-      "total": 23615,
+      "total": 23629,
       "reflection": 6697,
       "jni": 61,
-      "reachable": 20816
+      "reachable": 20830
     },
     "methods": {
       "foreign_downcalls": -1,
-      "total": 188214,
-      "reflection": 4899,
+      "total": 188269,
+      "reflection": 4900,
       "jni": 55,
-      "reachable": 102637
+      "reachable": 102708
     },
     "classes": {
-      "total": 23615,
+      "total": 23629,
       "reflection": 6697,
       "jni": 61,
-      "reachable": 20816
+      "reachable": 20830
     },
     "fields": {
-      "total": 49625,
-      "reflection": 166,
+      "total": 49643,
+      "reflection": 168,
       "jni": 61,
-      "reachable": 30008
+      "reachable": 30032
     }
   }
 }

@cescoffier do the above diffs look reasonable given your changes?


quarkus-bot commented Feb 21, 2024

/cc @Karm (mandrel), @ebullient (metrics), @galderz (mandrel), @jmartisk (metrics)

@gsmet
Member

gsmet commented Feb 22, 2024

This doesn't look pretty:

-      "peak_rss_bytes": 6114426880
+      "peak_rss_bytes": 8380002304

but are we sure they are solely due to Clément's PR?

@zakkak
Contributor Author

zakkak commented Feb 22, 2024

but are we sure they are solely due to Clément's PR?

That was the only difference between the two runs (I tested with 31fcfc1 and fd46a61), so I would say yes. Note, however, that peak RSS fluctuates between builds regardless of the PR. I am running on a machine with 62GB of RAM and no constraint on the max heap size, so the GC is free to use as much RAM as it wants according to its heuristics, which are not deterministic.

For instance, running with 31fcfc1 three times I get:

  1. 6837665792
  2. 7617609728
  3. 12387741696

(Notice that the last run has almost double the peak RSS of the first run without any changes in the code.)
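
For reference (a suggestion, not something tried in this thread): capping the heap available to the native-image build, e.g. via the quarkus.native.native-image-xmx property, should make peak RSS more comparable across runs by taking the GC heuristics out of the equation. The 8g value below is only an example:

./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml clean verify -Dquarkus.native.native-image-xmx=8g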

@gsmet
Member

gsmet commented Feb 26, 2024

@zakkak could we either raise the threshold or disable the test for now? It's very annoying to have all our builds failing because of this.

@zakkak
Contributor Author

zakkak commented Feb 26, 2024

I guess the lesser evil would be to adjust the thresholds and keep this issue open until we verify the new thresholds are OK. This way we won't risk missing another increase due to other changes.

Note that the best option would have been not to merge #38820 without handling this in the first place, since the test was already failing in the PR.

@cescoffier
Member

The PR in question does not seem related. I will have another look, but basically it is just tests.

My guess is that the true cause is the hot reload, which must keep a few objects in memory, including the content of the certificates. Unfortunately, there is no way around that.

That being said, this only happens if the certificate hot reload is enabled, which is not the case in this test. But there are a few more static methods and fields that could explain it (even if they should be very small when hot reload is not enabled).

@cescoffier
Member

cescoffier commented Feb 27, 2024

OK, so it's a build-time RSS increase, not runtime. That makes more sense.

How can I produce the reports from above?

The PR introduces an early usage of the Mutiny API and Vert.x scheduled tasks (which then uses Netty underneath). That may explain it.
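
For illustration only, the kind of early Mutiny + Vert.x usage being referred to might look like the sketch below; this is not the actual code from #38820 and the names are made up:

import io.smallrye.mutiny.Uni;
import io.vertx.core.Vertx;

class EarlyUsageSketch {
    // A periodic Vert.x task (backed by Netty's event loop) wrapping its work in a Uni.
    void schedule(Vertx vertx) {
        vertx.setPeriodic(30_000, id ->
                Uni.createFrom().item("reload")
                        .subscribe().with(v -> { /* e.g. refresh certificates */ }));
    }
}

Pulling these APIs in can be enough to make additional types, methods, and fields reachable for the analysis, which is what the numbers above measure.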

@cescoffier
Member

I'm confused - but I may not understand what that test is actually doing.

I get:
ImageMetricsITCase.verifyImageMetrics:15 Expected analysis_results.types.jni to be within range [63 +- 1%] but was 61 ==> expected: <true> but was: <false>

Seems related to JNI? I'm under the previous value (isn't that a good thing?)

@cescoffier
Member

So... with a more recent version of GraalVM, I got something closer to what I expected.

@cescoffier
Member

So, I'm unable to reproduce it locally with:

native-image 21 2023-09-19
GraalVM Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
Substrate VM GraalVM CE 21+35.1 (build 21+35, serial gc)

I can try to remove the usage of Mutiny, but it's somewhat weird, as I would need to use CompletionStage/CompletableFuture, which are WAY more error-prone.

@zakkak
Contributor Author

zakkak commented Feb 27, 2024

How can I produce the reports from above?

They are generated when building the native image, in a file called quarkus-integration-test-jpa-postgresql-withxml-999-SNAPSHOT-runner-build-output-stats.json

I used:

./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml clean package -Dnative.surefire.skip -Dquarkus.native.container-build=true -Dquarkus.native.container-runtime=podman

So... with a more recent version of GraalVM, I got something closer to what I expected.

Please use the jdk-21 Mandrel builder image, as the thresholds are version specific.
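
In case it helps, the builder image can be pinned explicitly for the container build, for example (the exact image tag below is an assumption for illustration):

./mvnw -Dnative -pl integration-tests/jpa-postgresql-withxml clean package -Dnative.surefire.skip -Dquarkus.native.container-build=true -Dquarkus.native.container-runtime=podman -Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel-builder-image:jdk-21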

@cescoffier
Member

OK, I now get the same exception as you do. So, it's not a problem with GraalVM CE, but only Mandrel (or the in-container build using Mandrel). That will not be easy to understand.

The removal of Mutiny does not help. So, it's something else.

@cescoffier
Member

@zakkak OK, got the test to pass:

I removed: Mutiny usage (sad), a record (useless anyway), and Java stream usage.
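
For anyone curious, the general shape of such a change (purely illustrative, not the actual diff in #39026) is replacing stream pipelines with plain loops, for example:

import java.util.ArrayList;
import java.util.List;

class StreamRemovalSketch {
    // Stream-based version (the kind of code that was removed):
    static List<String> upperWithStreams(List<String> names) {
        return names.stream().map(String::toUpperCase).toList();
    }

    // Plain-loop replacement (the kind of code that stays):
    static List<String> upperWithLoop(List<String> names) {
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add(name.toUpperCase());
        }
        return result;
    }
}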

@cescoffier
Member

@zakkak See #39026

@zakkak
Contributor Author

zakkak commented Feb 27, 2024

So, it's not a problem with GraalVM CE, but only Mandrel (or the in-container build using Mandrel). That will not be easy to understand.

That's because Mandrel uses a different JDK as a base (OpenJDK vs. LabsJDK), so there are some more differences in the JSON. The thresholds are meant to test Quarkus' default builder image (i.e. Mandrel at the moment), however.

@zakkak See #39026

quarkus-bot added this to the 3.9 - main milestone Feb 27, 2024