Previously passing configuration times out on GH actions #304
GraalVM CE builds with the same configuration seem to be working fine (see https://github.com/graalvm/mandrel/runs/3948735963?check_suite_focus=true), so this issue shouldn't be related to changes in the GH runner. Compiling Quarkus' integration test
Relates to: #288
As the above numbers indicate, Mandrel appears to be performing slightly more allocations than GraalVM CE, resulting in higher heap usage and thus stressing the GC even more. What's even more interesting is that GraalVM CE built from source without libgraal [1] appears to perform better than both the GraalVM CE release and Mandrel.
[1]
FWIW: Building Mandrel with labsjdk (to eliminate the chance of this being an artifact of using different base JDKs) and compiling with
It could be a symptom of metaspace issues. @zakkak Could you do a run with this patch added to show RSS? This would give us some clue about non-heap memory in addition to heap.
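The patch itself is not shown in the thread. As a rough illustration only (not the referenced patch — the class and method names below are made up), resident set size can be sampled from `/proc/self/status` on Linux and logged next to the heap numbers:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, NOT the patch referenced above: samples resident set
// size (VmRSS) from /proc/self/status so it can be printed next to heap usage.
public class RssProbe {
    // Parses a "VmRSS:   123456 kB" style line into kilobytes, or -1 if absent.
    static long parseVmRssKb(List<String> statusLines) {
        for (String line : statusLines) {
            if (line.startsWith("VmRSS:")) {
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        long kb = parseVmRssKb(Files.readAllLines(Path.of("/proc/self/status")));
        System.out.println("VmRSS: " + kb + " kB");
    }
}
```

This captures the same idea as the suggestion above: RSS covers heap plus metaspace, code cache, and native allocations, so comparing it against reported heap usage isolates non-heap growth.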
It turns out there is indeed at least one difference causing more heap allocations to happen depending on the underlying JDK. One of them is due to the implementation of
Experimenting further I tried building GraalVM CE without libgraal and without jlinking [1], the results are the following:
Judging by the results, it looks like not jlinking has a big impact on the number of allocations. Looking into why... [1]
This doesn't explain why we are seeing this now, though. Does it?
No, that's true. Since you got me back on track I performed some more tests and eventually re-ran the exact same run that was failing (which led me to open this issue) and now it passed (see https://github.com/graalvm/mandrel/runs/3968009738?check_suite_focus=true). Additionally, I found out that the same test fails (on GH actions) with GraalVM CE 21.3 as well, so it's not a Mandrel thing after all... (see https://github.com/quarkusio/quarkus/runs/3966293297?check_suite_focus=true)
Unfortunately, further testing indicates that this might be a transient error: I have seen the same Mandrel version (a3ad15c) build the native image within ~11m, ~17m, and ~25m. I'll try to run more experiments on an idling laptop to see if they are more stable.
The lack of stability appears to be the same on my idling laptop, making this really hard to pin down.
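To quantify this kind of instability, repeated build wall-clock times can be summarized with a mean and standard deviation; a large relative deviation (like the ~11m/~17m/~25m spread above) flags the benchmark as unstable. A minimal sketch (the class and method names are mine, not from the thread):

```java
import java.util.List;

// Toy statistics helper for comparing repeated native-image build times.
public class BuildTimeStats {
    static double mean(List<Double> xs) {
        return xs.stream().mapToDouble(Double::doubleValue).average().orElse(0);
    }

    // Population standard deviation of the observed build times.
    static double stddev(List<Double> xs) {
        double m = mean(xs);
        double var = xs.stream().mapToDouble(x -> (x - m) * (x - m)).average().orElse(0);
        return Math.sqrt(var);
    }

    public static void main(String[] args) {
        List<Double> minutes = List.of(11.0, 17.0, 25.0); // the runs reported above
        double m = mean(minutes), sd = stddev(minutes);
        System.out.printf("mean=%.1fm stddev=%.1fm (%.0f%% of mean)%n", m, sd, 100 * sd / m);
    }
}
```

For the three runs above this gives a deviation of roughly a third of the mean, far too noisy to attribute a regression to any single change.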
Passing
Some open questions:
@zakkak I might be bringing owls to Athens with this "hacker news" style pro tip... but have you disabled turbo boost to get more consistent perf? I guess fiddling with /sys/devices/system/cpu/intel_pstate/no_turbo on my workstation would do. Not sure about your system...
That's probably worth a try, although the fact that the big variance is consistent on both my laptop and GH actions reduces my expectations (and is the main reason I didn't bother trying it yet). Note also that even in 21.2 we see excessive GC (which would result in throttling as well), but the variance only appears in 21.3... PS: That's definitely not a "hacker news" style pro tip. I assure you there are plenty of academic papers (not to mention blog posts) out there that report X% improvement without factoring in DVFS. Thanks for bringing it up :)
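For completeness, the sysfs knob mentioned above can be inspected programmatically. This is a hedged sketch (class name is mine): on Intel CPUs with the intel_pstate driver, the file contains "1" when turbo boost is disabled; on other systems the file simply does not exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Reads the intel_pstate sysfs knob mentioned above; "1" means turbo boost
// is disabled. Falls back gracefully where the file does not exist
// (non-Intel CPUs, non-Linux systems, containers).
public class TurboCheck {
    // Pure helper so the interpretation of the knob is easy to test.
    static String interpret(String rawKnobValue) {
        return "1".equals(rawKnobValue.trim()) ? "turbo disabled" : "turbo enabled";
    }

    public static void main(String[] args) {
        Path knob = Path.of("/sys/devices/system/cpu/intel_pstate/no_turbo");
        try {
            System.out.println(interpret(Files.readString(knob)));
        } catch (IOException e) {
            System.out.println("unknown (no intel_pstate sysfs entry)");
        }
    }
}
```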
Disable single parsing of compiler graphs until the impact of it on heap usage decreases, see oracle/graal#3435 and graalvm/mandrel#304 (comment). quarkusio#19511 didn't take into account runs with constrained memory. Bringing back `-H:-ParseOnce` reduces the heap usage significantly, from 3.9G down to 2.7G in integration-tests/main.
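The heap effect described above follows from what the flag does: with ParseOnce the compiler parses each method's graph once and keeps it cached across phases, trading heap for CPU; `-H:-ParseOnce` re-parses instead and retains nothing. A toy illustration of that cache-vs-recompute tradeoff (not GraalVM code; all names here are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration (NOT GraalVM code) of the ParseOnce tradeoff: caching
// parsed graphs avoids re-parsing but keeps every graph live on the heap.
public class ParseCache {
    static int parseCalls = 0;

    // Stand-in for bytecode parsing producing an IR "graph".
    static String parse(String method) {
        parseCalls++;
        return "graph(" + method + ")";
    }

    public static void main(String[] args) {
        String[] methods = {"a", "b", "a", "b", "a"}; // analysis + compile passes

        // ParseOnce-like behavior: graphs cached, parsed once, all retained.
        Map<String, String> cache = new HashMap<>();
        parseCalls = 0;
        for (String m : methods) cache.computeIfAbsent(m, ParseCache::parse);
        System.out.println("cached: " + parseCalls + " parses, " + cache.size() + " graphs retained");

        // -H:-ParseOnce-like behavior: re-parse every time, nothing retained.
        parseCalls = 0;
        for (String m : methods) parse(m);
        System.out.println("re-parse: " + parseCalls + " parses, 0 graphs retained");
    }
}
```

The retained graphs are exactly the extra live data that pushed the constrained-memory CI runs over the edge.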
Looking at the GC logs, I observed that we spend a lot of time in Full GCs, which means that the JVM fails to find free space in the young generation and has to perform expensive Full GCs to reclaim memory, e.g.:
Notice how we spend ~30 seconds to reclaim ~250M. As we can see, in all 3 GCs we are unable to reclaim memory from the young generation (it has a lot of survivors), so a Full GC is triggered (due to GC ergonomics) in an effort to reclaim memory from the old generation and promote the survivors. At this stage it's clear that we are operating at the limits of the heap capacity and that we are spending unnecessary CPU cycles trying to optimize heap usage due to GC ergonomics. Initially I tried to find a way to disable the GC ergonomics (or at least tune the heuristics) but was not able to do so. Thanks to @shipilev, who pointed me to
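The reclaimed-vs-pause-time ratio discussed above can be pulled out of unified GC logs mechanically. A sketch (the sample line is synthetic, merely shaped like `-Xlog:gc` output with the magnitudes described above; class name is mine):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts reclaimed memory and pause time from a unified-logging GC line.
public class GcLineParser {
    static final Pattern GC = Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\)\\s+([0-9.]+)ms");

    // Returns {reclaimedMb, pauseMs}, or null if the line is not a GC pause line.
    static double[] parse(String line) {
        Matcher m = GC.matcher(line);
        if (!m.find()) return null;
        double reclaimed = Double.parseDouble(m.group(1)) - Double.parseDouble(m.group(2));
        return new double[] {reclaimed, Double.parseDouble(m.group(4))};
    }

    public static void main(String[] args) {
        // Synthetic example in the spirit of the pauses described above.
        String line = "[305.1s][info][gc] GC(42) Pause Full (Ergonomics) 3900M->3650M(4096M) 29734.5ms";
        double[] r = parse(line);
        System.out.printf("reclaimed %.0fM in %.1fs%n", r[0], r[1] / 1000);
    }
}
```

A Full GC that burns ~30s of CPU for ~250M of space, repeated, is the "operating at the limits of heap capacity" signature the comment describes.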
Notice how they are still worse than:
but the overhead is far less significant. Using
Notice how the live data set constantly increases up to ~4.9G and how we still spend a lot of time due to GC ergonomics trying to reclaim memory, but this time at a far lower frequency than before. FTR the different compilation phases have different memory footprints as follows: with
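One way to watch the live data set from inside a JVM process, without parsing GC logs, is the standard `MemoryMXBean`: heap usage sampled right after a forced GC approximates the live data. A sketch (class and method names are mine, and `System.gc`-style collection is only best-effort):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Approximates the live data set: heap usage right after a forced GC is
// roughly the memory that cannot be reclaimed (live objects).
public class LiveSetProbe {
    static long liveSetBytes() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        mem.gc(); // best-effort full GC
        return mem.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.printf("approx live set: %.1f MB%n", liveSetBytes() / 1e6);
    }
}
```

Sampling this per compilation phase is one way to attribute the ~4.9G peak to specific phases, as the footprint breakdown above attempts.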
I am closing this issue as my current understanding is that we were already stressing the JVM and ce05101 resulted in a small increase in memory usage that ended up consistently causing the CI runs to fail. For the time being we have reverted Quarkus to using `-H:-ParseOnce`. I plan to further investigate why
Disable single parsing of compiler graphs until the impact of it on heap usage decreases, see oracle/graal#3435 and graalvm/mandrel#304 (comment). quarkusio#19511 didn't take into account runs with constrained memory. Bringing back `-H:-ParseOnce` reduces the heap usage significantly, from 3.9G down to 2.7G in integration-tests/main. (cherry picked from commit 201b9a6)
Description
Mandrel 21.3-dev builds appear to have been failing for the last few days.
The last passing run was https://github.com/graalvm/mandrel/runs/3901801751?check_suite_focus=true with the following configuration:
Re-running the exact same configuration on GH runners results in timeouts or failures (see https://github.com/graalvm/mandrel/runs/3947878999?check_suite_focus=true).
How To Reproduce
Steps to reproduce the behavior: