
Previously passing configuration times out on GH actions #304

Closed · zakkak opened this issue Oct 20, 2021 · 13 comments

Labels: affects/21.3, bug, testing



zakkak commented Oct 20, 2021

Description

Mandrel 21.3-dev appears to have been failing for the last few days.

The last passing run was https://github.com/graalvm/mandrel/runs/3901801751?check_suite_focus=true with the following configuration:

| component         | version                                  |
| ----------------- | ---------------------------------------- |
| mandrel           | 2d827b9                                  |
| mandrel-packaging | 9bbf83a0a813c778762279faa0460661810ec554 |
| quarkus           | a2c81efd6607d270dc7eaa9b7c38acb849c85030 |
| jdk               | 11.0.13-ea+7                             |
| mx                | 5.309.2                                  |

Re-running the exact same configuration on GH runners results in timeouts or failures (see https://github.com/graalvm/mandrel/runs/3947878999?check_suite_focus=true)

How To Reproduce

Steps to reproduce the behavior:

gh workflow run base.yml \
  -f quarkus-version=a2c81efd6607d270dc7eaa9b7c38acb849c85030 \
  -f name="rerun last working config for 21.3" \
  -f jdk=ea \
  -f version=2d827b9970ac5e45dc1bb3e997aea4e3333d6939 \
  -f mandrel-packaging-version=9bbf83a0a813c778762279faa0460661810ec554  
zakkak added the bug label Oct 20, 2021
zakkak self-assigned this Oct 20, 2021
zakkak added the testing label Oct 20, 2021

zakkak commented Oct 20, 2021

GraalVM CE builds with the same configuration seem to be working fine (see https://github.com/graalvm/mandrel/runs/3948735963?check_suite_focus=true), so this issue shouldn't be related to changes in the GH runners.

Compiling Quarkus' integration test main (with -Dquarkus.native.native-image-xmx=5g) also results in the following:

|                        | GraalVM CE 21.3 | Mandrel 21.3-dev |
| ---------------------- | --------------- | ---------------- |
| Comp time              | ~16.5m          | ~24.5m           |
| # of GCs               | 275             | 301              |
| # of Full GCs          | 125             | 153              |
| Time spent in Full GCs | ~11.5m          | ~19.5m           |
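For reference, a rough sketch of how such a run can be launched with GC logging enabled for the image-builder JVM (the Quarkus properties and the -J prefix are real; the module path, Maven invocation, and log file name are assumptions):

# Hypothetical invocation from a Quarkus checkout; -J forwards the option to
# the JVM that runs the native-image builder.
cd quarkus/integration-tests/main
mvn package -Dnative \
  -Dquarkus.native.native-image-xmx=5g \
  -Dquarkus.native.additional-build-args="-J-Xlog:gc*:file=native-image-gc.log"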

Relates to: #288


zakkak commented Oct 20, 2021

As the above numbers indicate, building main results in both Mandrel and GraalVM CE allocating enough data for heap usage to get close to the heap capacity, and thus in an excessive number of GCs.

Removing -Dquarkus.native.native-image-xmx=5g, the performance is much better and the number of Full GCs drops to only 11 (for both Mandrel and GraalVM CE).

That said, it appears that Mandrel is performing slightly more allocations than GraalVM CE, resulting in higher heap usage and thus stressing the GC even more.

What's even more interesting is that GraalVM CE built from source without libgraal [1] appears to perform better than both GraalVM CE and Mandrel.

|                        | GraalVM CE 21.3 | Mandrel 21.3-dev | GraalVM CE nolibgraal |
| ---------------------- | --------------- | ---------------- | --------------------- |
| Comp time              | ~16.5m          | ~24.5m           | ~15m                  |
| # of GCs               | 275             | 301              | 245                   |
| # of Full GCs          | 125             | 153              | 82                    |
| Time spent in Full GCs | ~11.5m          | ~19.5m           | ~9.8m                 |

[1]

mx --primary-suite-path substratevm --components='Native Image' --native-images=native-image build


zakkak commented Oct 20, 2021

FWIW: Building mandrel with labsjdk (to eliminate the chance of this being an artifact of using different base JDKs) and compiling with -Dquarkus.native.native-image-xmx=5g results in:

Fatal error:java.lang.OutOfMemoryError: GC overhead limit exceeded


jerboaa commented Oct 21, 2021

> FWIW: Building mandrel with labsjdk (to eliminate the chance of this being an artifact of using different base JDKs) and compiling with -Dquarkus.native.native-image-xmx=5g results in:
>
> Fatal error:java.lang.OutOfMemoryError: GC overhead limit exceeded

It could be a symptom of metaspace issues.

@zakkak Could you do a run with this patch added to show RSS? That would give us some clue about non-heap memory in addition to the heap.
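As a rough external alternative (an assumption on my part, not the patch referenced above), the builder's resident set size can also be sampled from /proc while the build runs:

# Sample the image builder's RSS once per second; the pgrep pattern is a guess
# and may need adjusting to match the builder process.
PID=$(pgrep -f native-image | head -n1)
while kill -0 "$PID" 2>/dev/null; do
  grep VmRSS "/proc/$PID/status"
  sleep 1
done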


zakkak commented Oct 21, 2021

> (to eliminate the chance of this being an artifact of using different base JDKs)

It turns out there is indeed at least one difference causing more heap allocations to happen depending on the underlying JDK.

One of them is due to the implementation of HotSpotResolvedJavaFieldImpl.getAnnotation, which in upstream OpenJDK 11 results in the field being copied, while in labsjdk no heap allocations appear to happen (probably due to caching).

HotSpotResolvedJavaFieldImpl.getAnnotation is invoked from AnalysisConstantReflectionProvider.filterInjectedAccessor

Experimenting further, I tried building GraalVM CE without libgraal and without jlinking [1]; the results are the following:

|                        | GraalVM CE 21.3 | Mandrel 21.3-dev | GraalVM CE nolibgraal | GraalVM CE nolibgraal nojlinking |
| ---------------------- | --------------- | ---------------- | --------------------- | -------------------------------- |
| Comp time              | ~16.5m          | ~24.5m           | ~15m                  | ~18.5m                           |
| # of GCs               | 275             | 301              | 245                   | 285                              |
| # of Full GCs          | 125             | 153              | 82                    | 113                              |
| Time spent in Full GCs | ~11.5m          | ~19.5m           | ~9.8m                 | ~13.2m                           |

Judging by the results, it looks like not jlinking has a big impact on the number of allocations. Looking into why... From my understanding so far, this is due to AbstractNativeImageClassLoaderSupport.handleClassFileName being invoked more times and resulting in more allocations.

[1]

mx --primary-suite-path substratevm --no-jlinking --components='Native Image' build


jerboaa commented Oct 21, 2021

> Experimenting further, I tried building GraalVM CE without libgraal and without jlinking [1]; the results are the following:
> [...]
>
> Judging by the results, it looks like not jlinking has a big impact on the number of allocations.

This doesn't explain why we are seeing this now, though. Does it?


zakkak commented Oct 22, 2021

> This doesn't explain why we are seeing this now, though. Does it?

No, that's true.

Since you got me back on track, I performed some more tests and eventually re-ran the exact same run that was failing (which led me to open this issue), and now it passed (see https://github.com/graalvm/mandrel/runs/3968009738?check_suite_focus=true).

Additionally, I found out that the same test fails (on GH actions) with GraalVM CE 21.3 as well, so it's not a Mandrel thing after all... (see https://github.com/quarkusio/quarkus/runs/3966293297?check_suite_focus=true)

graal/master (i.e. latest graal) seems to work fine for both Mandrel and GraalVM CE builds (see https://github.com/graalvm/mandrel/runs/3962138022?check_suite_focus=true and https://github.com/graalvm/mandrel/runs/3959205365?check_suite_focus=true)

Unfortunately, further testing indicates that this might be a transient error; I have seen the same Mandrel version (a3ad15c) build the native image in ~11m, ~17m, and ~25m.

I'll try to run more experiments on an idling laptop to see if they are more stable.


zakkak commented Oct 26, 2021

> I'll try to run more experiments on an idling laptop to see if they are more stable.

The lack of stability appears to be the same on my idling laptop, making this really hard to pin down.

logs/mandrel-21.2.0.2-Final-main-1:[INFO] Total time:  18:14 min
logs/mandrel-21.2.0.2-Final-main-2:[INFO] Total time:  17:48 min
logs/mandrel-21.2.0.2-Final-main-3:[INFO] Total time:  18:31 min
logs/mandrel-21.2.0.2-Final-main-4:[INFO] Total time:  17:58 min
logs/mandrel-21.2.0.2-Final-main-5:[INFO] Total time:  18:10 min

logs/mandrel-21.3.0.0-Final-main-1.txt:[INFO] Total time:  18:22 min
logs/mandrel-21.3.0.0-Final-main-2.txt:[INFO] Total time:  17:39 min
logs/mandrel-21.3.0.0-Final-main-3.txt:[INFO] Total time:  29:56 min
logs/mandrel-21.3.0.0-Final-main-4.txt:[INFO] Total time:  19:35 min
logs/mandrel-21.3.0.0-Final-main-5.txt:[INFO] Total time:  33:06 min
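(For reference, the per-run times above can be pulled out of the build logs roughly like this; the directory layout is an assumption:)

# Print the Maven "Total time" line of each run, prefixed with the log file name.
grep -H 'Total time' logs/mandrel-*-main-*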

Passing -H:-ParseOnce seems to:

  1. Improve compilation time by ~3x (in contrast to what I wrongfully observed when testing without constraining the memory usage on a9a2f40)
  2. Increase stability
  3. Decrease the maximum live data set (as observed in the GC logs) from 3.9G to 2.7G. The live data increases in the "universe" phase.

(A sketch of how the flag can be passed through Quarkus follows the timings below.)

logs/mandrel-21.2.0.2-Final-main-noParseOnce-1.txt:[INFO] Total time:  09:59 min
logs/mandrel-21.2.0.2-Final-main-noParseOnce-2.txt:[INFO] Total time:  04:45 min
logs/mandrel-21.2.0.2-Final-main-noParseOnce-3.txt:[INFO] Total time:  04:45 min
logs/mandrel-21.2.0.2-Final-main-noParseOnce-4.txt:[INFO] Total time:  04:48 min
logs/mandrel-21.2.0.2-Final-main-noParseOnce-5.txt:[INFO] Total time:  04:45 min

logs/mandrel-21.3.0.0-Final-main-noParseOnce-1.txt:[INFO] Total time:  05:54 min
logs/mandrel-21.3.0.0-Final-main-noParseOnce-2.txt:[INFO] Total time:  05:42 min
logs/mandrel-21.3.0.0-Final-main-noParseOnce-3.txt:[INFO] Total time:  05:36 min
logs/mandrel-21.3.0.0-Final-main-noParseOnce-4.txt:[INFO] Total time:  05:38 min
logs/mandrel-21.3.0.0-Final-main-noParseOnce-5.txt:[INFO] Total time:  05:28 min
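As mentioned above, a minimal sketch of passing the flag through Quarkus (the properties are real Quarkus/native-image options; the Maven invocation is an assumption):

# Hypothetical invocation; -H:-ParseOnce is forwarded verbatim to native-image.
mvn package -Dnative \
  -Dquarkus.native.native-image-xmx=5g \
  -Dquarkus.native.additional-build-args="-H:-ParseOnce"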

Some open questions:

  1. How did this not pop up earlier in our CI? Quarkus main has not been using -H:-ParseOnce since 0be0406

  2. What causes the large variance in compilation times?

  3. Why does this only affect integration-tests/main to that degree?


Karm commented Oct 26, 2021

@zakkak I might be bringing owls to Athens with this "hacker news" style pro tip... but have you disabled turbo boost to get more consistent perf? I guess fiddling with /sys/devices/system/cpu/intel_pstate/no_turbo on my workstation would do. Not sure about your system...
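For reference, toggling it on a system that uses the intel_pstate driver looks roughly like this (requires root; the sysfs path is the one mentioned above and differs with other drivers):

# Writing 1 disables turbo boost, writing 0 re-enables it.
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo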


zakkak commented Oct 26, 2021

> @zakkak I might be bringing owls to Athens with this "hacker news" style pro tip... but have you disabled turbo boost to get more consistent perf? I guess fiddling with /sys/devices/system/cpu/intel_pstate/no_turbo on my workstation would do. Not sure about your system...

That's probably worth a try, although the fact that the big variance is consistent on both my laptop and GH actions reduces my expectations (and is the main reason I didn't bother trying it yet). Note also that even in 21.2 we see excessive GC (which would result in throttling as well), but the variance only appears in 21.3...

PS: That's definitely not a "hacker news" style pro tip. I assure you there are plenty of academic papers (not to mention blog posts) out there that report X% of improvement without factoring in DVFS. Thanks for bringing it up :)

zakkak added a commit to zakkak/quarkus that referenced this issue Oct 26, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)
zakkak added a commit to zakkak/quarkus that referenced this issue Oct 26, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)

quarkusio#19511 didn't take into account
runs with constraint memory.  Bringing back `-H:-ParseOnce` reduces the
heap usage significantly, from 3.9G down to 2.7G in
integration-tests/main.
zakkak added a commit to zakkak/quarkus that referenced this issue Oct 26, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)

quarkusio#19511 didn't take into account
runs with constraint memory.  Bringing back `-H:-ParseOnce` reduces the
heap usage significantly, from 3.9G down to 2.7G in
integration-tests/main.

zakkak commented Oct 27, 2021

Looking at the GC logs, I observed that we spend a lot of time in Full GCs, which means that the JVM fails to find free space in the young generation and has to perform expensive Full GCs to reclaim memory, e.g.:

--
[949.815s][info][gc,heap        ] GC(167) PSYoungGen: 840192K->797743K(1293824K)
[949.815s][info][gc,heap        ] GC(167) ParOldGen: 3495027K->3495275K(3495424K)
[949.815s][info][gc,metaspace   ] GC(167) Metaspace: 495297K->495297K(1468416K)
[949.815s][info][gc             ] GC(167) Pause Full (Ergonomics) 4233M->4192M(4677M) 11731.041ms
--
[955.809s][info][gc,heap        ] GC(168) PSYoungGen: 840192K->736113K(1293824K)
[955.809s][info][gc,heap        ] GC(168) ParOldGen: 3495275K->3495150K(3495424K)
[955.809s][info][gc,metaspace   ] GC(168) Metaspace: 495297K->495297K(1468416K)
[955.809s][info][gc             ] GC(168) Pause Full (Ergonomics) 4233M->4132M(4677M) 5785.816ms
--
[968.554s][info][gc,heap        ] GC(169) PSYoungGen: 840192K->740403K(1293824K)
[968.554s][info][gc,heap        ] GC(169) ParOldGen: 3495150K->3495217K(3495424K)
[968.554s][info][gc,metaspace   ] GC(169) Metaspace: 495300K->495300K(1468416K)
[968.554s][info][gc             ] GC(169) Pause Full (Ergonomics) 4233M->4136M(4677M) 12270.639ms
[968.554s][info][gc,cpu         ] GC(169) User=92.73s Sys=0.07s Real=12.27s
[quarkus-integration-test-main-999-SNAPSHOT-runner:660320]     universe: 304,192.26 ms,  4.57 GB

Notice how we spend ~30 seconds to reclaim ~250M... As we can see, in all 3 GCs we are unable to reclaim memory from the young generation (it has a lot of survivors), so a Full GC is triggered (due to GC ergonomics) in an effort to reclaim memory from the old generation and promote the survivors.

At this stage it's clear that we are operating at the limits of the heap capacity and that we are spending unnecessary CPU cycles trying to optimize heap usage due to GC ergonomics. Initially I tried finding a way to disable the GC ergonomics (or at least tune the heuristics) but I was not able to do so. Thanks to @shipilev, who pointed me to UseAdaptiveGCBoundary as an alternative to try out (a sketch of how it can be passed to the image builder appears below), I was able to get the following results:

logs/mandrel-21.3.0.0-Final-main-UseAdaptiveGCBoundary-1.txt:[INFO] Total time:  07:05 min
logs/mandrel-21.3.0.0-Final-main-UseAdaptiveGCBoundary-2.txt:[INFO] Total time:  07:59 min
logs/mandrel-21.3.0.0-Final-main-UseAdaptiveGCBoundary-3.txt:[INFO] Total time:  07:38 min
logs/mandrel-21.3.0.0-Final-main-UseAdaptiveGCBoundary-4.txt:[INFO] Total time:  08:07 min

Notice how they are still worse than:

logs/mandrel-21.3.0.0-Final-main-noParseOnce-5.txt:[INFO] Total time:  05:28 min

but the overhead is far less significant.
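As mentioned above, a minimal sketch of forwarding the flag to the JVM that runs the image builder (the -J prefix and the Quarkus property are real; the Maven invocation is an assumption):

# Hypothetical invocation; -J passes the option to the image builder's JVM.
mvn package -Dnative \
  -Dquarkus.native.native-image-xmx=5g \
  -Dquarkus.native.additional-build-args="-J-XX:+UseAdaptiveGCBoundary"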

Using UseAdaptiveGCBoundary we see the following:

[182.670s][info][gc,heap        ] GC(71) PSYoungGen: 44523K->0K(1014272K)
[182.670s][info][gc,heap        ] GC(71) ParOldGen: 3760404K->3774730K(4096000K)
[182.670s][info][gc,metaspace   ] GC(71) Metaspace: 490450K->490450K(1464320K)
[182.670s][info][gc             ] GC(71) Pause Full (Ergonomics) 3715M->3686M(4990M) 5603.479ms
--
[193.125s][info][gc,heap        ] GC(75) PSYoungGen: 44513K->0K(891392K)
[193.125s][info][gc,heap        ] GC(75) ParOldGen: 3914472K->3936429K(4218880K)
[193.125s][info][gc,metaspace   ] GC(75) Metaspace: 492746K->492746K(1466368K)
[193.125s][info][gc             ] GC(75) Pause Full (Ergonomics) 3866M->3844M(4990M) 4521.975ms
--
[212.629s][info][gc,heap        ] GC(81) PSYoungGen: 44539K->0K(807936K)
[212.629s][info][gc,heap        ] GC(81) ParOldGen: 4054251K->4057183K(4302336K)
[212.629s][info][gc,metaspace   ] GC(81) Metaspace: 493543K->493543K(1466368K)
[212.629s][info][gc             ] GC(81) Pause Full (Ergonomics) 4002M->3962M(4990M) 5725.366ms
--
[229.383s][info][gc,metaspace   ] GC(88) Metaspace: 493956K->493956K(1466368K)
[229.383s][info][gc             ] GC(88) Pause Young (Allocation Failure) 4877M->4045M(4990M) 90.973ms
[229.383s][info][gc,cpu         ] GC(88) User=0.59s Sys=0.00s Real=0.09s
[quarkus-integration-test-main-999-SNAPSHOT-runner:316334]     (clinit):   1,779.54 ms,  4.87 GB

Notice how the live data set constantly increases up to ~4.9G and how we still spend a lot of time due to GC ergonomics trying to reclaim memory, but this time at a far lower frequency than before.
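For reference, a quick way to summarize the Full GC pauses from such a unified GC log (the log file name is an assumption; the line format matches the snippets above):

# Count the "Pause Full" entries and total up their durations.
grep 'Pause Full' native-image-gc.log \
  | sed -E 's/.* ([0-9.]+)ms$/\1/' \
  | awk '{ sum += $1; n += 1 } END { printf "%d Full GCs, %.1f s total\n", n, sum / 1000 }'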

FTR the different compilation phases have different memory footprints as follows:

with +ParseOnce

  • classlist phase has a live data set of <300MB
  • clinit phase has a live data set of 3.9G-4G, which is kept alive until the universe phase is complete
  • parse phase has a live data set of 3.3G
  • inline phase has a live data set of 2.5G
  • compile phase has a live data set of 2.9G
  • image phase has a live data set of 1.5G

with -ParseOnce

  • classlist phase has a live data set of <300MB
  • clinit phase has a live data set of ~2.1G, which is kept alive until the universe phase is complete
  • parse phase has a live data set of 2.2G
  • inline phase has a live data set of 3G
  • compile phase has a live data set of 3.2G
  • image phase has a live data set of 1.7G

zakkak added a commit to zakkak/quarkus that referenced this issue Oct 27, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)

quarkusio#19511 didn't take into account
runs with constraint memory.  Bringing back `-H:-ParseOnce` reduces the
heap usage significantly, from 3.9G down to 2.7G in
integration-tests/main.
zakkak added a commit to zakkak/quarkus that referenced this issue Oct 28, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)

quarkusio#19511 didn't take into account
runs with constraint memory.  Bringing back `-H:-ParseOnce` reduces the
heap usage significantly, from 3.9G down to 2.7G in
integration-tests/main.

zakkak commented Nov 2, 2021

FTR the commit that increased the memory usage and triggered the CI failures appears to be ce05101.

It's also worth noting that trying -H:-IncludeMethodData (an option added in ce05101) locally didn't have a big impact.


zakkak commented Nov 4, 2021

I am closing this issue as my current understanding is that we were already stressing the JVM and ce05101 resulted in a small increase in memory usage that ended up consistently causing the CI runs to fail.

For the time being we have reverted Quarkus to using -H:-ParseOnce (see quarkusio/quarkus#21018) to avoid the issue.

I plan to further investigate why -H:+ParseOnce has such a big impact in the case of Quarkus' integration-tests/main, see #316.

zakkak closed this as completed Nov 4, 2021
gsmet pushed a commit to gsmet/quarkus that referenced this issue Dec 1, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)

quarkusio#19511 didn't take into account
runs with constraint memory.  Bringing back `-H:-ParseOnce` reduces the
heap usage significantly, from 3.9G down to 2.7G in
integration-tests/main.

(cherry picked from commit 201b9a6)
gsmet pushed a commit to gsmet/quarkus that referenced this issue Dec 1, 2021
Disable single parsing of compiler graphs till the impact of it on heap
usage decreases see oracle/graal#3435 and
graalvm/mandrel#304 (comment)

quarkusio#19511 didn't take into account
runs with constraint memory.  Bringing back `-H:-ParseOnce` reduces the
heap usage significantly, from 3.9G down to 2.7G in
integration-tests/main.

(cherry picked from commit 201b9a6)