Investigate ghc.exe: getMBlocks: VirtualAlloc MEM_COMMIT failed: The paging file is too small for this operation to complete
CI failures
#1961
Upon a closer look, I'm not entirely sure that
This time, the error arises in a different module. Urk. In light of this, it's much less clear to me what is going on.
A plausible explanation for why the Windows GHC 9.2 and 9.4 jobs do not suffer from this issue is that those versions of GHC use Windows' large address space allocator (see GHC#12576), whereas GHC 8.10 does not. While it is still a mystery to me why this class of build error has only recently started to appear in Windows GHC 8.10 jobs, it is unclear whether it is worth the effort to debug this further, given that GHC 9.2+ has a more principled way of allocating large address spaces. In light of this, I propose that we:
Resolves #1961, in the sense that the issue is now worked around.
The logic for retrying the `cabal build` command three consecutive times upon failure was apparently added to work around macOS-related dylib issues, but it is unclear whether these issues still exist (and unlikely that they do). Moreover, this logic has the distinct disadvantage of potentially masking more serious issues, such as those observed in #1961. In principle, `cabal build` should not be an especially flaky part of the CI: it should either work 100% of the time or fail 100% of the time. Let's just remove the `retry cabal build` logic to make it easier to notice issues with `cabal build` in the future. We can always revisit this choice later if need be.
The `SAWScript.Interpreter` file is one of the largest in the SAW codebase, currently clocking in at 4765 lines of code. What's more, most of this file consists of the monolithic `primitives` table, which consists of thousands of string literals that document all of the SAWScript commands. While no single entry in `primitives` is likely that taxing to compile times in isolation, each little bit does add up every time a new SAWScript command is added.

Things are finally at a tipping point, however, as the amount of RAM required to compile `SAWScript.Interpreter` is causing the CI to crash. More specifically, in pull request #1958, which adds two commands to `primitives`, the Windows GHC 8.10.7 CI job has been observed to fail consistently with this error:

> ghc.exe: getMBlocks: VirtualAlloc MEM_COMMIT failed: The paging file is too small for this operation to complete

Ouch. While it is curious that this error only happens with the Windows GHC 8.10.7 job (and none of the other Windows jobs), it suggests that the current status quo is fragile. Googling for this error message suggests that there is not much we can do besides using less RAM (and indeed, the GitHub Actions CI machines are somewhat constrained on RAM).
One thing that we could quite easily do to reduce RAM requirements is to split up the `primitives` table a bit. It currently contains every SAWScript definition, but we could divide it into smaller subtables. As a starting point, we could have separate tables for the LLVM-related commands, JVM-related commands, MIR-related commands, Heapster-related commands, etc., and then `primitives` would consist of the `union` of all these tables. This would likely have a big impact on maximum RAM requirements without too much effort, and it is worth a try before considering more drastic measures, such as putting the commands' documentation in external files (as suggested here).