Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to workaround GHC heisenbug on CI #2444

Merged
merged 1 commit into from
Mar 30, 2023
Merged

Conversation

DigitalBrains1
Copy link
Member

@DigitalBrains1 DigitalBrains1 commented Mar 29, 2023

In very specific tests in GitLab CI we are affected by GHC bug #19421. We can work around the issue by passing -with-rtsopts=-xm20000000 when compiling an affected binary. This is a stopgap measure until the real bug is fixed.

We have seen the bug:

  • In clash-testsuite in clashLibTests
  • In ffi:example in the clash binary itself
  • In prelude:doctests, probably in the doctests binary itself, although this is not certain.

and then only in GHC 9.0.2, although the bug should be in other versions of GHC as well.

This workaround was applied only to GHC 9.0.2 on CI and only to those cases that were observed to go wrong, although as a consequence now the clash binary is always built with the RTS option.

I tried to limit applying this RTS option as much as possible because of the documentation:

❗Warning
This option is for working around memory allocation problems only. Do not use unless GHCi fails with a message like [...]

Our problem has nothing to do whatsoever with GHCi, but even if we ignore that, it still says "do not use unless". It would be possible to restrict the building of the clash binary with the option to just ffi:example, by giving it its own special cabal.project.local, but I did not take it that far in this PR.

The ffi:example test should no longer be affected by the GHC bug, but it is a fairly useless test in its current state as it does not fail when there are issues, and the tested code is actually faulty. It is disabled to be fixed later.

Still TODO:

  • Write a changelog entry (see changelog/README.md)
  • Check copyright notices are up to date in edited files This is a temporary CI workaround. The sooner it is removed, the better. No need for copyright messages if the code is soon to be removed again.

Copy link
Member

@martijnbastiaan martijnbastiaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of -fworkaround-heisenbug, call it -fworkaround-ghc19421?

@@ -10,7 +10,7 @@ package *

package clash-prelude
ghc-options: -Werror
flags: +doctests +multiple-hidden
flags: +doctests +multiple-hidden -workaround-heisenbug
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you want +workaround-heisenbug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would set it for all GHC versions. .ci/setup.sh turns it on just for GHC 9.0.2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that workaround-ghc19421 is more specific then workaround-heisenbug.
But I still hate it, the name is meaningless to my mind.
And also confusingly if one were to read the GHC-9.0.2 release notes, it lists 19421 as fixed.

I propose something like workaround-ghc-mmap-crash

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty strong word for such a minor thing 😅.

Suggestion seems fine to me. Approved the PR too.

@DigitalBrains1 DigitalBrains1 marked this pull request as ready for review March 29, 2023 15:34
In very specific tests in GitLab CI we are affected by GHC bug #19421.
We can work around the issue by passing `-with-rtsopts=-xm20000000` when
compiling an affected binary. This is a stopgap measure until the real
bug is fixed.

We have seen the bug:
- In `clash-testsuite` in `clashLibTest`s
- In `ffi:example` in the `clash` binary itself
- In `prelude:doctests`, probably in the `doctests` binary itself,
  although this is not certain.

and then only in GHC 9.0.2, although the bug should be in other versions
of GHC as well.

This workaround was applied only to GHC 9.0.2 on CI and only to those
cases that were observed to go wrong, although as a consequence now the
`clash` binary is always built with the RTS option.

The `ffi:example` test should no longer be affected by the GHC bug, but
it is a fairly useless test in its current state as it does not fail
when there are issues, and the tested code is actually faulty. It is
disabled to be fixed later.
@DigitalBrains1
Copy link
Member Author

If people like the new version, feel free to merge and get this show on the road again!

@DigitalBrains1
Copy link
Member Author

DigitalBrains1 commented Mar 30, 2023

In prelude:doctests, probably in the doctests binary itself, although this is not certain.

I tried to determine this but the bug is being a cute little heisenbug. I can't even reproduce locally at all, it mainly happens on CI. I tried passing --verbose to the doctest to see if it printed anything before dying, but with --verbose, the bug does not appear. So I gave up. I could determine its location, but not its speed (how quickly it happens). It was Heisenberg through and through in that respect!

@DigitalBrains1 DigitalBrains1 merged commit 750150c into master Mar 30, 2023
@DigitalBrains1 DigitalBrains1 deleted the heisenbug-flag branch March 30, 2023 08:24
martijnbastiaan added a commit to acairncross/clash-compiler that referenced this pull request May 17, 2023
DigitalBrains1 added a commit that referenced this pull request Jun 13, 2023
PR #2444 added a workaround for GitLab CI, but it is also needed on the
GitHub CI we run for external contributions.
DigitalBrains1 added a commit that referenced this pull request Jun 13, 2023
PR #2444 added a workaround for GitLab CI, but it is also needed on the
GitHub CI we run for external contributions.
DigitalBrains1 added a commit that referenced this pull request Jun 14, 2023
PR #2444 added a workaround for GitLab CI, but it is also needed on the
GitHub CI we run for external contributions.
DigitalBrains1 added a commit that referenced this pull request Jun 14, 2023
PR #2444 added a workaround for GitLab CI, but it is also needed on the
GitHub CI we run for external contributions.
DigitalBrains1 added a commit that referenced this pull request Jun 14, 2023
PR #2444 added a workaround for GitLab CI, but it is also needed on the
GitHub CI we run for external contributions.
leonschoorl pushed a commit that referenced this pull request Aug 24, 2023
In very specific tests in GitLab CI we are affected by GHC bug #19421.
We can work around the issue by passing `-with-rtsopts=-xm20000000` when
compiling an affected binary. This is a stopgap measure until the real
bug is fixed.

We have seen the bug:
- In `clash-testsuite` in `clashLibTest`s
- In `ffi:example` in the `clash` binary itself
- In `prelude:doctests`, probably in the `doctests` binary itself,
  although this is not certain.

and then only in GHC 9.0.2, although the bug should be in other versions
of GHC as well.

This workaround was applied only to GHC 9.0.2 on CI and only to those
cases that were observed to go wrong, although as a consequence now the
`clash` binary is always built with the RTS option.

The `ffi:example` test should no longer be affected by the GHC bug, but
it is a fairly useless test in its current state as it does not fail
when there are issues, and the tested code is actually faulty. It is
disabled to be fixed later.
leonschoorl pushed a commit that referenced this pull request Aug 24, 2023
PR #2444 added a workaround for GitLab CI, but it is also needed on the
GitHub CI we run for external contributions.
leonschoorl added a commit that referenced this pull request Aug 29, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on some of our 
packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and 
applies
the same `-xm20000000` RTS option everywhere via the GHCRTS environment 
variable.

But for this to work we have to link all binaries with the `-rtsopts` 
flag,
otherwise they'll ignore just the GHCRTS.
leonschoorl added a commit that referenced this pull request Aug 29, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll just ignore the GHCRTS.
leonschoorl added a commit that referenced this pull request Aug 29, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll just ignore the GHCRTS.
leonschoorl added a commit that referenced this pull request Aug 30, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll just ignore the GHCRTS.
leonschoorl added a commit that referenced this pull request Aug 31, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll just ignore the GHCRTS.
leonschoorl added a commit that referenced this pull request Aug 31, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll just ignore the GHCRTS.
leonschoorl added a commit that referenced this pull request Aug 31, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll error out that the GHCRTS contains
unsupported options.
leonschoorl added a commit that referenced this pull request Aug 31, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll error out that the GHCRTS contains
unsupported options.
leonschoorl added a commit that referenced this pull request Sep 4, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll error out that the GHCRTS contains
unsupported options.
mergify bot pushed a commit that referenced this pull request Sep 7, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll error out that the GHCRTS contains
unsupported options.

(cherry picked from commit 9914030)

# Conflicts:
#	.ci/cabal.project.local
#	.ci/gitlab/common.yml
#	.ci/gitlab/test.yml
#	clash-cores/clash-cores.cabal
#	clash-ffi/clash-ffi.cabal
#	clash-lib/clash-lib.cabal
#	nix/overlay.nix
#	tests/src/Test/Tasty/Clash.hs
leonschoorl added a commit that referenced this pull request Sep 7, 2023
In PRs #2444 we added the workaround-ghc-mmap-crash flag on
some of our packages.
That flag added `-with-rtsopts=-xm20000000` to some of our binaries.

But other binaries would occasionally still trigger the mmap bug,
most importantly ghc itself.

This commit undoes the previous way of applying the workaround and
applies the same `-xm20000000` RTS option everywhere
via the GHCRTS environment variable.

But for this to work we have to link all binaries with the `-rtsopts`
flag, otherwise they'll error out that the GHCRTS contains
unsupported options.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants