Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler bug in shards nightly build #7480

Closed
straight-shoota opened this issue Feb 25, 2019 · 20 comments
Closed

Compiler bug in shards nightly build #7480

straight-shoota opened this issue Feb 25, 2019 · 20 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler

Comments

@straight-shoota
Copy link
Member

The most recent nightly builds are failing at dist_linux (and dist_linux32) step while compiling shards 0.8.1 (https://circleci.com/gh/crystal-lang/crystal/19143). MacOS (dist_darwin) seems to be unaffected (https://circleci.com/gh/crystal-lang/crystal/19145).

The same build works fine with a Crystal 0.27.2 compiler. The last successful nightly build was on February 8 (https://circleci.com/gh/crystal-lang/crystal/18125) with 0da24ec and the first failing build was on February 11 (https://circleci.com/gh/crystal-lang/crystal/18365) with d956786. For some reason, there seems to have been no nightly builds on February 9 and 10.

Changes between these commits: 0da24ec...d956786

@straight-shoota
Copy link
Member Author

It seems the most likely candidate is #6945 because its change is specific to static builds on musl (which fits the affected shards build) and doesn't impact MacOS.

/cc @j8r

@straight-shoota straight-shoota added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler labels Feb 25, 2019
@straight-shoota straight-shoota added this to the 0.28.0 milestone Feb 25, 2019
@asterite
Copy link
Member

We should definitely revert #6945 , I just saw it and it makes no sense at all. Require LLVM stuff in every program? Why? And how does that fix anything?

@j8r
Copy link
Contributor

j8r commented Feb 25, 2019

No, this isn't requiring LLVM in every program - only statically linked ones in musl. It puts this hack which is in distribution-scripts, in mainstream.
This line in distribution-script can therefore be removed now.

@asterite
Copy link
Member

Hmmm... okay. Now I'm not sure #6945 is the reason it's failing.

@j8r
Copy link
Contributor

j8r commented Mar 1, 2019

I can compile shards successfully on jrei/crystal-alpine, which has crystal 0.27.2. The command I ran is crystal build --stats --target x86_64-linux-musl src/shards.cr -o shards --static ${release:+--release}

@j8r
Copy link
Contributor

j8r commented Mar 1, 2019

-D -lunwind has to be added at the compilation to avoid a segmentation fault when running ./shards.

@straight-shoota
Copy link
Member Author

@j8r Yes, 0.27.2 is fine. The error was introduced in 0.28.0-dev somewhere between 0da24ec and d956786.

If you'd like to investigate, it should be very easy to identify the first bad commit using binary search (git bisect could help for that).

@straight-shoota
Copy link
Member Author

-D -lunwind has to be added at the compilation to avoid a segmentation fault when running ./shards.

That's what #6945 fixed, and previously the hack in distribution-scripts, right?

@j8r
Copy link
Contributor

j8r commented Mar 1, 2019

In fact we can run ./shards, but after installing minitest it returns a segfault – strange. Not every applications behave like this.

I'll compile shards with Crystal master then.

@ysbaddaden
Copy link
Contributor

We must revert #6945 anyway. I accepted the initial patch (link against libunwind) as a mitigation to the bug, but the libLLVM hack that came afterwards is horrible —it shouldn't even be in the distribution scripts. A proper solution would be to get libstdc++ to raise the exceptions.

@straight-shoota
Copy link
Member Author

straight-shoota commented Mar 1, 2019

@ysbaddaden I agree. Let's revert it. Maybe that will also solve this issue.
But it's still a compiler error and we'd need to investigate it's reasons even if it doesn't pop up any more.

A proper solution would be to get libstdc++ to raise the exceptions.

Any ideas how this can be achieved?

@j8r
Copy link
Contributor

j8r commented Mar 5, 2019

I've tested with Crystal 0.28.0-dev [4a5c263ba] (2019-03-05) in an Alpine container: .build/crystal build --static shards/src/shards.cr -o shard
No problem so far with the compilation nor when running inside and outside the container.

Maybe it has to do with the debian multi-stage build.

@straight-shoota
Copy link
Member Author

Yeah, I was not yet able to reproduce the error locally (running manual steps) as well. Seems like it somehow depends on the CI environment.

@j8r
Copy link
Contributor

j8r commented Mar 5, 2019

@straight-shoota nevertheless I was able to reproduce it with distribution-scripts. I highly suspect that the Debian-Alpine multistage build has to do with the issue.
I don't know if it's because of glibc-musl or on the LLVM side.

@j8r
Copy link
Contributor

j8r commented Mar 6, 2019

After reverting the commit ead1b6b75eca42da56981474f242133e36ce89d0,when building on Alpine in the multi-stage build, the compilation succeed. shards is still segfaulting, even with using -lunwind (in both crystal compilations and shards compilation).
The early experiments in #6945 about linking against lunwind to prevent segfaults won't be enough in this case.
Reverting the commit on both Debian and Alpine crystal compilations and using the previous hack solves this issues.

We can revert this commit, then find a proper solution after #7479 merged.

@straight-shoota
Copy link
Member Author

Thanks @j8r for investigating! So when we revert #6945 builds should be working again.

This issue is still about a compiler error which isn't fixed by reverting #6945. It is just avoided. However, it seems to be difficult to reproduce this compiler error.

A proper solution to #6934 would be to fix #4276 as @ysbaddaden suggested.

@bcardiff bcardiff removed this from the 0.28.0 milestone Mar 27, 2019
@asterite
Copy link
Member

asterite commented May 8, 2019

Could this be given a better title?

@ysbaddaden
Copy link
Contributor

ysbaddaden commented May 9, 2019

Duplicate of #4276 ?

@waj
Copy link
Member

waj commented May 7, 2020

Should we assume this is fixed now? However the original error Missing hash key: NoReturn (KeyError) makes me think this was actually unrelated with static linking.

@straight-shoota
Copy link
Member Author

Closing. We haven't been able to isolate the bug in two years, so it doesn't make sense to keep this open. Might already be fixed by now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler
Projects
None yet
Development

No branches or pull requests

6 participants