-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: a compiler compiled in release mode can no longer compile anything else #10359
Comments
It seems this bugs prevents further development of crystalline, a language server protocol. It would have been nice to know that, I would probably have fixed it by now. I'll try looking at it this week if I have time. |
It may be a LLVM 11.0 bug, please try with 11.1.0. |
@j8r How did you reach that conclusion? |
It is not a conclusion, it is an hyphotesis, I don't know much about LLVM 😅 Better try the patched LLVM versions, it is easy and can save time. |
Also it is mentioned of "renumbering values", so I thought it might be this bug. I don't want you to waste your time @asterite 😉 |
@j8r But that's the thing. Every time there's a misleading comment I have to waste time jumping in and saying "No, this is wrong, don't waste time looking into this direction". I prefer if we would avoid misleading comments, or unfounded hypothesis. This is clearly a bug in the compiler where a number is outputted with the type suffix and that overflows. |
Making helpful suggestions is still encouraged. But this is very clearly not related to LLVM at all because it happens in the semantic stage of the compiling process. So it's entirely internal to the compiler. |
There should be a reason why it worked before and not now. |
To whoever wants to reproduce this:
^ THAT'S WRONG
THAT'S CORRECT Because... doesn't it feel good to pass an argument to |
@asterite Yeah sorry, just edited the original post to fix this 😉. |
@elbywan No need to say sorry, that behavior is a mess and I'd like to change that |
I reproduced the issue: |
@bcardiff If I'm not mistaken, the issue is pretty serious: if you compile a compiler in release mode, it can't compile anything else that requires the prelude. I don't know what's going on... I think this is a blocker for 1.0 |
Yes, this looks like a major issue. On linux it sig faults:
|
We don't build macos nightlies, right? I suppose they would be affected, too. And any local built compiler with release mode. On linux nightlies (and latest) are still linked against LLVM 10, so that's safe. |
This is undetected in the CI becasue
In CircleCI The only Crystal compiler in release mode I am aware of comes from homebrew. |
I'm not sure that's the issue. The compiler build in release mode from homebrew works fine. The problem is building a new compiler in release mode from that one. But maybe the one in homebrew is already broken? I'm not sure. My theroies:
We could do a bisect too. What I'll try as a first step is checking out |
So, going back to # foo.cr (empty)
Output:
I'm going to try now with LLVM 10 (I don't think it's an LLVM issue, we probably have a bug that is only triggered when LLVM 11+ is used.) |
Another "bad" thing: it seems |
The repro steps #10359 (comment) also fail for LLVM 9 for me. It might be that the 0.36.1 is already buggy. Next I'm going to try checking out 0.35.1 and do the repro steps. Hopefully it works fine and the issue is somewhere between 0.35.1 and 0.36.1! 🤞 |
Well... it segfaults during compilation 😞 I think we'll have to go back to using an 0.35.1 compiler and start from there. But I ran out of time for today. |
Hopefully this bug has some relation to enums, uint32 and converting literals to them. We'll see. |
I'm thinking this bug is related to union of numbers and assigning/extracting values from them. This was fixed recently, but maybe there's still something going on there... As a workaround maybe we can change those |
That workaround wouldn't fix user code with |
Not sure if it matters but adding |
It seems that the bug doesn't happen in def foo(str)
if 1.to_s == "1"
str.to_u32
end
end
p foo("1")
p foo("2")
p foo("3") I also disassembled both variants to ensure that Debugging also turned out that
Using |
I was able to reproduce this bug without puts, just calling to_s and LibC.printf I'll try to reduce the problem even further but commenting prelude code. At least that's how I found the cause of another bug. |
For instance, this reproduces the problem and there's no class NumberLiteral
def initialize(@value : Int32, @kind : Int32)
end
def kind
@kind
end
def value
@value
end
end
struct Int32
def u32?
case self
when 0 then 0_u32
when 1 then 1_u32
when 2 then 2_u32
when 3 then 3_u32
when 4 then 4_u32
when 5 then 5_u32
when 6 then 6_u32
when 7 then 7_u32
when 8 then 8_u32
else nil
end
end
end
struct UInt32
def to_s2 : UInt8
return 48_u8 if self == 0
ptr = Pointer(UInt8).malloc(2) + 1
num = self
while num != 0
ptr = ptr - 1
ptr.value = 48_u8 &+ num.unsafe_mod(10)
num = num.unsafe_div(10)
end
ptr.value
end
end
def foo(num)
case num.kind
when 0
case 1
when 0 then num.value.to_i8!
when 1 then num.value.u32? || (Pointer(UInt8).malloc(2); LibC.exit 0)
else
LibC.exit 0
end
else
Pointer(UInt8).malloc(2)
LibC.exit 0
end
end
class String
def to_unsafe
pointerof(@c)
end
end
class Array2
def initialize
@buffer = Pointer(NumberLiteral).malloc(4)
@buffer[0] = NumberLiteral.new(0, 0)
@buffer[1] = NumberLiteral.new(1, 0)
end
def each
yield @buffer[0]
yield @buffer[1]
end
end
members = Array2.new
members.each do |value|
z = foo(value)
if z.is_a?(UInt32)
LibC.printf "%c\n", z.to_s2
end
end |
I compared the LLVM IR outputs of #10359 (comment) between these compilers:
The LLVM 10 one is from Bintray and the LLVM 11 one is compiled in release mode; the latter produces a SIGSEGV after printing .LBB0_102: # %body.i24.i.i.i.i
# Parent Loop BB0_92 Depth=1
# => This Inner Loop Header: Depth=2
movq %r13, %rax
imulq %rdi, %rax
shrq $35, %rax
leal (%rax,%rax), %edx
leal (%rdx,%rdx,4), %edx
movl %r13d, %esi
subl %edx, %esi
movzbl ".L'0123456789abcdefghi...'.1"+12(%rsi), %edx This line would be body.i24.i.i.i.i: ; preds = %body.i24.i.i.i.i.preheader, %body.i24.i.i.i.i
%num.042.i.i.i.i.i = phi i32 [ %463, %body.i24.i.i.i.i ], [ %obj1.sroa.2.8.extract.trunc.i.i.i.i, %body.i24.i.i.i.i.preheader ]
%ptr20.041.i.i.i.i.i = phi i8* [ %462, %body.i24.i.i.i.i ], [ %355, %body.i24.i.i.i.i.preheader ]
%462 = getelementptr inbounds i8, i8* %ptr20.041.i.i.i.i.i, i64 -1
%scevgep = getelementptr i8, i8* %ptr20.041.i.i.i.i.i, i64 -1
%num.042.i.i.i.i.i.frozen = freeze i32 %num.042.i.i.i.i.i
%463 = udiv i32 %num.042.i.i.i.i.i.frozen, 10
%464 = mul i32 %463, 10
%.decomposed315 = sub i32 %num.042.i.i.i.i.i.frozen, %464
%465 = zext i32 %.decomposed315 to i64
%466 = getelementptr inbounds { i32, i32, i32, [37 x i8] }, { i32, i32, i32, [37 x i8] }* @"'0123456789abcdefghi...'.1", i64 0, i32 3, i64 %465
%467 = load i8, i8* %466, align 1 Only LLVM 11 generates that The internal layout of the integer union is module Crystal
class LLVMTyper
private def create_llvm_type(type : MixedUnionType, wants_size)
@llvm_context.struct(llvm_name) do |a_struct|
# ...
padding = pointer_size - size_of(@llvm_context.int32)
llvm_value_type = @llvm_context.int8.array(max_size * pointer_size)
if padding > 0
[@llvm_context.int32, @llvm_context.int8.array(padding), llvm_value_type]
else
[@llvm_context.int32, llvm_value_type]
end
end
def union_value_type(type : MixedUnionType)
llvm_type(type).struct_element_types[-1]
end
end
end This works, and the union becomes %obj1.sroa.2.sroa.6.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt11.i.i to i32
%obj1.sroa.2.sroa.6.0.insert.shift.i.i.i.i = shl nuw i32 %obj1.sroa.2.sroa.6.0.insert.ext.i.i.i.i, 24
%obj1.sroa.2.sroa.5.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt9.i.i to i32
%obj1.sroa.2.sroa.5.0.insert.shift.i.i.i.i = shl nuw nsw i32 %obj1.sroa.2.sroa.5.0.insert.ext.i.i.i.i, 16
%obj1.sroa.2.sroa.4.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt7.i.i to i32
%obj1.sroa.2.sroa.4.0.insert.shift.i.i.i.i = shl nuw nsw i32 %obj1.sroa.2.sroa.4.0.insert.ext.i.i.i.i, 8
%obj1.sroa.2.sroa.0.0.insert.ext.i.i.i.i = zext i8 %__temp_489.elt3.elt.i.i to i32
%obj1.sroa.2.sroa.5.0.insert.insert.i.i.i.i = or i32 %obj1.sroa.2.sroa.5.0.insert.shift.i.i.i.i, %obj1.sroa.2.sroa.0.0.insert.ext.i.i.i.i
%obj1.sroa.2.sroa.4.0.insert.insert.i.i.i.i = or i32 %obj1.sroa.2.sroa.5.0.insert.insert.i.i.i.i, %obj1.sroa.2.sroa.6.0.insert.shift.i.i.i.i
%obj1.sroa.2.sroa.0.0.insert.insert.i.i.i.i = or i32 %obj1.sroa.2.sroa.4.0.insert.insert.i.i.i.i, %obj1.sroa.2.sroa.4.0.insert.shift.i.i.i.i This also doesn't explain why the compiler itself has to be built in release mode. However all this testing makes me believe the same issue can also happen with any union of Crystal structs, not just integers. |
Since the official releases does not use llvm@11, and llvm@10 seems to have no issue. I see no need to delay 1.0 on this issue. We can even fallback to llvm@9 in homebrew. It is definitely needed to declare that Crystal supports llvm@11. Maybe 1.0.1. Last @HertzDevil comment means also that #10453 as a workaround might not even catch some occurrences of the underlying issue. |
Should we remove 11.0 from supported LLVM versions then? It is partly supported, so maybe it should stay. |
It can stay otherwise is a chicken-vs-egg problem to me. But on the following release we can highlight this as a known issue for llvm 11. |
I wanted to move Arch Linux's package to LLVM 11 but @jhass pointed me to this thread. What is the current recommendation? Stay with LLVM 10? Is there any workaround for LLVM11 configuration? |
Just stay with LLVM 10 until LLVM 11 support is complete. |
It seems there's a bug in LLVM regarding the DivRemPairs optimization. This was caught by @ggiraldez and I posted the question in the llvm forum. There's a fix that might land in llvm 12.1, we'll see. With that fix, I was able to compile and run one of the examples successfully, but it also took a crazy amount of time to compile. I will investigate what's the issue there, and if everything else is working properly. BTW, to compile with LLVM 11.1 in a Mac the problem with libxml persists. The solution is to make llvm with the following command:
|
@beta-ziliani Maybe you did not compile llvm locally in release mode? This depends on the (I was checking if there were invalid alignment issues, but even lowering all alignments to 1 leads to the same invalid results) If that patch is all that is needed the options to move forward are:
Of course other distribution packages will need to be addressed, but homebrew is the one that is pushing this more. The others can wait a little longer. I will build llvm 11 locally and see how it goes... :-) |
Hombrew PR sent Homebrew/homebrew-core#79289 🤞 |
homebrew pr has been merged and bottles updated
🎉 I guess we can close this issue now. llvm 11 support is conditioned to having this patch applied. for other maintainers (@anatol) crystal 1.0 can be republished as is as long as llvm 11 is patched. Whether that patch will be included or not in 12.0.1, it can be followed in https://bugs.llvm.org/show_bug.cgi?id=50695 |
Issue
It is hard to summarize the issue clearly so I'll try my best to explain 😉.
Producing a binary with the
--release
flag that requires the crystal compiler (require "compiler/crystal/**"
) is unable to compile other programs (at least on my machine running macos & llvm 11).Running the binary produces the following error:
Full stacktrace
It seems like one value of the
Crystal::DWARF::FORM
enum (of typeUInt32
) is inlined by the compiler to4294967306_u32
which is indeed out of bounds.More generally every enum of type
UInt32
seem to be messed up during compilation.I have very limited expertise about this but I would guess that this is caused by LLVM 11 optimizations since this code used to run fine with LLVM 9.
How to reproduce
The following file compiles itself and prints the values of an UInt32 enum twice (first time during the compilation with the crystal compiler, then a second time when the program runs and compiles itself).
The
enum
part is not required to produce the stacktrace (compiling puts("hello world") is failing too) but is useful to demonstrate that something fishy is happening with UInt32 enums.crystal run file.cr
Produces the expected correct output:
crystal run --release file.cr
Produces the incorrect output:
(+ the stacktrace at the end)
Crystal -v
The text was updated successfully, but these errors were encountered: