LLVM codegen for direct return of StaticArray(Tuple) with mixed type broken for aarch64-darwin #11021

Open
straight-shoota opened this issue Jul 26, 2021 · 6 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. platform:aarch64 platform:darwin topic:compiler:codegen

Comments

@straight-shoota
Member

straight-shoota commented Jul 26, 2021

pp! StaticArray[{"d", 4}]

Cross-compiling the above code with crystal build --cross-compile --target aarch64-darwin results in an invalid memory access in LLVM.
The call to pp! is important for reproduction because it calls to_slice on the static array. Calling to_slice directly on the static array does not trigger this error.

It would be interesting to see if it reproduces natively, i.e. without --cross-compile. Codegen is a bit different when cross-compiling. Can someone try building this example on aarch64-darwin?

Discovered while adding #sort to StaticArray (https://github.com/straight-shoota/crystal/runs/3155348344; also present in https://github.com/crystal-lang/crystal/pull/10889/checks?check_run_id=3150235316).

It reproduces with LLVM 11 and LLVM 10 and with Crystal 1.1 and Crystal 1.0.

Stacktrace (with LLVM 10.0.0)
Invalid memory access (signal 11) at address 0x40000002d
[0x8ef936] *Exception::CallStack::print_backtrace:(Int32 | Nil) +118
[0x8d706a] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7fabb8a493c0] ???
[0x7fabb9d1d110] ???
[0x7fabb9d37aaf] _ZNK4llvm14TargetLowering11LowerCallToERNS0_16CallLoweringInfoE +6607
[0x7fabb9d54cfa] _ZN4llvm19SelectionDAGBuilder14lowerInvokableERNS_14TargetLowering16CallLoweringInfoEPKNS_10BasicBlockE +1034
[0x7fabb9d3faea] _ZN4llvm19SelectionDAGBuilder11LowerCallToENS_17ImmutableCallSiteENS_7SDValueEbPKNS_10BasicBlockE +2442
[0x7fabb9d2aed1] _ZN4llvm19SelectionDAGBuilder9visitCallERKNS_8CallInstE +1441
[0x7fabb9d201c9] _ZN4llvm19SelectionDAGBuilder5visitERKNS_11InstructionE +105
[0x7fabb9da3c10] _ZN4llvm16SelectionDAGISel16SelectBasicBlockENS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb1EEES6_Rb +112
[0x7fabb9da3a07] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +6487
[0x7fabb9da1396] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +1942
[0x7fabb9a5a5e8] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +280
[0x7fabb98c4d76] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +1126
[0x7fabb98c4ff3] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7fabb98c54a0] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +960
[0x7fabbaa9b523] ???
[0x7fabbaa9b35f] LLVMTargetMachineEmitToFile +175
[0x16c79c9] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x16c7a0b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x16bef3f] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +367
[0x16beaa0] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1888
[0x16c315c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x193e4c9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x193e474] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x192d242] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x192c44d] *Crystal::Command#run:(Bool | Nil) +413
[0x192c179] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x192c13d] *Crystal::Command::run:(Bool | Nil) +29
[0x8b879a] __crystal_main +2394
[0x149d5e6] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x149d3f5] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x8c3c36] main +6
[0x7fabb85e00b3] __libc_start_main +243
[0x8b7d7e] _start +46
[0x0] ???
Stacktrace for original code (with LLVM 10.0.0)
Invalid memory access (signal 11) at address 0x10266cd750
[0x8ef936] *Exception::CallStack::print_backtrace:(Int32 | Nil) +118
[0x8d706a] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7f7b0889e3c0] ???
[0x7f7b09b7211f] ???
[0x7f7b09baeb91] _ZN4llvm16SelectionDAGISel14LowerArgumentsERKNS_8FunctionE +6065
[0x7f7b09bf7388] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +728
[0x7f7b09bf6396] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +1942
[0x7f7b098af5e8] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +280
[0x7f7b09719d76] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +1126
[0x7f7b09719ff3] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7f7b0971a4a0] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +960
[0x7f7b0a8f0523] ???
[0x7f7b0a8f035f] LLVMTargetMachineEmitToFile +175
[0x16c79c9] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x16c7a0b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x16bef3f] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +367
[0x16beaa0] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1888
[0x16c315c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x193e4c9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x193e474] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x192d242] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x192c44d] *Crystal::Command#run:(Bool | Nil) +413
[0x192c179] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x192c13d] *Crystal::Command::run:(Bool | Nil) +29
[0x8b879a] __crystal_main +2394
[0x149d5e6] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x149d3f5] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x8c3c36] main +6
[0x7f7b084350b3] __libc_start_main +243
[0x8b7d7e] _start +46
[0x0] ???
Stacktrace for original code (with LLVM 11.1.0)
Invalid memory access (signal 11) at address 0x10143d2570
[0x8f2a36] *Exception::CallStack::print_backtrace:(Int32 | Nil) +118
[0x8da16a] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +330
[0x7efd98cf53c0] ???
[0x7efd9a162fbb] ???
[0x7efd9a1a2904] _ZN4llvm16SelectionDAGISel14LowerArgumentsERKNS_8FunctionE +6548
[0x7efd9a1f0170] _ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE +1056
[0x7efd9a1ef201] _ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE +2001
[0x7efd99e6239e] _ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE +270
[0x7efd99c9f579] _ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE +953
[0x7efd99ca4b23] _ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE +51
[0x7efd99c9fb90] _ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE +992
[0x7efd9b03b6ba] ???
[0x7efd9b03b4ef] LLVMTargetMachineEmitToFile +175
[0x16caac9] *LLVM::TargetMachine#emit_to_file<LLVM::Module, String, LLVM::CodeGenFileType>:Bool +89
[0x16cab0b] *LLVM::TargetMachine#emit_obj_to_file<LLVM::Module, String>:Bool +11
[0x16c203f] *Crystal::Compiler#cross_compile<Crystal::Program, Array(Crystal::Compiler::CompilationUnit), String>:Nil +367
[0x16c1ba0] *Crystal::Compiler#codegen<Crystal::Program, Crystal::ASTNode+, Array(Crystal::Compiler::Source), String>:(Tuple(Array(Crystal::Compiler::CompilationUnit), Array(String)) | Nil) +1888
[0x16c625c] *Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result +188
[0x19415c9] *Crystal::Command::CompilerConfig#compile<String>:Crystal::Compiler::Result +57
[0x1941574] *Crystal::Command::CompilerConfig#compile:Crystal::Compiler::Result +36
[0x1930342] *Crystal::Command#build:Crystal::Compiler::Result +290
[0x192f54d] *Crystal::Command#run:(Bool | Nil) +413
[0x192f279] *Crystal::Command::run<Array(String)>:(Bool | Nil) +25
[0x192f23d] *Crystal::Command::run:(Bool | Nil) +29
[0x8bb89a] __crystal_main +2394
[0x14a06e6] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6
[0x14a04f5] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +53
[0x8c6d36] main +6
[0x7efd9888c0b3] __libc_start_main +243
[0x8bae7e] _start +46
[0x0] ???

/cc @maxfierke @bcardiff

@maxfierke
Contributor

maxfierke commented Jul 26, 2021

Triggers assertion failures on LLVM 10 on M1 (using the universal test build):

$ crystal eval 'pp! StaticArray[{"d", 4}]'
Assertion failed: ((CLI.IsTailCall || InVals.size() == CLI.Ins.size()) && "LowerCall didn't emit the correct number of values!"), function LowerCallTo, file /var/cache/omnibus/src/llvm/llvm-10.0.0.src/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, line 9324.
Assertion failed: (ArgLocs.size() == Ins.size()), function LowerFormalArguments, file /var/cache/omnibus/src/llvm/llvm-10.0.0.src/lib/Target/AArch64/AArch64ISelLowering.cpp, line 3375.
Assertion failed: ((CLI.IsTailCall || InVals.size() == CLI.Ins.size()) && "LowerCall didn't emit the correct number of values!"), function LowerCallTo, file /var/cache/omnibus/src/llvm/llvm-10.0.0.src/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, line 9324.

will dig in more tonight

EDIT: did some digging at lunch. Release mode gives a different (I think more interesting) result:

$ crystal build --verbose --error-trace --release --no-debug test.cr
unimplemented reg-to-reg copy
UNREACHABLE executed at /var/cache/omnibus/src/llvm/llvm-10.0.0.src/lib/Target/AArch64/AArch64InstrInfo.cpp:2818!

which looks like it might be related to https://bugs.llvm.org/show_bug.cgi?id=46996 ? will give LLVM 12 a try later and see if it's still broken there too

@Sija
Contributor

Sija commented Jul 26, 2021

which looks like it might be related to https://bugs.llvm.org/show_bug.cgi?id=46996 ? will give LLVM 12 a try later and see if it's still broken there too

Seems like it, considering this comment -> https://bugs.llvm.org/show_bug.cgi?id=46996#c13

@maxfierke
Contributor

maxfierke commented Jul 27, 2021

Issue seems to be with the actual return value of the StaticArray, and not anything to do with the pp! call.

This compiles fine:

pp! StaticArray[{"d", 4}]; true

This does not:

StaticArray[{"d", 4}]

Looks like a very specific codegen bug for StaticArrays of Tuples with mixed-type members and four or fewer elements, because this compiles fine:

StaticArray[{"d", 4}, {"e", 3}, {"f", 2}, {"g", 1}, {"h", 0}]

And this also works:

StaticArray[{1, 4}, {2, 3}, {3, 2}, {4, 1}]

As does

StaticArray[{"d", "4"}]

This suggests that the homogeneous aggregate optimizations in the AArch64 ABI implementation are probably responsible for this IR being generated, given this line:

if 0 < homog_agg[1] <= 4

That was originally copied from the Rust ABI, so I'm not exactly sure if it's a bug in the original implementation or in the translation into Crystal.

By the sounds of the linked bug, the generated IR is valid, just not what LLVM expects, and so it triggers the aforementioned issue.
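For readers unfamiliar with the check quoted above, here is a rough sketch of the AAPCS64 homogeneous aggregate rule: an aggregate qualifies when it has one to four members that all share the same floating-point type. The Scalar enum and the homogeneous_aggregate method are hypothetical names made up for illustration; this is a simplified model and not the code in the Crystal compiler.

```crystal
# Simplified model of the AAPCS64 homogeneous aggregate rule.
# Scalar and homogeneous_aggregate are hypothetical names, not compiler code.
enum Scalar
  I32
  I64
  F32
  F64

  def float? : Bool
    self == F32 || self == F64
  end
end

# Returns {base_type, member_count} when every member has the same
# floating-point type and there are between 1 and 4 members, else nil.
def homogeneous_aggregate(members : Array(Scalar)) : Tuple(Scalar, Int32)?
  base = members.first?
  return nil if base.nil?
  return nil unless base.float?
  return nil unless members.all? { |m| m == base }
  count = members.size
  0 < count <= 4 ? {base, count} : nil
end

puts homogeneous_aggregate([Scalar::F64, Scalar::F64]).nil? # false
puts homogeneous_aggregate([Scalar::I64, Scalar::I32]).nil? # true
```

Under this model a mixed { i64, i32 } member list is never classified as homogeneous, so the sketch only explains what the quoted condition tests, not why the crash happens.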

@maxfierke
Contributor

@straight-shoota a more apt title for this issue might be "LLVM codegen for direct return of StaticArray of Tuple with mixed type broken for aarch64-darwin". I don't think to_slice plays a role here; it really seems to be the mixed types of the Tuple and the short length that matter.

@straight-shoota changed the title from "LLVM codegen for StaticArray#to_slice broken on cross-compile for aarch64-darwin" to "LLVM codegen for direct return of StaticArray(Tuple) with mixed type broken for aarch64-darwin" on Jul 29, 2021
@ggiraldez
Contributor

ggiraldez commented Sep 30, 2021

This is indeed the mentioned LLVM bug, and unfortunately it's not fixed until version 13 (just tested with nightly LLVM builds). The issue is with returning big composite types (big as in occupying more than one 64-bit register). The ABI code is not involved, since these are all internal functions and that code path is only activated when generating external/C-compatible functions.

A reduced (not sure it's minimal) LLVM IR sample that triggers the bug follows (compile with llc-10 < file.ll):

source_filename = "main_module"
target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64-unknown-linux-gnu"

%Foo = type { i64, i32 }

define [1 x %Foo] @__crystal_main(i32 %argc, i8** %argv) {
  %1 = alloca [1 x %Foo]
  %2 = call [1 x %Foo] @bar()
  store [1 x %Foo] %2, [1 x %Foo]* %1
  %3 = load [1 x %Foo], [1 x %Foo]* %1
  ret [1 x %Foo] %3
}

define internal [1 x %Foo] @bar() {
  %1 = alloca [1 x %Foo]
  %2 = load [1 x %Foo], [1 x %Foo]* %1
  ret [1 x %Foo] %2
}

Swapping the %Foo type to { i32, i64 } does not trigger the bug. Changing it to a couple of i64s works too. So does increasing the size of the array beyond 4, as @maxfierke points out. That is the point at which the type can no longer be transferred using 8 64-bit registers, so LLVM changes the return strategy as indicated by the ARM64 ABI spec (see section 5.5 and step C.10 of section 5.4.2).
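The register arithmetic behind that threshold can be checked with sizeof. This is a small sketch assuming a 64-bit target, where a Tuple(String, Int32) has the same layout as the { i64, i32 } struct in the IR (an 8-byte pointer plus a 4-byte integer, padded to 16 bytes). The Mixed alias is a name introduced here for illustration.

```crystal
# Mixed is an illustrative alias, not something from the compiler or stdlib.
alias Mixed = Tuple(String, Int32)

elem = sizeof(Mixed)
four = sizeof(StaticArray(Mixed, 4))
five = sizeof(StaticArray(Mixed, 5))

# Values in the comments assume a 64-bit target.
puts "element:    #{elem} bytes"                              # 16 bytes
puts "4 elements: #{four} bytes = #{four // 8} 64-bit slots"  # 64 bytes, 8 slots
puts "5 elements: #{five} bytes = #{five // 8} 64-bit slots"  # 80 bytes, 10 slots
```

Four elements fill exactly eight 64-bit slots, while five need ten, which matches the observation that the five-element array compiles fine.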

Other than waiting for LLVM 13, a possible workaround would be to use some sort of indirect value return. For that, the codegen for calls and funs must be changed, or maybe it's possible/easier to perform a code transformation that adds an out parameter to the function?
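To make the out-parameter idea concrete, here is what the transformation would look like if written by hand in user code: the caller reserves the storage and the callee writes through a pointer instead of returning the aggregate. The proposal above is about doing this inside the compiler, so this is only an illustration of the shape. The names Mixed, MixedArray and fill_mixed are hypothetical, and whether this form sidesteps the crash on an affected LLVM version has not been verified.

```crystal
# Hypothetical names for illustration only.
alias Mixed = Tuple(String, Int32)
alias MixedArray = StaticArray(Mixed, 1)

# Instead of `def make_mixed : MixedArray`, the callee receives a pointer
# to caller-owned storage and writes the result there.
def fill_mixed(dest : Pointer(MixedArray)) : Nil
  dest.value = StaticArray[{"d", 4}]
end

# The caller reserves the storage and passes its address.
result = uninitialized MixedArray
fill_mixed(pointerof(result))

puts result[0][0] # d
puts result[0][1] # 4
```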

@beta-ziliani
Member

We can try applying the patch to the LLVM 11 brew formula as we did with the LLVM 11 issue.
