-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix argument ABI for overaligned structs on ppc64le #122781
Conversation
Do we also need to force array type for homogeneous aggregates like clang does here? |
@@ -255,11 +255,14 @@ pub struct Uniform { | |||
/// can be shorter, i.e., `{ i64, i64, i32 }` for | |||
/// 64-bit integers with a total size of 20 bytes. | |||
pub total: Size, | |||
|
|||
/// Force the use of an array, even if there is only a single element. | |||
pub force_array: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that so many places are just passing false
, it might be nicer to have a fn new(unit, total)
for non-array and another constructor for arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done now. All uses switched to either Uniform::new or Uniform::consecutive.
// Simplify to a single unit when there is no prefix and size <= unit size | ||
if self.rest.total <= self.rest.unit.size { | ||
if self.rest.total <= self.rest.unit.size && !self.rest.force_array { | ||
return rest_ll_unit; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should assert if force_array
is true and there's a prefix, since we'll generate a struct in that case.
@@ -141,7 +141,7 @@ impl LlvmType for CastTarget { | |||
|
|||
if self.prefix.iter().all(|x| x.is_none()) { | |||
// Simplify to a single unit when there is no prefix and size <= unit size | |||
if self.rest.total <= self.rest.unit.size { | |||
if self.rest.total <= self.rest.unit.size && !self.rest.force_array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add an assertion that force_array is not true in
fn cast_target_to_abi_params(cast: &CastTarget) -> SmallVec<[AbiParam; 2]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After having looked into this more, I suspect that the cranelift lowering actually is not quite correct and you'll have to handle the new is_consecutive flag somehow. Or at least, I don't see how your current code would ensure consecutive allocation. Note that this part applies to not just ppc64le, but also to aarch64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abi-cafe doesn't show any issues on aarch64 last time I checked.
Good question... I've been trying to figure out when exactly It looks like the relevant TLI hook is functionArgumentNeedsConsecutiveRegisters(), which is overridden by ARM, AArch64 and PowerPC. The primary purpose of it is to ensure that either all the values get allocated to registers, or everything is spilled to the stack in packed representation, which is usually required for homogeneous aggregates. Now, for single-element arrays, this usually wouldn't matter -- except in the case where the single element itself needs to be split across multiple registers, as is the case here with i128 on a target that has 64-bit registers. PowerPC has some additional magic in that this also affects the register passing ABI, but even without that it would affect whether it's allowed to spill only one of the registers to stack. For that reason, I think that this doesn't matter for homogenous aggregates, because they are typically only defined on float/vector members that can be passed in one FPR register -- though maybe the upcoming f128 type could change something here? |
Okay, I've reworked how this is done now. Quoting the commit comment:
That way, rustc_target just describes the required ABI, and the question of how this is achieved in terms of LLVM types is punted to rustc_codegen_llvm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few more struct initializers that should now be Uniform::new
, in nvptx64
, riscv
, sparc
, and sparc64
.
Otherwise, LGTM!
☔ The latest upstream changes (presumably #123429) made this pull request unmergeable. Please resolve the merge conflicts. |
When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of `i128` rather than an array of `i64`. This will force the use of an even starting register. For the case of a 16 byte struct with alignment 16 it is important that `[1 x i128]` is used instead of `i128` -- apparently, the latter will get treated similarly to `[2 x i64]`, not exhibiting the correct ABI. Add a `force_array` flag to `Uniform` to support this. The relevant clang code can be found here: https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L878-L884 https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L780-L784 I think the corresponding psABI wording is this: > Fixed size aggregates and unions passed by value are mapped to as > many doublewords of the parameter save area as the value uses in > memory. Aggregrates and unions are aligned according to their > alignment requirements. This may result in doublewords being > skipped for alignment. In particular the last sentence. Fixes rust-lang#122767.
The actual ABI implication here is that in some cases the values are required to be "consecutive", i.e. must either all be passed in registers or all on stack (without padding). Adjust the code to either use Uniform::new() or Uniform::consecutive() depending on which behavior is needed. Then, when lowering this in LLVM, skip the [1 x i128] to i128 simplification if is_consecutive is set. i128 is the only case I'm aware of where this is problematic right now. If we find other cases, we can extend this (either based on target information or possibly just by not simplifying for is_consecutive entirely).
Done & rebased! |
@bors r+ |
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#122781 (Fix argument ABI for overaligned structs on ppc64le) - rust-lang#123367 (Safe Transmute: Compute transmutability from `rustc_target::abi::Layout`) - rust-lang#123518 (Fix `ByMove` coroutine-closure shim (for 2021 precise closure capturing behavior)) - rust-lang#123547 (bootstrap: remove unused pub fns) - rust-lang#123564 (Don't emit divide-by-zero panic paths in `StepBy::len`) - rust-lang#123578 (Restore `pred_known_to_hold_modulo_regions`) - rust-lang#123591 (Remove unnecessary cast from `LLVMRustGetInstrProfIncrementIntrinsic`) - rust-lang#123632 (parser: reduce visibility of unnecessary public `UnmatchedDelim`) - rust-lang#123635 (CFI: Fix ICE in KCFI non-associated function pointers) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#122781 - nikic:ppc-abi-fix, r=cuviper Fix argument ABI for overaligned structs on ppc64le When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of `i128` rather than an array of `i64`. This will force the use of an even starting doubleword. For the case of a 16 byte struct with alignment 16 it is important that `[1 x i128]` is used instead of `i128` -- apparently, the latter will get treated similarly to `[2 x i64]`, not exhibiting the correct ABI. Add a `force_array` flag to `Uniform` to support this. The relevant clang code can be found here: https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L878-L884 https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L780-L784 I think the corresponding psABI wording is this: > Fixed size aggregates and unions passed by value are mapped to as > many doublewords of the parameter save area as the value uses in > memory. Aggregrates and unions are aligned according to their > alignment requirements. This may result in doublewords being > skipped for alignment. In particular the last sentence. Though I didn't find any wording for Clang's behavior of clamping the alignment to 16. Fixes rust-lang#122767. r? `@cuviper`
When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of
i128
rather than an array ofi64
. This will force the use of an even starting doubleword.For the case of a 16 byte struct with alignment 16 it is important that
[1 x i128]
is used instead ofi128
-- apparently, the latter will get treated similarly to[2 x i64]
, not exhibiting the correct ABI. Add aforce_array
flag toUniform
to support this.The relevant clang code can be found here:
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L878-L884
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L780-L784
I think the corresponding psABI wording is this:
In particular the last sentence. Though I didn't find any wording for Clang's behavior of clamping the alignment to 16.
Fixes #122767.
r? @cuviper