Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix argument ABI for overaligned structs on ppc64le #122781

Merged
merged 2 commits into from
Apr 8, 2024

Conversation

nikic
Copy link
Contributor

@nikic nikic commented Mar 20, 2024

When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of i128 rather than an array of i64. This will force the use of an even starting doubleword.

For the case of a 16 byte struct with alignment 16 it is important that [1 x i128] is used instead of i128 -- apparently, the latter will get treated similarly to [2 x i64], not exhibiting the correct ABI. Add a force_array flag to Uniform to support this.

The relevant clang code can be found here:
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L878-L884
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L780-L784

I think the corresponding psABI wording is this:

Fixed size aggregates and unions passed by value are mapped to as
many doublewords of the parameter save area as the value uses in
memory. Aggregrates and unions are aligned according to their
alignment requirements. This may result in doublewords being
skipped for alignment.

In particular the last sentence. Though I didn't find any wording for Clang's behavior of clamping the alignment to 16.

Fixes #122767.

r? @cuviper

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 20, 2024
@cuviper
Copy link
Member

cuviper commented Mar 20, 2024

Do we also need to force array type for homogeneous aggregates like clang does here?
https://github.com/llvm/llvm-project/blob/b20360abeb3a80281dc082f1e093abd13cb1ee4c/clang/lib/CodeGen/Targets/PPC.cpp#L855-L863

@@ -255,11 +255,14 @@ pub struct Uniform {
/// can be shorter, i.e., `{ i64, i64, i32 }` for
/// 64-bit integers with a total size of 20 bytes.
pub total: Size,

/// Force the use of an array, even if there is only a single element.
pub force_array: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that so many places are just passing false, it might be nicer to have a fn new(unit, total) for non-array and another constructor for arrays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done now. All uses switched to either Uniform::new or Uniform::consecutive.

Comment on lines 143 to 158
// Simplify to a single unit when there is no prefix and size <= unit size
if self.rest.total <= self.rest.unit.size {
if self.rest.total <= self.rest.unit.size && !self.rest.force_array {
return rest_ll_unit;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should assert if force_array is true and there's a prefix, since we'll generate a struct in that case.

@@ -141,7 +141,7 @@ impl LlvmType for CastTarget {

if self.prefix.iter().all(|x| x.is_none()) {
// Simplify to a single unit when there is no prefix and size <= unit size
if self.rest.total <= self.rest.unit.size {
if self.rest.total <= self.rest.unit.size && !self.rest.force_array {
Copy link
Member

@bjorn3 bjorn3 Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add an assertion that force_array is not true in

fn cast_target_to_abi_params(cast: &CastTarget) -> SmallVec<[AbiParam; 2]> {
Cranelift IR doesn't jave a way to represent array types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After having looked into this more, I suspect that the cranelift lowering actually is not quite correct and you'll have to handle the new is_consecutive flag somehow. Or at least, I don't see how your current code would ensure consecutive allocation. Note that this part applies to not just ppc64le, but also to aarch64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abi-cafe doesn't show any issues on aarch64 last time I checked.

@nikic
Copy link
Contributor Author

nikic commented Mar 21, 2024

Do we also need to force array type for homogeneous aggregates like clang does here? https://github.com/llvm/llvm-project/blob/b20360abeb3a80281dc082f1e093abd13cb1ee4c/clang/lib/CodeGen/Targets/PPC.cpp#L855-L863

Good question... I've been trying to figure out when exactly iN vs [1 x iN] makes a difference in terms of ABI.

It looks like the relevant TLI hook is functionArgumentNeedsConsecutiveRegisters(), which is overridden by ARM, AArch64 and PowerPC. The primary purpose of it is to ensure that either all the values get allocated to registers, or everything is spilled to the stack in packed representation, which is usually required for homogeneous aggregates.

Now, for single-element arrays, this usually wouldn't matter -- except in the case where the single element itself needs to be split across multiple registers, as is the case here with i128 on a target that has 64-bit registers. PowerPC has some additional magic in that this also affects the register passing ABI, but even without that it would affect whether it's allowed to spill only one of the registers to stack.

For that reason, I think that this doesn't matter for homogenous aggregates, because they are typically only defined on float/vector members that can be passed in one FPR register -- though maybe the upcoming f128 type could change something here?

@nikic
Copy link
Contributor Author

nikic commented Mar 21, 2024

Okay, I've reworked how this is done now. Quoting the commit comment:

The actual ABI implication here is that in some cases the values
are required to be "consecutive", i.e. must either all be passed
in registers or all on stack (without padding).

Adjust the code to either use Uniform::new() or Uniform::consecutive()
depending on which behavior is needed.

Then, when lowering this in LLVM, skip the [1 x i128] to i128
simplification if is_consecutive is set. i128 is the only case
I'm aware of where this is problematic right now. If we find
other cases, we can extend this (either based on target information
or possibly just by not simplifying for is_consecutive entirely).

That way, rustc_target just describes the required ABI, and the question of how this is achieved in terms of LLVM types is punted to rustc_codegen_llvm.

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few more struct initializers that should now be Uniform::new, in nvptx64, riscv, sparc, and sparc64.

Otherwise, LGTM!

@bors
Copy link
Contributor

bors commented Apr 3, 2024

☔ The latest upstream changes (presumably #123429) made this pull request unmergeable. Please resolve the merge conflicts.

When passing a 16 (or higher) aligned struct by value on ppc64le,
it needs to be passed as an array of `i128` rather than an array
of `i64`. This will force the use of an even starting register.

For the case of a 16 byte struct with alignment 16 it is important
that `[1 x i128]` is used instead of `i128` -- apparently, the
latter will get treated similarly to `[2 x i64]`, not exhibiting
the correct ABI. Add a `force_array` flag to `Uniform` to support
this.

The relevant clang code can be found here:
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L878-L884
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L780-L784

I think the corresponding psABI wording is this:

> Fixed size aggregates and unions passed by value are mapped to as
> many doublewords of the parameter save area as the value uses in
> memory. Aggregrates and unions are aligned according to their
> alignment requirements. This may result in doublewords being
> skipped for alignment.

In particular the last sentence.

Fixes rust-lang#122767.
The actual ABI implication here is that in some cases the values
are required to be "consecutive", i.e. must either all be passed
in registers or all on stack (without padding).

Adjust the code to either use Uniform::new() or Uniform::consecutive()
depending on which behavior is needed.

Then, when lowering this in LLVM, skip the [1 x i128] to i128
simplification if is_consecutive is set. i128 is the only case
I'm aware of where this is problematic right now. If we find
other cases, we can extend this (either based on target information
or possibly just by not simplifying for is_consecutive entirely).
@nikic
Copy link
Contributor Author

nikic commented Apr 8, 2024

There are a few more struct initializers that should now be Uniform::new, in nvptx64, riscv, sparc, and sparc64.

Otherwise, LGTM!

Done & rebased!

@cuviper
Copy link
Member

cuviper commented Apr 8, 2024

@bors r+

@bors
Copy link
Contributor

bors commented Apr 8, 2024

📌 Commit 1b7342b has been approved by cuviper

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 8, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 8, 2024
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#122781 (Fix argument ABI for overaligned structs on ppc64le)
 - rust-lang#123367 (Safe Transmute: Compute transmutability from `rustc_target::abi::Layout`)
 - rust-lang#123518 (Fix `ByMove` coroutine-closure shim (for 2021 precise closure capturing behavior))
 - rust-lang#123547 (bootstrap: remove unused pub fns)
 - rust-lang#123564 (Don't emit divide-by-zero panic paths in `StepBy::len`)
 - rust-lang#123578 (Restore `pred_known_to_hold_modulo_regions`)
 - rust-lang#123591 (Remove unnecessary cast from `LLVMRustGetInstrProfIncrementIntrinsic`)
 - rust-lang#123632 (parser: reduce visibility of unnecessary public `UnmatchedDelim`)
 - rust-lang#123635 (CFI: Fix ICE in KCFI non-associated function pointers)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit ecfc338 into rust-lang:master Apr 8, 2024
11 checks passed
@rustbot rustbot added this to the 1.79.0 milestone Apr 8, 2024
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Apr 8, 2024
Rollup merge of rust-lang#122781 - nikic:ppc-abi-fix, r=cuviper

Fix argument ABI for overaligned structs on ppc64le

When passing a 16 (or higher) aligned struct by value on ppc64le, it needs to be passed as an array of `i128` rather than an array of `i64`. This will force the use of an even starting doubleword.

For the case of a 16 byte struct with alignment 16 it is important that `[1 x i128]` is used instead of `i128` -- apparently, the latter will get treated similarly to `[2 x i64]`, not exhibiting the correct ABI. Add a `force_array` flag to `Uniform` to support this.

The relevant clang code can be found here:
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L878-L884
https://github.com/llvm/llvm-project/blob/fe2119a7b08b6e468b2a67768904ea85b1bf0a45/clang/lib/CodeGen/Targets/PPC.cpp#L780-L784

I think the corresponding psABI wording is this:

> Fixed size aggregates and unions passed by value are mapped to as
> many doublewords of the parameter save area as the value uses in
> memory. Aggregrates and unions are aligned according to their
> alignment requirements. This may result in doublewords being
> skipped for alignment.

In particular the last sentence. Though I didn't find any wording for Clang's behavior of clamping the alignment to 16.

Fixes rust-lang#122767.

r? `@cuviper`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extern-fn-explicit-align fails on ppc64le
6 participants