-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always generate GEP i8 / ptradd for struct offsets #121665
Conversation
Some changes occurred in compiler/rustc_codegen_gcc |
// typedef struct va_list { | ||
// void * stack; // next stack param | ||
// void * gr_top; // end of GP arg reg save area | ||
// void * vr_top; // end of FP/SIMD arg reg save area | ||
// int gr_offs; // offset from gr_top to next GP register arg | ||
// int vr_offs; // offset from vr_top to next FP/SIMD register arg | ||
// } va_list; | ||
let va_list_addr = list.immediate(); | ||
let va_list_layout = list.deref(bx.cx).layout; | ||
let va_list_ty = va_list_layout.llvm_type(bx); | ||
|
||
// There is no padding between fields since `void*` is size=8 align=8, `int` is size=4 align=4. | ||
// See https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst | ||
// Table 1, Byte size and byte alignment of fundamental data types | ||
// Table 3, Mapping of C & C++ built-in data types | ||
let ptr_offset = 8; | ||
let i32_offset = 4; | ||
let gr_top = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(ptr_offset)); | ||
let vr_top = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(2 * ptr_offset)); | ||
let gr_offs = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(3 * ptr_offset)); | ||
let vr_offs = bx.inbounds_ptradd(va_list_addr, bx.cx.const_usize(3 * ptr_offset + i32_offset)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally I'd feel bad about this kind of manual layout computation, but this file is already packed full of hardcoded offsets and assumptions.
At some point we should re-evaluate whether we can use the LLVM va_arg
directly.
To that point, I'm confused about the comment
The LLVM va_arg instruction is lacking in some instances, so we should only use it as a fallback.
...does Clang also implement this logic manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. LLVM types are not rich enough to encode all the necessary ABI requirements to make something like the va_arg instruction work correctly. It would probably be best to remove it entirely.
let llval = match self.layout.abi { | ||
_ if offset.bytes() == 0 => { | ||
// Unions and newtypes only use an offset of 0. | ||
// Also handles the first field of Scalar, ScalarPair, and Vector layouts. | ||
self.llval | ||
} | ||
Abi::ScalarPair(..) => { | ||
// FIXME(nikic): Generate this for all ABIs. | ||
bx.inbounds_gep(bx.type_i8(), self.llval, &[bx.const_usize(offset.bytes())]) | ||
} | ||
Abi::Scalar(_) | Abi::Vector { .. } if field.is_zst() => { | ||
// ZST fields (even some that require alignment) are not included in Scalar, | ||
// ScalarPair, and Vector layouts, so manually offset the pointer. | ||
bx.gep(bx.cx().type_i8(), self.llval, &[bx.const_usize(offset.bytes())]) | ||
} | ||
Abi::Scalar(_) => { | ||
// All fields of Scalar layouts must have been handled by this point. | ||
// Vector layouts have additional fields for each element of the vector, so don't panic in that case. | ||
bug!( | ||
"offset of non-ZST field `{:?}` does not match layout `{:#?}`", | ||
field, | ||
self.layout | ||
); | ||
} | ||
_ => { | ||
let ty = bx.backend_type(self.layout); | ||
bx.struct_gep(ty, self.llval, bx.cx().backend_field_index(self.layout, ix)) | ||
} | ||
let llval = if offset.bytes() == 0 { | ||
self.llval | ||
} else { | ||
bx.inbounds_ptradd(self.llval, bx.const_usize(offset.bytes())) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old code used non-inbounds GEP for ZSTs. But it's okay for inbounds to point one-past-the end:
The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end.
...which handles the ZST being "outside" of e.g. struct Foo(u64, ())
(where it is at an offset of 8 bytes), so I think using inbounds for ZST fields is okay. Unless ZSTs can be arbitrarily far outside the bounds of a layout? (@oli-obk?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Miri doesn't catch it yet: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=682be508e94b238472eccdb9e1fdef3b
And it seems to be an open question on whether it is ok or not: rust-lang/unsafe-code-guidelines#93
Tho looks like it's leaning towards it being ok I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the initial field offset in that example (x.1
) will go through this code, so I think it would be fine...but I'll change it back to non-inbounds for ZSTs anyways, so that the only functional change in this PR is the GEP i8/ptradd change, and since ZST field offsets should never be performance critical, so it doesn't really matter.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Always generate GEP i8 / ptradd for struct offsets This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely. Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change. Split out from rust-lang#121577. r? `@nikic`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (85edff9): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 649.3s -> 644.43s (-0.75%) |
We did not encounter any significant issues when we started canonicalizing these upstream, so I think it's fairly safe to do this step now. We could wait until the LLVM 19 release, but it doesn't really seem necessary. @bors r+ rollup=never |
This PR is at the top of the bors queue but not being processed... |
@bors r+ |
Finished benchmarking commit (70aa0b8): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDNext Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 649.87s -> 644.02s (-0.90%) |
Are the binary size increases expected? |
Always generate GEP i8 / ptradd for struct offsets This implements rust-lang#98615, and goes a bit further to remove `struct_gep` entirely. Upstream LLVM is in the beginning stages of [migrating to `ptradd`](https://discourse.llvm.org/t/rfc-replacing-getelementptr-with-ptradd/68699). LLVM 19 will [canonicalize](llvm/llvm-project#68882) all constant-offset GEPs to i8, which has roughly the same effect as this change. Fixes rust-lang#121719. Split out from rust-lang#121577. r? `@nikic`
Use GEP inbounds for ZST and DST field offsets ZST field offsets have been non-`inbounds` since I made [this old layout change](https://github.com/rust-lang/rust/pull/73453/files#diff-160634de1c336f2cf325ff95b312777326f1ab29fec9b9b21d5ee9aae215ecf5). Before that, they would have been `inbounds` due to using `struct_gep`. Using `inbounds` for ZSTs likely doesn't matter for performance, but I'd like to remove the special case. DST field offsets have been non-`inbounds` since the alignment-aware DST field offset computation was first [implemented](erikdesjardins@a2557d4#diff-04fd352da30ca186fe0bb71cc81a503d1eb8a02ca17a3769e1b95981cd20964aR1188) in 1.6 (back then `GEPi()` would be used for `inbounds`), but I don't think there was any reason for it. Split out from rust-lang#121577 / rust-lang#121665. r? `@oli-obk` cc `@RalfJung` -- is there some weird situation where field offsets can't be `inbounds`? Note that it's fine for `inbounds` offsets to be one-past-the-end, so it's okay even if there's a ZST as the last field in the layout: > The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction) For rust-lang/unsafe-code-guidelines#93, zero-offset GEP is (now) always `inbounds`: > Note that getelementptr with all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction)
Use GEP inbounds for ZST and DST field offsets ZST field offsets have been non-`inbounds` since I made [this old layout change](https://github.com/rust-lang/rust/pull/73453/files#diff-160634de1c336f2cf325ff95b312777326f1ab29fec9b9b21d5ee9aae215ecf5). Before that, they would have been `inbounds` due to using `struct_gep`. Using `inbounds` for ZSTs likely doesn't matter for performance, but I'd like to remove the special case. DST field offsets have been non-`inbounds` since the alignment-aware DST field offset computation was first [implemented](erikdesjardins@a2557d4#diff-04fd352da30ca186fe0bb71cc81a503d1eb8a02ca17a3769e1b95981cd20964aR1188) in 1.6 (back then `GEPi()` would be used for `inbounds`), but I don't think there was any reason for it. Split out from rust-lang#121577 / rust-lang#121665. r? `@oli-obk` cc `@RalfJung` -- is there some weird situation where field offsets can't be `inbounds`? Note that it's fine for `inbounds` offsets to be one-past-the-end, so it's okay even if there's a ZST as the last field in the layout: > The base pointer has an in bounds address of an allocated object, which means that it points into an allocated object, or to its end. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction) For rust-lang/unsafe-code-guidelines#93, zero-offset GEP is (now) always `inbounds`: > Note that getelementptr with all-zero indices is always considered to be inbounds, even if the base pointer does not point to an allocated object. [(link)](https://llvm.org/docs/LangRef.html#getelementptr-instruction)
The regression is all in backtrace-related code:
( (Generated with this ad-hoc script. There's probably some existing way to do this though, isn't there...) It's kind of interesting that the regression is spread across a bunch of different functions but all in backtrace-related code. Although I guess it's just because one function was optimized differently and the butterfly effect cascaded to everything else in its call graph. Looking at So I don't really see anything actionable here. It does make me wonder if we should be compiling backtrace-related crates with |
$ cargo run --bin collector binary_stats --symbols <rustc> But it will show slightly different stuff than your script. |
Add #[inline] to Option::copied/cloned In rust-lang#121665 (comment), I noticed that `Option::cloned` stopped being inlined in some backtrace code. Let's see if this helps. r? `@ghost`
Runtime performance isn't zero concern for backtrace code, oddly enough (mainly for the "forced capture" case), but backtrace-rs contains algorithmic optimizations which already make the appropriate trades, so yes, if you can get a significantly smaller code size by optimizing in that direction, that would be good. We can also look at working with the addr2line and gimli maintainers to reduce the code size a bit, if we can identify the heaviest-hitters in those crates. |
Use ptradd for vtable indexing Extension of rust-lang#121665. After this, the only remaining usages of GEP are [this](https://github.com/rust-lang/rust/blob/cd81f5b27ee00b49d413db50b5e6af871cebcf23/compiler/rustc_codegen_llvm/src/intrinsic.rs#L909-L920) kinda janky Emscription EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.) r? `@nikic`
…=davidtwco Remove the unused `field_remapping` field from `TypeLowering` The `field_remapping` field of `TypeLowering` has been unused since rust-lang#121665. This PR removes it, then replaces the `TypeLowering` struct with its only remaining member `&'ll Type`.
…=davidtwco Remove the unused `field_remapping` field from `TypeLowering` The `field_remapping` field of `TypeLowering` has been unused since rust-lang#121665. This PR removes it, then replaces the `TypeLowering` struct with its only remaining member `&'ll Type`.
Use ptradd for vtable indexing Extension of rust-lang#121665. After this, the only remaining usages of GEP are [this](https://github.com/rust-lang/rust/blob/cd81f5b27ee00b49d413db50b5e6af871cebcf23/compiler/rustc_codegen_llvm/src/intrinsic.rs#L909-L920) kinda janky Emscription EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.) r? `@nikic`
Rollup merge of rust-lang#122320 - erikdesjardins:vtable, r=nikic Use ptradd for vtable indexing Extension of rust-lang#121665. After this, the only remaining usages of GEP are [this](https://github.com/rust-lang/rust/blob/cd81f5b27ee00b49d413db50b5e6af871cebcf23/compiler/rustc_codegen_llvm/src/intrinsic.rs#L909-L920) kinda janky Emscription EH code, which I'll change in a future PR, and array indexing / pointer offsets, where there isn't yet a canonical `ptradd` form. (Out of curiosity I tried converting the latter to `ptradd(ptr, mul(size, index))`, but that causes codegen regressions right now.) r? `@nikic`
Rollup merge of rust-lang#122166 - beetrees:remove-field-remapping, r=davidtwco Remove the unused `field_remapping` field from `TypeLowering` The `field_remapping` field of `TypeLowering` has been unused since rust-lang#121665. This PR removes it, then replaces the `TypeLowering` struct with its only remaining member `&'ll Type`.
This implements #98615, and goes a bit further to remove
struct_gep
entirely.Upstream LLVM is in the beginning stages of migrating to
ptradd
. LLVM 19 will canonicalize all constant-offset GEPs to i8, which has roughly the same effect as this change.Fixes #121719.
Split out from #121577.
r? @nikic