-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use undef for uninitialized bytes in constants #83698
Conversation
r? @estebank (rust-highfive has picked a reviewer for you, use r? to override) |
r? @oli-obk |
cc @rust-lang/wg-llvm we're now marking undef bytes from constants as undef, just like we mark padding in constructed values. |
@bors r+ |
📌 Commit be66c2e has been approved by |
Use undef for uninitialized bytes in constants Fixes rust-lang#83657 This generates good code when the const is fully uninit, e.g. ```rust #[no_mangle] pub const fn fully_uninit() -> MaybeUninit<[u8; 10]> { const M: MaybeUninit<[u8; 10]> = MaybeUninit::uninit(); M } ``` generates ```asm fully_uninit: ret ``` as you would expect. There is no improvement, however, when it's partially uninit, e.g. ```rust pub struct PartiallyUninit { x: u64, y: MaybeUninit<[u8; 10]> } #[no_mangle] pub const fn partially_uninit() -> PartiallyUninit { const X: PartiallyUninit = PartiallyUninit { x: 0xdeadbeefcafe, y: MaybeUninit::uninit() }; X } ``` generates ```asm partially_uninit: mov rax, rdi mov rcx, qword ptr [rip + .L__unnamed_1+16] mov qword ptr [rdi + 16], rcx movups xmm0, xmmword ptr [rip + .L__unnamed_1] movups xmmword ptr [rdi], xmm0 ret .L__unnamed_1: .asciz "\376\312\357\276\255\336\000" .zero 16 .size .L__unnamed_1, 24 ``` which copies a bunch of zeros in place of the undef bytes, the same as before this change. The constant is undef where it should be, though (see the added test) so this seems to be a missed optimization in LLVM.
I don't think this should be rolled up, it could affect perf or expose existing UB (because previously reading uninitialized bytes from constants at runtime would "work") |
Use undef for uninitialized bytes in constants Fixes rust-lang#83657 This generates good code when the const is fully uninit, e.g. ```rust #[no_mangle] pub const fn fully_uninit() -> MaybeUninit<[u8; 10]> { const M: MaybeUninit<[u8; 10]> = MaybeUninit::uninit(); M } ``` generates ```asm fully_uninit: ret ``` as you would expect. There is no improvement, however, when it's partially uninit, e.g. ```rust pub struct PartiallyUninit { x: u64, y: MaybeUninit<[u8; 10]> } #[no_mangle] pub const fn partially_uninit() -> PartiallyUninit { const X: PartiallyUninit = PartiallyUninit { x: 0xdeadbeefcafe, y: MaybeUninit::uninit() }; X } ``` generates ```asm partially_uninit: mov rax, rdi mov rcx, qword ptr [rip + .L__unnamed_1+16] mov qword ptr [rdi + 16], rcx movups xmm0, xmmword ptr [rip + .L__unnamed_1] movups xmmword ptr [rdi], xmm0 ret .L__unnamed_1: .asciz "\376\312\357\276\255\336\000" .zero 16 .size .L__unnamed_1, 24 ``` which copies a bunch of zeros in place of the undef bytes, the same as before this change. The constant is undef where it should be, though (see the added test) so this seems to be a missed optimization in LLVM.
@bors rollup=never @Dylan-DPC might be better to re-roll that rollup... sorry for this. |
⌛ Testing commit be66c2e with merge 24f957fcbe9f9432f0c74cc707a5f008268f4984... |
This comment has been minimized.
This comment has been minimized.
💔 Test failed - checks-actions |
When building this test locally, foo.wasm is never below 1024 bytes (which is strange), but there is a regression: on master foo.wasm is 6k, and with these changes it's 140k (!). It seems to be defeating some optimization in LLVM, so we're no longer able to eliminate a bunch of formatting code. I'm trying to figure out why. It may be worth a perf run to see if this change as-is causes widespread regressions (I suspect it will), or if that wasm test is just fragile for some reason |
Now that you mention this, In my memory llvm would fail to put constants
with uninit in them into bss sections in the past. Does wasm have something
similar?
…On Tue, 6 Apr 2021, 06:34 erikdesjardins, ***@***.***> wrote:
wc -c < /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/wasm-panic-small/wasm-panic-small/foo.wasm
1139
[ "`wc -c < /checkout/obj/build/x86_64-unknown-linux-gnu/test/run-make/wasm-panic-small/wasm-panic-small/foo.wasm`" -lt "1024" ]
When building this test locally, it's never below 1024 bytes (which is
strange), but there is a regression: at HEAD~2 foo.wasm is 6k, and with
these changes it's 140k (!). It seems to be defeating some optimization in
LLVM, so we're no longer able to eliminate a bunch of formatting code. I'm
trying to figure out why.
It may be worth a perf run to see if this change as-is causes widespread
regressions (I suspect it will), or if that wasm test is just fragile for
some reason
—
You are receiving this because you are on a team that was mentioned.
Reply to this email directly, view it on GitHub
<#83698 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAFFZUWSLBZABFD6GSH6RF3THJ6MNANCNFSM42DX4N6A>
.
|
@bors try @rust-timer queue |
d4ab66d
to
adf3b01
Compare
@bors r=RalfJung,oli-obk |
📌 Commit adf3b01 has been approved by |
☀️ Test successful - checks-actions |
While the perf results show improvements in instructions counts this looks like a significant (non-noise) regression in wall times. The jump is noticable in the timelines of Labeling as regression so that the perf team can have a look at it. |
the await call tree opt change is in #88308 not in this PR there are a few other noticeable regressions in the timelines though, yes. Most of the non-incremental regressions I looked at seem to point to LLVM spending more time, so this may either end up with better optimized code or just make it harder for LLVM to figure things out. |
|
I tried to reproduce this locally but was unable to. I don't see a regression in wall-time, nor when running under cachegrind. Cachegrind actually shows a slight improvement for (For posterity, using e.g. So, I decided to look at the timeline now that more time has passed and...it seems kinda weird. Zoomed out, it's not that unusual. This PR introduced a regression and then a later change fixed it--right? But if we zoom in on the point where wall-time returns to normal, it doesn't make sense: First, it bounces between slow and fast before settling on fast. Which seems implausible--it's not likely that we introduced a perf improvement, then cancelled it out in the next commit, then reintroduced it three commits later. Second, the two commits responsible for those two improvements don't make sense:
(All I'm not sure what to make of this. Some intermittent issue with the perf collector that makes wall-time measurements longer? Actually, while typing this up, I think I figured out what it is. See this comment (on [B] 9dd4ce8) about the cpu governor being reset to
#88308 was merged immediately after this PR. It too had some (unexplainable) wall-time regressions, in Similarly, if you look at the timeline for those benchmarks: And zoom in: The improvements are due to [A] 38e5764 and [C] a58db2e (merged right after [B] 9dd4ce8). So I think this is what happened:
...time passes...
And judging from the timestamp on this comment, [A] 38e5764 was missed initially and manually re-enqueued later. This would explain the downward spike that appears everywhere--although it was committed earlier, its perf run occurred later, after the governor was fixed. * The regressions attributed to this PR ( |
That seems like a reasonable explanation, marking as triaged. |
Use undef for (some) partially-uninit constants There needs to be some limit to avoid perf regressions on large arrays with undef in each element (see comment in the code). Fixes: rust-lang#84565 Original PR: rust-lang#83698 Depends on LLVM 14: rust-lang#93577
Fixes #83657
This generates good code when the const is fully uninit, e.g.
generates
as you would expect.
There is no improvement, however, when it's partially uninit, e.g.
generates
which copies a bunch of zeros in place of the undef bytes, the same as before this change.
Edit: generating partially-undef constants isn't viable at the moment anyways due to #84565, so it's disabled