Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed enum layout optimization depending on enum variant field reordering #125630

Closed
cmrschwarz opened this issue May 27, 2024 · 2 comments · Fixed by #130508
Closed

Missed enum layout optimization depending on enum variant field reordering #125630

cmrschwarz opened this issue May 27, 2024 · 2 comments · Fixed by #130508
Labels
A-layout Area: Memory layout of types C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@cmrschwarz
Copy link
Contributor

rustc fails to optimize the enum Layout of V1 and V2 of the (real world) example type LazyRwLockGuard to 24 bytes.

In case of V1, this might be be due to #101567.

But the case of V2 seems to be a separate issue, as it is identical to V3 (which is optimized correctly), except for the ordering of the fields inside the Read enum variant.

(For clarity, RwLockReadGuard and RwLockWriteGuard each use 16 bytes and contain a niche).

use std::sync::*;

// could use 24 bytes, uses 32 (probably known issue, #101567)
pub enum LazyRwLockGuardV1<'a, T> {
    Unlocked(&'a RwLock<T>),
    Read {
        lock: &'a RwLock<T>,
        guard: RwLockReadGuard<'a, T>,
    },    
    Write(RwLockWriteGuard<'a, T>),
}

// helper subtype (16 Bytes) so the main type figures out it's Niche correctly
pub enum LazyRwLockWriteGuard<'a, T> {
    Unlocked(&'a RwLock<T>),
    Write(RwLockWriteGuard<'a, T>),
}

// this type should now be 24 bytes, but unfortunately still uses 32
pub enum LazyRwLockGuardV2<'a, T> {
    Read {
        lock: &'a RwLock<T>,
        guard: RwLockReadGuard<'a, T>,
    },    
    NonRead(LazyRwLockWriteGuard<'a, T>),
}

// this type correctly uses 24 bytes
pub enum LazyRwLockGuardV3<'a, T> {
    Read {
        guard: RwLockReadGuard<'a, T>,
        lock: &'a RwLock<T>, // fields reordered
    },    
    NonRead(LazyRwLockWriteGuard<'a, T>),
}

godbolt repro
stackoverflow question

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 27, 2024
@the8472
Copy link
Member

the8472 commented May 27, 2024

There are some field ordering heuristics that get applied to regular structs (#102750, #108106) but not to enum variants. If you extract the Read variant into a struct and make that struct into a variant payload it does work as desired.

Enums are under different constraints so the optimizations can't be ported 1:1, but with some tweaks it should be possible.

@the8472 the8472 added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-layout Area: Memory layout of types C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels May 27, 2024
@cmrschwarz
Copy link
Contributor Author

In case my practical example is a bit overcomplicated, here's a reduced version:

use std::num::NonZeroU64;

// uses 32 bytes, swapping x and y brings it down to 24
pub enum Foo {
    A {
        x: NonZeroU64,  
        y: [NonZeroU64; 2],
    },    
    B([u64; 2]),
}

godbolt repro

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 19, 2024
…, r=the8472

Get rid of niche selection's dependence on fields's order

Fixes rust-lang#125630.
Use the optimal niche selection decided in `univariant()` rather than picking niche field manually.

r? `@the8472`
@bors bors closed this as completed in 2b11f26 Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-layout Area: Memory layout of types C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants