Idea: guaranteed-zero padding for niches #70230

RalfJung · 2020-03-21T15:33:53Z

As suggested here, one option to gain more layout optimizations would be (either for all repr(Rust) types or with per-type opt-in) to use a different kind of padding.

Right now, padding bytes may contain arbitrary data, and can be omitted when copying as they are "unstable" in a typed copy. Instead, we could also imagine a kind of padding ("zero-padding") that is guaranteed to be zero. This has the disadvantage that padding bytes need to be initialized on struct creation and copied when the struct is copied, but it has the advantage that padding bytes can become "niches" for layout optimizations.

LLVM's padding is of the "unstable" kind, but we could just add explicit fields ourselves (fields that are guaranteed to be zero) when translating zero-padded types to LLVM.

Cc @eddyb

The text was updated successfully, but these errors were encountered:

eddyb · 2020-03-21T17:31:56Z

LLVM's padding is of the "unstable" kind, but we could just add explicit fields ourselves (fields that are guaranteed to be zero) when translating zero-padded types to LLVM.

LLVM has no concept of "padding" outside of "first-class aggregates", which we only use for multiple (well, 2) return values (via ScalarPair), and nothing else (no values in functions, no global initializers). And really, it's basically like loading/storing the fields individually.

As long a type is Abi::Aggregate, and it's not copied by explicit field-by-field accesses, LLVM only sees a memcpy (so size and alignment), nothing else.

EDIT: oh and there's passing values around in calls - that's also not LLVM, it's us, we'd have to make sure we don't apply architecture-specific call ABI rules (for e.g. extern "C" fns) that introspect the field structure of the type, and treat it like an opaque blob instead.

RalfJung · 2020-03-21T18:01:04Z

LLVM has no concept of "padding" outside of "first-class aggregates", which we only use for multiple (well, 2) return values (via ScalarPair), and nothing else (no values in functions, no global initializers). And really, it's basically like loading/storing the fields individually.

I don't know what you mean. When I define a struct/union in LLVM, won't it add padding in between the fields?

LLVM is permitted to turn a copy of struct-type into a field-by-field copy, and AFAIK we had examples where it did that? This is the kind of transformation that becomes illegal with "zero-padding" (unless we somehow already know that the target memory is zero, e.g. because it is already valid at the given type).

bjorn3 · 2020-03-21T18:39:10Z

What would be the overhead of the actual zeroing of the padding? Especially for cg_clif, which barely performs any memory optimizations I imagine this will have a significant overhead. The only memory optimization performed by cg_clif results in a ~15% improvement. However that optimization is currently so dumb that I think zeroing padding will prevent performing this optimization. For cg_llvm it will just result in more ir to churn through with optimizations (=compiletime slowdown), or a runtime slowdown without optimizations.

RalfJung · 2020-03-21T18:42:21Z

Assuming the backend does naive copying (i.e., it always copies the entire struct and doesn't try to be clever by only copying the fields and ignoring padding), then the only overhead is in the constructor: Struct { f1, f2 } would have to compile to code that also writes zeroes to padding between and after the fields.

bjorn3 · 2020-03-21T19:11:11Z

Assuming the backend does naive copying (i.e., it always copies the entire struct and doesn't try to be clever by only copying the fields and ignoring padding)

Yes

then the only overhead is in the constructor

That is exactly what I am afraid of having a >5% overhead on compile time and, when not using optimizations, runtime. Especially for enums it may result in a lot more stores being generated.

bjorn3 · 2020-03-21T19:15:36Z

Also for the mir returned by instance_mir, the constructor is desugared into individual field assignments. This means that you either have to zero the full local in the prelude, or you have to do a complex analysis to find which bytes will be overwritten later. For the later you can't just take the field offsets, as different enum variants may have different padding.

RalfJung · 2020-03-21T19:48:21Z

I was thinking of struct padding only so far, where it should be pretty trivial to write some zeroes between the fields.

But I also don't see the issue with enums. In a constructor, the enum variant is statically known. So you just look at the fields of the initial variant, and zero the gaps between them.

That is exactly what I am afraid of having a >5% overhead on compile time and, when not using optimizations, runtime.

That sounds like a wild guess to me -- do you have any foundation for it being 5% rather than, say, 0.5%? I'd expect the impact to be tiny, since most structs will have way more data bytes than padding bytes and most code is not running constructors.

bjorn3 · 2020-03-21T20:04:36Z

But I also don't see the issue with enums. In a constructor, the enum variant is statically known. So you just look at the fields of the initial variant, and zero the gaps between them.

I guess zeroing during the codegen of SetDiscriminator will work. That will unnecessarily duplicate zeroing in some cases though.

That sounds like a wild guess to me -- do you have any foundation for it being 5% rather than, say, 0.5%? I'd expect the impact to be tiny, since most structs will have way more data bytes than padding bytes and most code is not running constructors.

Not much, but given the huge speed win of the very very dumb memory optimization I implemented (bail out on address taken, more than one store possibly preceding a load, ...) I expect the cost to be relatively high. I can try to benchmark it for cg_clif though.

upsuper · 2020-03-22T00:37:40Z

Assuming the backend does naive copying (i.e., it always copies the entire struct and doesn't try to be clever by only copying the fields and ignoring padding), then the only overhead is in the constructor: Struct { f1, f2 } would have to compile to code that also writes zeroes to padding between and after the fields.

Another potential optimization is that if reading the padding is undefined, the compiler might be able to utilize it to vectorize some writing even if such writing would overwrite the padding area, but if pading is defined to be zero then such optimization may be less feasible.

Not sure whether that's something really happening, though.

eddyb · 2020-03-22T00:59:58Z

I don't know what you mean. When I define a struct/union in LLVM, won't it add padding in between the fields?

Only if you use those types as "first class aggregates", as in, by-value.
Oh and Abi::Scalar should be fine since we use it only when there is no padding after the scalar, so only Abi::ScalarPair and Abi::Vector lose padding (outside of non-Rust call ABIs).

When used for field access only, like we do, it's just a silly way to specify a field offset, nothing more.

llvm.memcpy calls don't see a type, they just know how many bytes to copy, and we pass the full size (aka "stride"), not the smaller unpadded size, anyway (mostly because we don't track the latter).

Also, if you look at this example's LLVM IR, this is how we lower non-unions:

#[repr(C)] // To avoid field reorder.
pub struct S(i16, i32, i64);

%S = type { [0 x i16], i16, [1 x i16], i32, [0 x i32], i64, [0 x i64] }

Note how we've made the padding fields explicit, and fill it in even when LLVM would compute the right offsets without it anyway? Well, it's because LLVM has no way to specify a custom alignment.

And without tracking "alignment from LLVM leaves" for every layout, we can't easily know in the general case whether LLVM will compute the correct offset or not, so we always force it with explicit padding.

Anyway, all that "padding" is just to make LLVM compute the right offset numbers, we should seriously consider just using one zero-length array to force an alignment and one byte array for offsets.

@RalfJung I suppose the only way LLVM knows something like S above has padding, especially in the local stack frame, is that some byte ranges are never written to.

If they were, it wouldn't matter how you expressed the offsets, and unless used by value (which you shouldn't do because support is dismal) LLVM struct (tuple) "types" are a red herring.

RalfJung · 2020-03-22T11:59:08Z

Note how we've made the padding fields explicit, and fill it in even when LLVM would compute the right offsets without it anyway? Well, it's because LLVM has no way to specify a custom alignment.

Oh I see, so LLVM itself couldn't even do the "replace copy of struct by copy of fields skipping padding" transformation, because it does not know which fields are padding. Nothing would have to change here for guaranteed-zero padding.

Btw, with nightly features we can already "manually" zero-pad a struct, and layout optimizations can exploit that:

#![feature(rustc_attrs)]

#[rustc_layout_scalar_valid_range_start(0)]
#[rustc_layout_scalar_valid_range_end(0)]
struct ZeroPad(u8);

impl ZeroPad {
    const ZERO: ZeroPad = unsafe { ZeroPad(0) };
}

struct Test(u8, ZeroPad, u16);
fn main() {
    println!("{}", std::mem::size_of::<Option<Test>>());
}

Playground

RustyYato · 2020-03-22T21:27:29Z

@RalfJung We can get even simpler, just use an enum!

#[repr(u8)]
enum ZeroPad {
    Zero = 0
}

RalfJung · 2020-03-22T21:45:20Z

@KrishnaSannasi indeed, that works just as well. Nice catch :)

bjorn3 · 2020-03-23T13:34:02Z

I am currently trying to implement zero padding on SetDiscriminator and I already got a miscompilation for FieldPlacement::Arbitrary. This suggests that it is non-trivial to write code to find the padding.

RalfJung · 2020-03-23T13:45:42Z

It should be easier for structs though, right? As mentioned before, I mostly had structs in mind, where padding is much simpler than for enums.

It may make more sense to say that Deaggregator actually has to respect zero-padding in its lowering -- with per-field initialization, the necessary information is already mostly gone and would have to be carefully reconstructed.

bjorn3 · 2020-03-23T13:48:45Z

The enum variant is as hard as the struct variant if you take the route of zeroing during SetDiscriminant. I just used layout.for_variant().

RalfJung · 2020-03-23T13:51:25Z

Not struct variants. Structs. As in struct Foo { field1: type1, field2: type2 }.

bjorn3 · 2020-03-23T13:54:25Z

Yes, I meant structs. "struct variant" as in the variant of zeroing codegen for structs. I can understand the confusion though. By the way, turns out I was zeroing scalars as they get FieldPlacement::Arbitrary without fields.

RalfJung · 2020-03-23T14:07:35Z

But there is no SetDiscriminant for structs, is there? There is none here.

RalfJung · 2020-03-23T14:08:49Z

Also, for your LayoutDetails troubles, this could be helpful: #69901

bjorn3 · 2020-03-23T14:18:18Z

I thought there was a SetDiscriminant to finalize the construction of every aggregate, even for structs. Something with partial initialization and borrowck. I guess I will try to zero during the function prologue instead.

RalfJung · 2020-03-23T14:19:50Z

I don't think this is a task for codegen. MIR has to make it clear when zeroing is necessary. That is the case when Aggregate initialization is used, but the Deaggregate path destroys that information.

eddyb · 2020-03-23T14:35:14Z

Part of the plan to get rid of the Deaggregate pass, btw, was to indicate full initialization with SetDiscriminant (which could be renamed to FinalizeVariant or something).

You could make Deaggregate keep SetDiscriminant to be able to access it in codegen.

bjorn3 · 2020-03-23T16:41:21Z

The cost of zeroing the padding seems to be less than I expected. Both during SetDiscriminant (enum only) and during the prologue (adt only) it doesn't seem to have a negative impact. In fact it seems to slightly improve performance. I don't know if this just my computer warming up (the benches with padding happened to be sorted later alphabetically) or if it is caused by for example the store to load forwarding of the CPU. More benchmarking with the final implementation method will likely be necessary.

Zeroing code for cg_clif

let variant_layout = place.layout().for_variant(fx, variant_idx); // Remove `for_variant` when zeroing during the prologue
let mut padding_ranges = vec![];
match &variant_layout.fields {
    layout::FieldPlacement::Union(field_count) => {
        padding_ranges.push(
            (0..*field_count).map(|field| variant_layout.field(fx, field).size).max().unwrap_or(Size::ZERO)
            ..
            place.layout().size,
        );
    }
    layout::FieldPlacement::Array { stride, count } => {
        for field in 0..*count as usize {
            padding_ranges.push(
                variant_layout.fields.offset(field) + variant_layout.field(fx, field).size
                ..
                variant_layout.fields.offset(field) + *stride
            );
        }
    }
    layout::FieldPlacement::Arbitrary { .. } => {
        let mut last_field = None;
        for field in variant_layout.fields.index_by_increasing_offset() {
            if let Some(last_field) = last_field {
                padding_ranges.push(
                    variant_layout.fields.offset(last_field) + variant_layout.field(fx, last_field).size
                    ..
                    variant_layout.fields.offset(field)
                );
            }
            last_field = Some(field);
        }
        if let Some(last_field) = last_field {
            padding_ranges.push(
                variant_layout.fields.offset(last_field) + variant_layout.field(fx, last_field).size
                ..
                variant_layout.size
            );
        } else {
            //padding_ranges.push(Size::ZERO..variant_layout.size);
        }
    }
}

if let ty::Adt(adt_def, _) = place.layout().ty.kind {
    //if adt_def.is_struct() {
    //} else {
    //    padding_ranges.clear();
    //}
} else {
    padding_ranges.clear();
}

for padding_range in padding_ranges {
    let addr = place.to_ptr(fx).offset_i64(fx, i64::try_from(padding_range.start.bytes()).unwrap()).get_addr(fx);
    let size = padding_range.end.bytes() - padding_range.start.bytes();
    fx.bcx.emit_small_memset(
        fx.module.target_config(),
        addr,
        0,
        size,
        std::cmp::min(/*greatest_divisible_power_of_two*/(size as i64 & -(size as i64)) as u64, 8u64) as u8, /*FIXME*/
    );
}

bjorn3 · 2020-03-23T17:04:53Z

Full benchmark details:

[BENCH RUN] ebobby/simple-raytracer
Benchmark #1: ./raytracer_cg_clif_no_zero_pad
  Time (mean ± σ):     13.009 s ±  0.853 s    [User: 12.964 s, System: 0.016 s]
  Range (min … max):   12.061 s … 14.689 s    10 runs
 
Benchmark #2: ./raytracer_cg_clif_release
  Time (mean ± σ):     10.194 s ±  0.454 s    [User: 10.162 s, System: 0.012 s]
  Range (min … max):    9.523 s … 11.052 s    10 runs
 
Benchmark #3: ./raytracer_cg_clif_zero_pad
  Time (mean ± σ):     12.320 s ±  0.416 s    [User: 12.302 s, System: 0.006 s]
  Range (min … max):   11.850 s … 13.185 s    10 runs
 
Benchmark #4: ./raytracer_cg_clif_zero_pad_arbitrary
  Time (mean ± σ):     12.618 s ±  0.684 s    [User: 12.583 s, System: 0.011 s]
  Range (min … max):   11.933 s … 13.932 s    10 runs
 
Benchmark #5: ./raytracer_cg_clif_zero_pad_prologue
  Time (mean ± σ):     11.997 s ±  0.076 s    [User: 11.976 s, System: 0.006 s]
  Range (min … max):   11.906 s … 12.155 s    10 runs
 
Benchmark #6: ./raytracer_cg_clif_zero_pad_prologue_enum_too
  Time (mean ± σ):     12.205 s ±  0.152 s    [User: 12.192 s, System: 0.006 s]
  Range (min … max):   12.096 s … 12.574 s    10 runs
 
Benchmark #7: ./raytracer_cg_clif_zero_pad_prologue_enum_too_release
  Time (mean ± σ):      9.440 s ±  0.081 s    [User: 9.425 s, System: 0.007 s]
  Range (min … max):    9.374 s …  9.656 s    10 runs
 
Summary
  './raytracer_cg_clif_zero_pad_prologue_enum_too_release' ran
    1.08 ± 0.05 times faster than './raytracer_cg_clif_release'
    1.27 ± 0.01 times faster than './raytracer_cg_clif_zero_pad_prologue'
    1.29 ± 0.02 times faster than './raytracer_cg_clif_zero_pad_prologue_enum_too'
    1.31 ± 0.05 times faster than './raytracer_cg_clif_zero_pad'
    1.34 ± 0.07 times faster than './raytracer_cg_clif_zero_pad_arbitrary'
    1.38 ± 0.09 times faster than './raytracer_cg_clif_no_zero_pad'

arbitrary => For the SetDiscriminant version: Don't ignore padding for FieldPlacement::Arbitrary.
release => Run the memory optimization.
prologue => Zero in the function prologue instead of during SetDiscrimintant.
enum_too => For the prologue version: Don't ignore enums.

workingjubilee · 2020-09-05T05:42:39Z

Possibly relevant prior art, approaching guarantees of zero-initialized padding from a security perspective, and avoiding various accidents that might occur due to zero-initialized padding:
Solving Uninitialized Stack Memory on Windows

atagunov · 2021-06-28T12:42:58Z

In fact it seems to slightly improve performance

Could it be

writing a partial cache line requires CPU to first fetch that line from RAM + write it back
writing a full cache line requires CPU to do writing only?

cuviper · 2021-09-24T21:19:42Z

Struct { f1, f2 } would have to compile to code that also writes zeroes to padding between and after the fields.

What about MaybeUninit<Struct>, does that also need to be created with zeroed padding?
If I write f1, then f2, then call any of the assume_init* methods, I expect to be in a good state.

edit: @comex also mentioned this in rust-lang/unsafe-code-guidelines#174 (comment).

RalfJung · 2021-09-24T22:05:08Z

If I write f1, then f2, then call any of the assume_init* methods, I expect to be in a good state.

This assumption would not be true for repr(GuaranteedZeroPadding).

Something has to give, we won't get more niches for free.

cuviper · 2021-09-24T22:13:53Z

This assumption would not be true for repr(GuaranteedZeroPadding).

I see, I guess that's what you meant here:

(either for all repr(Rust) types or with per-type opt-in)

So repr(Rust) probably isn't feasible, but a new repr could be the opt-in.

eddyb · 2021-09-30T16:40:57Z

Small nit: #[repr(C)] and #[repr(Rust)] naming styles are exceptions because they name languages, it should be #[repr(guaranteed_zero_padding)] instead.

But also I would expect something like #[repr(padding(zeroed))] or maybe even a more general form of #[repr(padding(fill_with(0)))].

RalfJung · 2021-09-30T18:47:19Z

I totally meant this syntax as a strawman. :)

jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 21, 2020

RalfJung mentioned this issue Mar 21, 2020

Padding Bytes Guarantees? rust-lang/unsafe-code-guidelines#174

Closed

jonas-schievink added the A-layout Area: Memory layout of types label Mar 29, 2020

ogoffart mentioned this issue Aug 24, 2020

Resurrect #70477: "Use the niche optimization if other variant are small enough" #75866

Closed

erikdesjardins mentioned this issue Dec 5, 2020

Padding to ensure a value is well-aligned is not used for niche value optimization #79531

Open

danobi mentioned this issue May 7, 2021

Fix clippy warnings libbpf/libbpf-rs#80

Merged

manuthambi mentioned this issue May 23, 2021

Soundness of AtomicCell::compare_exchange is dubious crossbeam-rs/crossbeam#315

Open

anuraaga mentioned this issue Jan 15, 2023

map delete not finding keys when value is largeish, key hash seems to be incorrect tinygo-org/tinygo#3358

Closed

cuviper mentioned this issue Mar 24, 2023

Missed optimization when putting structs with padding an enum #109555

Closed

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023

cuviper mentioned this issue Nov 10, 2023

Sub-optimal layout of Option<(u16, [u8; 15])> (missed niche optimization) #117429

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idea: guaranteed-zero padding for niches #70230

Idea: guaranteed-zero padding for niches #70230

RalfJung commented Mar 21, 2020 •

edited

Loading

eddyb commented Mar 21, 2020 •

edited

Loading

RalfJung commented Mar 21, 2020

bjorn3 commented Mar 21, 2020 •

edited

Loading

RalfJung commented Mar 21, 2020 •

edited

Loading

bjorn3 commented Mar 21, 2020

bjorn3 commented Mar 21, 2020 •

edited

Loading

RalfJung commented Mar 21, 2020 •

edited

Loading

bjorn3 commented Mar 21, 2020

upsuper commented Mar 22, 2020

eddyb commented Mar 22, 2020 •

edited

Loading

RalfJung commented Mar 22, 2020

RustyYato commented Mar 22, 2020

RalfJung commented Mar 22, 2020

bjorn3 commented Mar 23, 2020

RalfJung commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

RalfJung commented Mar 23, 2020 •

edited

Loading

bjorn3 commented Mar 23, 2020 •

edited

Loading

RalfJung commented Mar 23, 2020 •

edited

Loading

RalfJung commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

RalfJung commented Mar 23, 2020 •

edited

Loading

eddyb commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

workingjubilee commented Sep 5, 2020

atagunov commented Jun 28, 2021

cuviper commented Sep 24, 2021 •

edited

Loading

RalfJung commented Sep 24, 2021

cuviper commented Sep 24, 2021

eddyb commented Sep 30, 2021

RalfJung commented Sep 30, 2021

Idea: guaranteed-zero padding for niches #70230

Idea: guaranteed-zero padding for niches #70230

Comments

RalfJung commented Mar 21, 2020 • edited Loading

eddyb commented Mar 21, 2020 • edited Loading

RalfJung commented Mar 21, 2020

bjorn3 commented Mar 21, 2020 • edited Loading

RalfJung commented Mar 21, 2020 • edited Loading

bjorn3 commented Mar 21, 2020

bjorn3 commented Mar 21, 2020 • edited Loading

RalfJung commented Mar 21, 2020 • edited Loading

bjorn3 commented Mar 21, 2020

upsuper commented Mar 22, 2020

eddyb commented Mar 22, 2020 • edited Loading

RalfJung commented Mar 22, 2020

RustyYato commented Mar 22, 2020

RalfJung commented Mar 22, 2020

bjorn3 commented Mar 23, 2020

RalfJung commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

RalfJung commented Mar 23, 2020 • edited Loading

bjorn3 commented Mar 23, 2020 • edited Loading

RalfJung commented Mar 23, 2020 • edited Loading

RalfJung commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

RalfJung commented Mar 23, 2020 • edited Loading

eddyb commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

bjorn3 commented Mar 23, 2020

workingjubilee commented Sep 5, 2020

atagunov commented Jun 28, 2021

cuviper commented Sep 24, 2021 • edited Loading

RalfJung commented Sep 24, 2021

cuviper commented Sep 24, 2021

eddyb commented Sep 30, 2021

RalfJung commented Sep 30, 2021

RalfJung commented Mar 21, 2020 •

edited

Loading

eddyb commented Mar 21, 2020 •

edited

Loading

bjorn3 commented Mar 21, 2020 •

edited

Loading

RalfJung commented Mar 21, 2020 •

edited

Loading

bjorn3 commented Mar 21, 2020 •

edited

Loading

RalfJung commented Mar 21, 2020 •

edited

Loading

eddyb commented Mar 22, 2020 •

edited

Loading

RalfJung commented Mar 23, 2020 •

edited

Loading

bjorn3 commented Mar 23, 2020 •

edited

Loading

RalfJung commented Mar 23, 2020 •

edited

Loading

RalfJung commented Mar 23, 2020 •

edited

Loading

cuviper commented Sep 24, 2021 •

edited

Loading