Proposal: Integer-backed packed struct #5049

SpexGuy · 2020-04-15T16:30:59Z

So far, Zig doesn't have a good solution for flags/bitfields. Packed structs are the current recommended solution, and they provide a lot of improvements over the standard C approach of defining integer constants. But they are still not quite sufficient, even with #3133. For me, the requirements of a good flags/bitfield type are:

Deterministic packing
Can be cast to and from an integer type, to support bulk operations (&, |, etc)
Well-defined conversion between little-endian and big-endian
Guarantees the size of its loads and stores in a predictable way, to be compatible with MMIO registers
Behaves as an integer when used as a parameter to an extern function, so as to be compatible with a C ABI that uses flags

Requirements (1) and (2) are already satisfied by packed structs. Requirements (3) and (4) are not satisfied by packed structs yet, but are kind of a package deal. If you define load/store size you get deterministic endianness behavior. There have been ways suggested to do this, such as putting a u0 aligned to the required word size at the beginning of a packed struct. (I can't find that suggestion anymore, if anyone knows where it is please let me know and I'll link it here). But assuming packed structs will always behave like structs from an ABI standpoint, they may not be able to satisfy requirement (5) for all calling conventions. This can lead to some pretty ridiculous workarounds, that are not always feasible.

To solve this, I propose a variant of packed structs which are backed by a specified integer type, similar to enums backed by a specified integer type. Just like enums are integers at the ABI level, integer-backed packed structs are also integers at the ABI level. They are passed in registers across function call boundaries whenever their backing integer type would be, and they are allowed on extern boundaries only when their backing integer type would be. They can be used in atomic operations just like their integer type would be (with the exception of RMW operations like add, sub, etc, similar to enums). Their default alignment matches the alignment of their backing integer type.

My proposed syntax is this:

const ExampleFlags = packed struct(u32) {
	int_flag: u1 = 0,
	bool_flag: bool = false,
	small_value: u3 = 3,
	aligned_value: u8 align(1),

	const Self = @This();
	pub fn union(self: Self, other: Self) Self {
		return @bitCast(Self, @bitCast(u32, self) | @bitCast(u32, other));
	}
};

Integer operators (+, -, &, etc) are not defined for integer packed structs, but @bitCast is allowed to convert to the integer type if you really want to do bulk operations. We could also make a more specialized cast, analagous to @enumToInt. Like normal packed structs or enums, bindable functions can be specified in the struct namespace to facilitate bulk operations at the API level.

Fields are specified from the LSB of the backing integer to the MSB. This definition will always mean that endianness conversion for the flags field is the same as endianness conversion for the backing integer type. If the number of specified bits is fewer than the number available in the backing type, or if there are padding bits due to aligned fields, the value of the extra bits is undefined. If you require these values to be zeroed (e.g. for interop with a C api, MMIO, or sort ordering), you must explicitly add padding fields that are set to zero. If more bits are specified than are available in the backing type, that's a compile error.

Integer-backed packed structs are allowed to contain anything that a packed struct can contain and follows all of the layout rules of packed structs, as long as the contained elements can fit inside the specified number of bits. The backing type must be an actual integer. It cannot be another integer-backed struct or an extensible enum. When necessary, conversions between those types can instead be performed with @bitCast.

When an integer packed struct is nested into another integer packed struct, it behaves like its integer type and is bit-packed, unless an alignment is explicitly specified.

Writing the entire value of a packed struct is guaranteed to be implemented with the same semantics as writing to a value of the integer type. Writing to a field in an integer-backed packed struct is semantically a read-modify-write operation on the entire integer. Whenever the write size is semantically observable (e.g. through atomic or volatile operations), the size of the read and write instructions for an integer packed struct is guaranteed to be the same as that of the backing integer type. As always, if the read/write is not observable, the optimizer is free to do whatever it wants (load/store the smallest size it can, split into multiple stores, combine loads/stores together, ignore all of them and use a register, etc).

Consequently, a pointer to a field of a backing type has the type *align(<parent struct alignment>:<bit offset in integer>:<byte size of integer>) <field type>. So
&(ExampleFlags{}).small_value is of type *align(4:2:4) u3, and
&(ExampleFlags{}).aligned_value is of type *align(4:8:4) u8.

This proposal would provide an explicit solution for #1834, #4056, and #4185, without limiting the capabilities of the more general packed struct. With some extra work, it could also be applied to solve #1761 and #3472.

The text was updated successfully, but these errors were encountered:

Tetralux · 2020-04-17T03:52:18Z

The only thing with this is that if your packed struct contains several fields that are structs, or you just have a lot of fields, or indeed, you care more that's it's packed than the final size of it, you have to manually calculate the bits - which would be a bit of PITA.

SpexGuy · 2020-04-17T04:48:32Z

if your packed struct contains several fields that are structs, or you just have a lot of fields, or indeed, you care more that's it's packed than the final size of it

This proposal doesn't remove the current packed struct from the language, so those use cases should be handled by it. This is specifically for when you need guaranteed integer behavior, in order to be compatible with a C api, MMIO, or a particular byte swapping strategy.

Byte swapping a normal packed struct is a bridge we'll have to figure out how to cross eventually, since networking is one of its intended use cases. But the rules for that could be allowed to be quite complicated, since a standard packed struct can be much larger than a single integer. (note though, complicated does not necessarily imply slow in this case.)

ghost · 2020-04-17T13:31:50Z

@SpexGuy , I see that you referred to #4185. What are your thoughts on it?

I assume there is something you believe this proposal provides that #4185 doesn't?

SpexGuy · 2020-04-17T21:16:59Z

I think the enumflagset proposal also handles a similar set of use cases, so I think doing either one or the other makes sense. I first considered something like enumflagset, but there are a couple reasons why I prefer this solution:

Packed structs can contain multi-bit fields, e.g. a packed enum(u2). This is mostly useful for the MMIO use case but it is also sometimes a thing that C APIs do.
Giving a bit more control over an existing construct seems better than adding a new one. This proposal wouldn't need a new union member in @typeInfo(..), and code that uses reflection to inspect fields will "just work" in most cases. You want to make a UI that reflects over your fields and generates editable fields for all your values? Just make it work for packed structs and you're done. Flags aren't a special case.
I think the struct-like form is much cleaner than the bitmasking form in many cases:

// struct
foo: Flags = .{};
foo: Flags = .{ .x=true };
foo: Flags = .{ .x=true, .y=true, .z=bool_var };
foo.x = true;
foo.x = false;
foo.x = bool_var;
if (foo.x) { ... }
if (foo.x and foo.y) { ... }

// masking
foo: Flags = 0;
foo: Flags = .X;
foo: Flags = .X | .Y | (if (bool_var) .Z else 0);
foo |= .X;
foo &= ~.X;
foo = (foo & ~.X) | (if (bool_var) .X else 0);
if (foo & .X != 0) { ... }
if (foo & (.X | .Y) == (.X | .Y)) { ... }

I think it might also make sense to make a FlagsMixin class in the standard library that supplies functions for bulk operations (|, &, ~), because these are a bit cumbersome with integer packed structs and it might be better to have one standard set of function names that everyone uses instead of having every library that has a flags struct pick their own names.

markfirmware · 2020-05-07T01:15:57Z

In this project I'm only using volatile only on aligned word pointers (to u32 or to packed structs of 32 bits.)

https://github.com/markfirmware/zig-vector-table/blob/c69f04caf843d9b7977980a5230f3ced415003dc/main.zig#L38-L56

@vegecode

ghost · 2020-11-18T11:28:26Z

I agree with this, although I would make one change to the syntax:

const ExampleFlags = packed(u32) struct { // ...

because struct(u32) on its own does not make sense.

mikdusan · 2021-01-05T02:38:45Z

There have been ways suggested to do this, such as putting a u0 aligned to the required word size at the beginning of a packed struct. (I can't find that suggestion anymore, if anyone knows where it is please let me know and I'll link it here).

#4016 ?

N00byEdge · 2021-11-07T14:53:23Z

I would like it if each field could explicitly specify their bit offset (since that's 100% what you're trying to accomplish here), but I'm not sure how that syntax would look, but I'm certainly sure that it would look better than that align() syntax.

ifreund · 2022-01-14T07:58:26Z

If the number of specified bits is fewer than the number available in the backing type, or if there are padding bits due to aligned fields, the value of the extra bits is undefined.

Why don't we make this a compile error? What's the use-case?

SpexGuy · 2022-01-14T15:13:34Z

Good point, I agree.

gpanders · 2022-01-18T04:11:01Z

If the number of specified bits is fewer than the number available in the backing type, or if there are padding bits due to aligned fields, the value of the extra bits is undefined.

Why don't we make this a compile error? What's the use-case?

Does this mean the following would be a compile error?

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
};

I am only specifying 3 bits, but using a backing u32. This is a pattern I've used when interfacing with C code or binary file formats that use a larger integer type than is strictly necessary, either for alignment reasons or in order to reserve bits for future use.

Today, I just add an extra _: uX to the end, where X is whatever value is necessary to make the size of the struct match what is expected. This has to be calculated manually and it would be nice to be able to omit this, but it sounds like you are suggesting this should be a compile error.

SpexGuy · 2022-01-18T04:31:57Z

Yes, this should absolutely be a compile error. If you are interfacing with a C API where these bits are reserved, you need to set them to the reserved value (probably 0). Same if you are writing a binary file. Padding has undefined values, and nonstandard values in padding may produce an undefined value. So there must not be any undefined padding bits in a packed struct where the underlying representation is important.

gpanders · 2022-01-18T04:45:10Z

Would it be possible to infer the number of necessary bits to pad the struct to the correct size? Again using my previous example:

Proposed (status quo):

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
    _: u29 = 0,
};

I am already specifying u32 as the backing integer, so having to again specify u29 is redundant. If I add a field or update the backing integer size, I also have to update the number of padding bits. Maybe this is an example of the "strategic friction” that Zig aims for, but to me it feels repetitive.

If the compiler could infer the necessary number of padding bits then I could do something like this instead:

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
    _ = 0,
};

This way I can still initialize the unused/padding bits to avoid undefined values, but I also do not need to manually calculate the number needed.

praschke · 2022-06-17T10:42:39Z

I am already specifying u32 as the backing integer, so having to again specify u29 is redundant. If I add a field or update the backing integer size, I also have to update the number of padding bits. Maybe this is an example of the "strategic friction” that Zig aims for, but to me it feels repetitive.

this is only useful in the case of one padding region. as soon as you have more than one region you have to specify the size of all but one padding region. it also might complicate the syntax, though i don't know the details there. it strikes me as another moving part for a very minor convenience.

ominitay · 2022-06-17T14:54:34Z

@gpanders I know I'm bikeshedding, but here would be my suggestion for an approach:

Status Quo:

const ProtFlags = packed struct(u32) {
    read: bool,
    write: bool,
    execute: bool,
    _: u29 = 0,
};

Alternative:

const ProtFlags = packed struct(_) {
    read: bool,
    write: bool,
    execute: bool,
    _: u29 = 0,
};

packed struct (_) would instruct the compiler to infer the bit size of the backing integer from the packed struct's contents. There is never a case where this is ambiguous, so this shouldn't cause any problems. It would allow explicit backing integers to be used where appropriate (i.e. binary data formats, mmio), while allowing easy changes to packed structs in size where this restriction isn't necessary.

gpanders · 2022-06-17T15:06:28Z

The only problem with that is that the size of the struct is no longer obvious. In this example it’s easy enough to see, but in a more complex structure this would involve needing to add up the size of all of the individual fields.

as soon as you have more than one region you have to specify the size of all but one padding region

So? Just have the final _ be automatically inferred. Padding bits in the middle of the struct are significant as this corresponds to the underlying structure of the bit pattern or whatever, so it’s fine that their size has to be explicitly stated. The trailing padding bits are just there to ensure the size of the struct matches what the user specified.

ominitay · 2022-06-17T15:37:50Z

The only problem with that is that the size of the struct is no longer obvious. In this example it’s easy enough to see, but in a more complex structure this would involve needing to add up the size of all of the individual fields.

The size of an extern struct, an extern / packed union, isn't obvious either, along with plenty other types. Should we change them as well? I would argue against it, as it simply isn't necessary to syntactically indicate the size of a packed struct in plenty of cases. Where it is required, it would still be possible with my proposal.

So? Just have the final _ be automatically inferred. Padding bits in the middle of the struct are significant as this corresponds to the underlying structure of the bit pattern or whatever, so it’s fine that their size has to be explicitly stated. The trailing padding bits are just there to ensure the size of the struct matches what the user specified.

I think this defeats one of the main benefits of the explicit backing integer, which is that a change to the backing integer or fields must be made while consciously considering the size and layout. Having a field of the packed struct be automatically resized without obvious indication can hide plenty of mistakes/bugs, and obfuscates the layout of the packed struct, where you have to add up the size of all of the individual fields to determine the size of the padding, which is exactly what you argued against earlier.

Personally, I don't really care about whether we allow packed struct(_) or not, but I'm absolutely against having hidden resizing of struct fields.

Edit: packed struct would probably make more sense than packed struct(_), matching precedent with enum and enum(u8).

gpanders · 2022-06-17T16:03:28Z

The size of an extern struct, an extern / packed union, isn't obvious either, along with plenty other types.

Right, but the difference is in those cases you are not explicitly specifying an integer width like you are in packed struct(u8). If a packed struct (or union) is backed by an integer then knowing the size of that integer is valuable information. If it's not backed by an integer then the non-obviousness is less of an issue (imo).

I'm absolutely against having hidden resizing of struct fields.

Sure and I agree with this, but I don't think _ is a struct field: it's not accessible and it has no meaning other than enforcing the size of the struct itself.

The only other point I want to make (after which I'll leave it be) is that this is not just a matter of convenience (though it is convenient), it also resolves a source of potential ambiguity:

const Foo = packed struct(u8) {
  foo: u3,
  bar: u2,
  _: u4,
};

This fails to compile because the field widths add up to u9 but the integer specification states u8. Which is correct? You would probably say u8 is correct and the size of the padding field is wrong (and I would agree), but you can't say that with certainty precisely because there are two independent ways to specify the size of the struct and neither one is "obviously" the right one. Making the size of the final padding field inferred resolves this by making it clear that the integer specification in struct(u8) is the source of truth.

Your suggestion to use packed struct(_) similarly resolves this ambiguity except that it makes the "source of truth" the size of the padding field itself.

ominitay · 2022-06-17T16:47:23Z

This fails to compile because the field widths add up to u9 but the integer specification states u8.

This is a good thing, and helps prevent typos or other layout mistakes from compiling, which is of significant importance to osdev.

Your suggestion to use packed struct(_) similarly resolves this ambiguity except that it makes the "source of truth" the size of the padding field itself.

As it always has been.

Our conversation really has settled me on the side that neither of these proposals are of value, and that packed struct(u32) is enough, no implicit field sizing, no implicit backing integer. We should prevent ambiguity, and protect programmers from simple mistakes like typos. Both of these proposals harm this.

ominitay · 2022-08-25T18:20:11Z

Nitpick: In discussions outside of the issue tracker, the consensus seems to be that packed(type) struct is more sensible syntax than packed struct(type). This was already mentioned here, but it doesn't seem to have been picked up on.

andrewrk · 2022-09-14T01:13:39Z

This is implemented in the self-hosted compiler, which is now the default compiler.

tuket · 2024-10-01T20:37:00Z

I still miss being able to perform logic operations:

const flags0 = Flags{ .b = true };
const flags1 = Flags{ .a = true, .c = true };
const flags01 = flags0 | flags1;

andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Apr 15, 2020

andrewrk added this to the 0.7.0 milestone Apr 15, 2020

daurnimator mentioned this issue Apr 16, 2020

Explicit struct layout #5073

Closed

ghost mentioned this issue Apr 22, 2020

Feature proposal: Enumflagset (zig bit flags type) #4185

Closed

SpexGuy mentioned this issue May 1, 2020

Problem with enums and translate-c #5242

Closed

cartr mentioned this issue Jun 12, 2020

Proposal: allow load/store widths for a pointer to be specified, like alignment #5591

Open

andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020

frmdstryr mentioned this issue Nov 13, 2020

remove volatile pointer attribute and keyword. replace with @volatileLoad and @volatileStore #4284

Closed

SpexGuy mentioned this issue Nov 25, 2020

wrong order of bits in bit fields #307

Closed

andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021

SpexGuy mentioned this issue May 20, 2021

translate-c: Add bitfield support #8840

Closed

marler8997 mentioned this issue Jun 1, 2021

Should flags be represented as enums? marlersoft/zigwin32gen#4

Open

SpexGuy mentioned this issue Oct 7, 2021

Inconsistent @bitCast between packed struct and integer on big endian #9914

Closed

ikskuh mentioned this issue Nov 7, 2021

make packed struct always use a single backing integer, inferring it if not explicitly provided #10113

Closed

andrewrk modified the milestones: 0.9.0, 0.10.0 Nov 23, 2021

andrewrk added the accepted This proposal is planned. label Dec 23, 2021

This was referenced Jan 14, 2022

make packed structs versatile #3133

Closed

allow align(0) on struct fields #3802

Closed

BratishkaErik mentioned this issue Jan 19, 2022

3133 DorianXGH/Lukarnel#2

Closed

igor84 mentioned this issue Mar 24, 2022

stage1: Fix packed structs (#2627, #10104) #11279

Merged

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022

ifreund mentioned this issue May 25, 2022

Stage2 misc fixes #11716

Merged

topolarity mentioned this issue Jul 14, 2022

SIMD vector type syntax: [|N|]T #6771

Open

ifreund mentioned this issue Aug 8, 2022

stage2: Implement explicit backing integers for packed structs #12379

Merged

Vexu mentioned this issue Aug 21, 2022

stage2: @sizeOf packed struct is not working as in stage1 #12565

Closed

andrewrk modified the milestones: 0.11.0, 0.10.0 Sep 14, 2022

andrewrk closed this as completed Sep 14, 2022

andrewrk mentioned this issue Sep 14, 2022

volatile not respected when loading a packed struct #1761

Closed

This was referenced Sep 14, 2022

packed struct produces byte stores instead of word stores #4056

Closed

Memory Mapped IO for 32-bit ARM #1834

Closed

Vexu mentioned this issue Mar 12, 2023

packed struct hidden padding issue #14887

Closed

kj4tmp mentioned this issue Oct 19, 2024

langref: improve packed struct memory layout description #21741

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Integer-backed packed struct #5049

Proposal: Integer-backed packed struct #5049

SpexGuy commented Apr 15, 2020

Tetralux commented Apr 17, 2020

SpexGuy commented Apr 17, 2020

ghost commented Apr 17, 2020

SpexGuy commented Apr 17, 2020

markfirmware commented May 7, 2020

ghost commented Nov 18, 2020

mikdusan commented Jan 5, 2021

N00byEdge commented Nov 7, 2021 •

edited

Loading

ifreund commented Jan 14, 2022

SpexGuy commented Jan 14, 2022

gpanders commented Jan 18, 2022 •

edited

Loading

SpexGuy commented Jan 18, 2022

gpanders commented Jan 18, 2022 •

edited

Loading

praschke commented Jun 17, 2022

ominitay commented Jun 17, 2022

gpanders commented Jun 17, 2022

ominitay commented Jun 17, 2022 •

edited

Loading

gpanders commented Jun 17, 2022

ominitay commented Jun 17, 2022 •

edited

Loading

ominitay commented Aug 25, 2022

andrewrk commented Sep 14, 2022

tuket commented Oct 1, 2024

Proposal: Integer-backed packed struct #5049

Proposal: Integer-backed packed struct #5049

Comments

SpexGuy commented Apr 15, 2020

Tetralux commented Apr 17, 2020

SpexGuy commented Apr 17, 2020

ghost commented Apr 17, 2020

SpexGuy commented Apr 17, 2020

markfirmware commented May 7, 2020

ghost commented Nov 18, 2020

mikdusan commented Jan 5, 2021

N00byEdge commented Nov 7, 2021 • edited Loading

ifreund commented Jan 14, 2022

SpexGuy commented Jan 14, 2022

gpanders commented Jan 18, 2022 • edited Loading

SpexGuy commented Jan 18, 2022

gpanders commented Jan 18, 2022 • edited Loading

praschke commented Jun 17, 2022

ominitay commented Jun 17, 2022

Status Quo:

Alternative:

gpanders commented Jun 17, 2022

ominitay commented Jun 17, 2022 • edited Loading

gpanders commented Jun 17, 2022

ominitay commented Jun 17, 2022 • edited Loading

ominitay commented Aug 25, 2022

andrewrk commented Sep 14, 2022

tuket commented Oct 1, 2024

N00byEdge commented Nov 7, 2021 •

edited

Loading

gpanders commented Jan 18, 2022 •

edited

Loading

gpanders commented Jan 18, 2022 •

edited

Loading

ominitay commented Jun 17, 2022 •

edited

Loading

ominitay commented Jun 17, 2022 •

edited

Loading