
Proposal: Change extern modifier for structs to mean "guaranteed layout", not "for use with C". #6700

Open
SpexGuy opened this issue Oct 16, 2020 · 16 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

Comments

@SpexGuy
Contributor

SpexGuy commented Oct 16, 2020

This is a small distinction, but it changes the answers to questions about how zero-sized extern structs should behave, as well as whether they can reference non-extern structs.

The primary use cases for extern in the absence of the C ABI are type punning and MMIO mappings. Packed structs may not be a good solution for these cases, because both alignment and read/write speed matter.

Better discussion of this question


Original Issue:
Potential design flaw: Pointer to empty extern struct has no bits

My initial line of reasoning was this: Extern structs are ABI types and must match the layout of C. Pointers to extern structs are also ABI types. Therefore pointers to extern structs should always be real pointers, even if the struct is empty.

const Empty = extern struct {};
comptime { if (@sizeOf(*Empty) == 0) @compileError("not ABI compatible"); }

This might also extend to packed structs.

However, it appears that C doesn't actually allow empty structures.
The C99 specification says:

If the struct-declaration-list contains no named members, the behavior is undefined.

But in practice, most compilers allow empty structs: gcc, clang, and icc all give the struct size 0 when compiling as C and size 1 when compiling as C++, while msvc reports a compile error for C and uses size 1 for C++.
So I guess this issue isn't as clear cut as I first thought. Still, I think it merits discussion. Maybe it should be a compile error?

@Snektron
Collaborator

As the behaviour is undefined, I vote to disallow all empty extern structs. As you noted, many compilers still allow them, but since there is no concrete behaviour, I think it should be up to the user to work around those issues. Perhaps an exception should be made for translate-c, though? I'm not sure how often empty structs are used in practice, but if they are common, that might be required.

@andrewrk
Member

Brainstorming here: the "ABI" part of the target triple implies a reference C compiler that is the definition of the ABI. So for example, x86_64-windows-msvc means we use MSVC as the reference C compiler that we are ABI matching, and x86_64-linux-gnu means we use GCC as the reference C compiler that we are ABI matching. That could potentially answer the question "does an extern struct allow having no fields?" by reducing it to "what does the "main" C compiler of that ABI do?"

That being said, if the C99 specification is clear that a struct with 0 fields is UB (and it looks pretty clear to me), then I think we should make an empty extern struct a compile error.

@andrewrk
Member

For packed structs we should allow size 0; however, it would be a compile error to use a size-0 packed struct in a C calling convention function, just as it is a compile error to use u0 or void in a ccc function.

@SpexGuy
Contributor Author

SpexGuy commented Oct 16, 2020

In that case, should we impose that same lessened restriction on extern structs? Allow zero-size extern structs with the current semantics but don't allow them (or types that reference them) for C calling convention functions?

@Snektron
Collaborator

In what kind of situation could those be preferred over a packed or regular struct?

@mrakh
Contributor

mrakh commented Oct 16, 2020

I think that, since this is mainly a GNU C ABI issue, this is something better handled by translate-c, rather than by the language itself. The only sensible use for empty structs in GNU C is as a sort of 'typed' opaque pointer (which is strange, because a simple declaration of the struct, without braces, can do this as well, and is defined in the C standard). So all that's needed is for translate-c to identify pointers to empty structs, and represent them with an opaque Zig struct.
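As a rough illustration of the translate-c route, here is a sketch assuming recent Zig (0.11 or later). An incomplete C declaration such as `struct foo;` is typically translated to an `opaque {}` type, and a pointer to it is an ordinary pointer-sized value, unlike the zero-bit pointer this issue originally started from. The name `struct_foo` is illustrative only.

```zig
const std = @import("std");

// Roughly what translate-c produces for an incomplete C declaration such as
// `struct foo;`: a type whose size and layout are unknown to Zig.
const struct_foo = opaque {};

comptime {
    // A pointer to an opaque type is a real, pointer-sized value,
    // so it crosses a C ABI like any `struct foo *`.
    std.debug.assert(@sizeOf(*struct_foo) == @sizeOf(usize));
    // The null-pointer optimization applies too, matching a nullable C pointer.
    std.debug.assert(@sizeOf(?*struct_foo) == @sizeOf(usize));
}
```

Because the opaque type has no size at all, Zig also rejects any attempt to embed it by value, which is the same "typed handle only" behaviour the GNU C empty-struct idiom is used for.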

@andrewrk
Member

> In that case, should we impose that same lessened restriction on extern structs? Allow zero-size extern structs with the current semantics but don't allow them (or types that reference them) for C calling convention functions?

This prompts what I see to be the main backing question here: is extern a declaration that a type is intended to be used in the C ABI? Or does it only declare a memory layout? If the former, then extern has utility in that it provides a helpful compile error if you try to add a field (or lack thereof) that can't be represented in the C ABI. If the latter, then I think your suggestion above makes sense. This is related to #3133.

The status quo answer to this question is that extern is a declaration that a type is intended to be used in the C ABI. I do think a proposal to change is worth considering, especially in light of #3133 and #3802. Shall we transform this issue into that proposal?

@SpexGuy
Contributor Author

SpexGuy commented Oct 17, 2020

I agree that that's worth considering, and that it backs this question. I'll update. Feel free to modify.

I apparently think that this should be changed, because I was in the middle of typing up this response:


> In what kind of situation could those be preferred over a packed or regular struct?

extern struct isn't just for ABI use; it's also the only way to lay out memory in a way that respects alignment. You may want to lay out some data and then read it back later, interpreted a different way. For example, I've used this struct (based on this GDC talk) in the past for a half-edge structure with good cache behavior:

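const std = @import("std");
const assert = std.debug.assert;

// Placeholder vertex type; the layout of Vertex does not matter here.
const Vertex = extern struct { x: f32, y: f32, z: f32 };

const Edge = extern struct {
    vertex_index: u32,
    opposite_edge_index: u32,
};
const Triangle = extern struct {
    edges: [3]Edge,
    flags: u64, // same size as an Edge, so each Triangle spans four Edge slots
};
comptime {
    assert(@sizeOf(Triangle) == @sizeOf(Edge) * 4);
}

const AdjacencyMesh = struct {
    vertices: []Vertex,
    triangles: []Triangle,

    // Reinterpret the triangle array as a flat array of Edge-sized slots.
    // Slot 3 of every triangle overlaps `flags` and is not a real edge.
    fn edges(self: AdjacencyMesh) []Edge {
        const edge_ptr: [*]Edge = @ptrCast(self.triangles.ptr);
        return edge_ptr[0 .. self.triangles.len * 4];
    }
    fn edgeIndexInTriangle(edge_index: u32) u32 {
        return edge_index % 4;
    }
    fn triangleIndexFromEdgeIndex(edge_index: u32) u32 {
        return edge_index / 4;
    }
};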
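The punning in the half-edge example above can be exercised in isolation. This is a minimal sketch assuming Zig 0.11 or later builtin syntax; `edgeSlots` is a hypothetical free-function stand-in for `AdjacencyMesh.edges`, and `std.mem.zeroes` is only used to get a zero-initialized array.

```zig
const std = @import("std");

const Edge = extern struct {
    vertex_index: u32,
    opposite_edge_index: u32,
};

const Triangle = extern struct {
    edges: [3]Edge,
    flags: u64, // fills the fourth Edge-sized slot
};

comptime {
    std.debug.assert(@sizeOf(Triangle) == @sizeOf(Edge) * 4);
}

/// Hypothetical free-function version of `AdjacencyMesh.edges`:
/// reinterprets the triangles as a flat run of Edge-sized slots.
fn edgeSlots(triangles: []Triangle) []Edge {
    const ptr: [*]Edge = @ptrCast(triangles.ptr);
    return ptr[0 .. triangles.len * 4];
}

test "edge slot 5 is edge 1 of triangle 1" {
    var tris = std.mem.zeroes([2]Triangle);
    tris[1].edges[1].vertex_index = 7;

    const slots = edgeSlots(&tris);
    try std.testing.expect(slots.len == 8);
    // triangle index = slot / 4, edge index within the triangle = slot % 4
    try std.testing.expect(slots[5].vertex_index == 7);

    // Writes through the punned view land in the original triangles.
    slots[4].opposite_edge_index = 3;
    try std.testing.expect(tris[1].edges[0].opposite_edge_index == 3);
}
```

This only works because `extern struct` fixes both field order and ABI alignment; with a plain `struct`, the compiler would be free to reorder `flags` and the slot arithmetic would be meaningless.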

There may be a reason to include an empty struct or a pointer to an empty struct in a data structure like this, potentially as part of a generic type. But using type punning with types that are generic in that way is not exactly good practice, so maybe it's fine as is?

@SpexGuy SpexGuy changed the title Potential design flaw: Pointer to empty extern struct has no bits Proposal: Change extern modifier for structs to mean "guaranteed layout", not "for use with C". Oct 17, 2020
@andrewrk
Member

andrewrk commented Oct 17, 2020

Here are the struct concepts that zig recognizes:

  • ordering
    • ordered: the fields are in a well defined memory layout, corresponding to declaration order and padded according to alignment rules. bitcasting is available.
    • unordered: the memory layout is undefined. bitcasting is Illegal Behavior. The undefined memory layout gives zig some flexibility in terms of safety checks as well as memory layout optimizations.
  • default alignment: the fields have a default alignment unless overridden. Options are:
    • @alignOf(FieldType) also known as "ABI alignment"
    • align(1) (byte alignment)
    • align(0) (can be bit-packed)
  • allow non C ABI compatible types: whether to emit a compile error if the type could not be used in the C calling convention.

In status quo zig + accepted proposals mentioned above, we have 3 kinds of structs:

  • struct
    • unordered
    • default alignment: ABI aligned
    • allow non C ABI compatible types
  • extern struct
    • ordered
    • default alignment: ABI aligned
    • disallow non C ABI compatible types
  • packed struct
    • ordered
    • default alignment: align(0)
    • allow non C ABI compatible types
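To make the "ordered" and "default alignment" axes concrete, here is a small sketch that declares the same fields with each kind. It assumes current Zig semantics (packed structs backed by an integer) and a target where `@alignOf(u32) == 4`, such as x86_64 or aarch64; the type names are illustrative.

```zig
const std = @import("std");
const assert = std.debug.assert;

// The same three fields, declared with each of the three struct kinds.
const Auto = struct { a: u8, b: u32, c: u8 };
const Extern = extern struct { a: u8, b: u32, c: u8 };
const Packed = packed struct { a: u8, b: u32, c: u8 };

comptime {
    // extern struct: declaration order, ABI alignment, C-style padding.
    assert(@offsetOf(Extern, "a") == 0);
    assert(@offsetOf(Extern, "b") == 4); // 3 padding bytes after `a`
    assert(@offsetOf(Extern, "c") == 8);
    assert(@sizeOf(Extern) == 12); // tail padding up to @alignOf(u32)

    // packed struct: declaration order, fields bit-packed with no padding.
    assert(@bitOffsetOf(Packed, "b") == 8);
    assert(@bitOffsetOf(Packed, "c") == 40);
    assert(@bitSizeOf(Packed) == 48);

    // `Auto` (plain struct) has an unspecified layout: the compiler may
    // reorder its fields, so no offsets are asserted for it.
}
```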

As you can see, the 3 options do not do an adequate job of surfacing the struct layout options that are available. I do think struct is satisfactory, but I could see the value in replacing extern struct and packed struct with different syntax that better surfaces the options here. I'm also open to the possibility of removing the feature of "allow non C ABI compatible types" being a property of a struct, and making it a part of validation of C calling convention functions, as noted above.

One important question to answer is: will ordered, ABI-aligned structs always match the C ABI? I think the answer is "yes". But if there were any counterexamples, that would influence the design process here. I'm not aware of any counterexamples.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 17, 2020
@andrewrk andrewrk added this to the 0.8.0 milestone Oct 17, 2020
@tadeokondrak
Contributor

tadeokondrak commented Oct 17, 2020

One use case to consider is exporting a C library that has functions that take a pointer to a Zig struct as a parameter, but only exposes the struct as opaque in the header. The compiler should probably allow that somehow.

@andrewrk
Member

Zig already allows pointers to anything in CCC functions.

@mrakh
Contributor

mrakh commented Oct 17, 2020

> As you can see, the 3 options do not do an adequate job of surfacing the struct layout options that are available. I do think struct is satisfactory, but I could see the value in replacing extern struct and packed struct with different syntax that better surfaces the options here. I'm also open to the possibility of removing the feature of "allow non C ABI compatible types" being a property of a struct, and making it a part of validation of C calling convention functions, as noted above.

I touched on this in #6478, but I think that the best way to give programmers fine-grained control over the memory layout of a struct, is to provide a mechanism to explicitly set the field offsets. Any combination of alignment/ordering/overlapping can then be represented, so it generalizes to working with C union types as well. As a bonus, it allows the programmer to not just conform to the C ABI, but to conform to and define any ABI they can think of.

@kenaryn

kenaryn commented Oct 24, 2020

For academic purposes, I would like to add that the C2x working draft spec (published in February 2020 and available here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2478.pdf) reiterates the same undefined behaviour, although with a slightly more specific constraint, namely:

If the member declaration list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined.
(See page 87 paragraph 10).

Note: the word "member" replaces the former term "struct" (see the second-to-last line of page 2).

@topolarity
Contributor

Would this change apply to union as well?

The C specification says:

The size of a union is sufficient to contain the largest of its data members. Each data member is allocated as if it were the sole member of a struct.

so it seems like agreement in the C ABI for structs would also extend to unions.

If so, extern union would nearly be able to replace packed union, depending on the alignment details.
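A sketch of what an `extern union` already guarantees in current Zig, assuming a target where `@alignOf(u32) == 4`; the type `Value` and its fields are illustrative. Every member starts at offset 0, and the size is the largest member rounded up to the strictest member alignment, matching the usual C ABI rule.

```zig
const std = @import("std");

// Every member of an extern union starts at offset 0, as in C.
const Value = extern union {
    byte: u8,
    word: u32,
    halves: [3]u16,
};

comptime {
    // Alignment is the strictest member alignment (u32 here).
    std.debug.assert(@alignOf(Value) == @alignOf(u32));
    // Size is the largest member (6 bytes) rounded up to that alignment.
    std.debug.assert(@sizeOf(Value) == 8);
}

test "extern union members overlap" {
    var v: Value = undefined;
    v.halves = .{ 0xFFFF, 0xFFFF, 0 };
    // The first two u16s cover the same 4 bytes as `word`, so this holds
    // on both little- and big-endian targets.
    try std.testing.expect(v.word == 0xFFFF_FFFF);
}
```

Note that the alignment here comes from the members, which is the main practical difference from a `packed union`.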

@andrewrk andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
@andrewrk andrewrk removed this from the 0.11.0 milestone Apr 9, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
@ni-vzavalishin

I think it's rather useful to be able to define guaranteed-layout structs with automatically aligned fields, which packed structs don't provide: with the latter, one has to align fields manually. As mentioned in an earlier comment, one might want to read the struct data in a different way. To add one more example to the one given in that comment, consider the following case, which I ran into in practice.

Suppose there is a struct of SIMD vectors of the same size (some vectors of i32, some of f32), which one should also be able to read and write as a flat array of 32-bit integers or floats. Besides the vector fields, the struct may contain nested structs of the same kind, some of which may be empty (the entire top-level struct can clearly still be seen as an array of 32-bit values in such cases). In this particular case packed structs probably would have worked too, but conceptually extern structs seem to express the intention better, as the fields are expected to have their natural alignments by design, not by chance of the other fields having aligned sizes. If extern structs disallowed zero sizes or non-C-ABI-compatible types, one would have no choice but to use packed structs. Those would cause problems if there were alignment padding between the struct fields, which one would need to maintain correctly "by hand". The same concern about maintaining alignment by hand would apply to this proposal.
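A reduced version of that case can be sketched as follows, assuming Zig 0.11 or later. Arrays of four 32-bit lanes stand in for the SIMD vectors, the empty nested struct is left out because its legality is exactly what is under discussion, and all names (`Lanes`, `asWords`) are hypothetical.

```zig
const std = @import("std");

// Hypothetical stand-in: [4]i32 / [4]f32 arrays play the role of SIMD vectors.
const Lanes = extern struct {
    ids: [4]i32,
    weights: [4]f32,
    inner: extern struct {
        offsets: [4]i32,
    },
};

comptime {
    // Every field is a whole number of 32-bit lanes, so there is no padding.
    std.debug.assert(@sizeOf(Lanes) == 12 * @sizeOf(u32));
}

/// View the whole struct as a flat array of 32-bit words.
fn asWords(p: *Lanes) *[@sizeOf(Lanes) / @sizeOf(u32)]u32 {
    return @ptrCast(p);
}

test "flat word view aliases the fields" {
    var s = std.mem.zeroes(Lanes);
    s.weights[0] = 1.0;

    const words = asWords(&s);
    try std.testing.expect(words.len == 12);
    // weights[0] is word 4, right after the four `ids` lanes.
    try std.testing.expect(words[4] == @as(u32, @bitCast(@as(f32, 1.0))));

    // Writing through the flat view updates the nested struct.
    words[8] = @bitCast(@as(i32, -1));
    try std.testing.expect(s.inner.offsets[0] == -1);
}
```

With real `@Vector` fields the struct alignment would grow to the vector alignment, but the extern layout would still place each field at its natural offset without any hand-written padding.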

In this regard, allowing any types and sizes within extern structs seems not just a good idea but an important and indispensable part of the language's functionality. The related idea of rejecting non-C-ABI-compatible extern structs in C-calling-convention functions may raise some questions, as such functions may not be used exclusively for interfacing with C, but also with other languages that support a similar ABI. Or one may want to interface with just one particular C compiler, which supports zero-size structs and maybe some non-standard types. I'm not sure the additional safety gained by generating errors for such structs is worth the lost functionality.

@ni-vzavalishin

How about simply splitting extern struct into two distinct versions:

  • extern struct has guaranteed ordering, respects default alignment and allows non C-compatible types
  • extern "c" struct is the same but only allows C-compatible types

8 participants