Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is the safety invariant, if any, for unions? #352

Open
alercah opened this issue Jul 17, 2022 · 11 comments
Open

What is the safety invariant, if any, for unions? #352

alercah opened this issue Jul 17, 2022 · 11 comments

Comments

@alercah
Copy link

alercah commented Jul 17, 2022

Per #73, discussion is converging on unions having no validity invariant, i.e., any bit-pattern is valid for a union type, including completely uninitialized memory. In safe Rust, however, not all valid bit-patterns can necessarily be created, because unions are checked for initialization:

// crate definer:
#[repr(C)]
pub union U { pub i: i32 };
let u: U;
unsafe { &u.i } // Error: u is not initialized

Because union values cannot be consumed in safe Rust, however, we find ourselves needing to decide which, if either, of the following two functions are unsound:

// crate producer:
pub new_u() -> definer::U { unsafe { std::mem::uninitialized() } }
// crate consumer:
pub get_i(u: definer::U) -> i32 { unsafe { u.i } }

At least one must be unsound, because another crate outside the trust boundary of either of them can call producer::get_i(consumer::get_i()) which is clearly UB.

To summarize the Zulip background, @RalfJung expressed the opinion that, by default, unions have an unspecified safety invariant and therefore, in the absence of clear documentation from the definer crate on a safety invariant, both the producer and consumer crate are unsound. The producer should not create a value which cannot be created with safe Rust, and the `consumer crate cannot assume that the value has any properties.

@Lokathor brought up that the safe transmute project is also interested in safe union field access in situations where all fields can be safely transmuted to one another. That is to say, unsafe would not be required to use a type like union { i: i32, f: f32 }. Since this is trivially true of one-union fields, this would mean that get_i would be not only sound but actually safe.

This requires a safety invariant, as clearly the uninitialized union would cause safe field access to be unsound. In reply, @RalfJung suggested that for unions he would simply have suggested the trivial invariant, i.e., all values are safe, including uninitialized ones, but that this safe transmutation would suggest an additional invariant. Note that if all unions have a trivial safety invariant, then the producer crate above would be sound.

So what actually is the safety invariant? Some options:

  1. Unspecified safety invariant—both producer and definer are unsound.
  2. An "initializedness" safety invariant—producer is unsound, but definer is sound (and possibly later safe with compiler support), because safe code cannot produce the uninitialized value.
  3. A more complex safety invariant based on safety of mutual field transmutation, to allow general type punning using unions.
    1. This could be automatic for all types with mutual transmutation; it is viable from a logical and technical point of view but may be hard to apply in practice in unsafe code, as it would require the coder to carefully think about whether the union has safe transmutation. It would also create a risk that changes to the union definition would silently create new UB (e.g. by adding a new field to a struct member of the union, which is normally not a breaking change).
    2. There was at one point a suggestion that safety could be defined as "this value can be arrived at through an arbitrary sequence of safe operations." This is also logically possible, but probably even worse. And probably also useless in practice.
    3. @CAD97 suggested a new attribute, e.g. #[safe_transmute_union], which could allow an opt-in mutual transmutation safety invariant. Thus either 1 or 2 would apply by default, but the attribute would change the invariant.

I am extremely partial to this last option because it makes it clear when a union does or does not support this type of invariant. And I do not like the idea of unspecified safety invariants, so I would go with option 2 for the default. (edit: see below)

I'm opening a separate issue about the offset of the field (which is why #[repr(C)] is required in the example).

@CAD97
Copy link

CAD97 commented Jul 17, 2022

Note that if all unions have a trivial safety invariant, then the producer crate above would be sound.

This isn't quite correct. producer is never sound without definer specifying that what it is doing is allowed, because all types have a baseline safety requirement of only doing safe operations, definer is allowed to define arbitrary safety invariants so long as unsafe code is required to break them. It would be perfectly valid for definer to strengthen the guarantee of U and provide the API that consumer illustrates here, making producer clearly unsound. This does not restrict the behavior of external safe code, and so the soundness of this new unsafe within the privacy barrier should only rely on other code within the privacy barrier.

Unless definer provides any guarantees, it is not sound for any downstream crate to expose a value which could not be created safely. The default safety invariant for any type without documentation stating otherwise is that only safe code can be used (this is case (1)).

This isn't an unspecified safety invariant. It is the baseline safety invariant, of "only use safe code." This is a vaguely operational rule rather than an axiomatic representational one, but it is the baseline for every Rust type.

[Safety invariant (3.i)] would also create a risk that changes to the union definition would silently create new UB (e.g. by adding a new field to a struct member of the union, which is normally not a breaking change).

It wouldn't create new UB; any existing codepath would remain UB-free. What it does do is make code which was potentially sound previously now unsound (able to produce UB from safe inputs).


I'm obviously in favor of (1) + #[safe_transmute_union]. Specifically, that means

  • Validity of union types is trivial (any bytes).
  • Safety requirement of union before documentation is baseline (only use safe code).
  • A #[safe_transmute_union] attribute (or some other opt-in) which just makes it safe to read from union fields.
    • This adds a safety invariant of always being safe at all field types.

Making it safe to construct an uninitialized means that #[safe_transmute_union] cannot be sound (because an uninitialized union can be constructed safely) unless it also as a side effect of removing the default construction option. Assuming we get field-level defaults for structs, I would expect to be able to do U {..} and get whatever the default value of U is; this could be initialized if U were defined as union U { i: i32 = 0 } or uninitialized if it were defined as union U { i: i32, _: () = () }.


But more than anything else, I think a pub union with pub fields without any documentation is a disastrous API faux pax one way or another. (Implied documentation from being a translation of a C type counts as documentation for the purpose of this lint.)

@alercah
Copy link
Author

alercah commented Jul 18, 2022

This isn't quite correct. producer is never sound without definer specifying that what it is doing is allowed. Even if the default baseline safety invariant implied by the language is trivial, definer is allowed to define arbitrary safety invariants so long as unsafe code is required to break them. It would be perfectly valid for definer to strengthen the guarantee of U such that only

Your sentence is cut off here.

@CAD97
Copy link

CAD97 commented Jul 18, 2022

Whoops, that's what I get for writing these comments out-of-order. Finished my thought there.

@alercah
Copy link
Author

alercah commented Jul 18, 2022

To clarify, by "trivial safety invariant", I mean the invariant of True, i.e., all values are safe.

@CAD97
Copy link

CAD97 commented Jul 18, 2022

This seems reasonable; I changed my comment to use "baseline safety requirement" to refer to the nontrivial requirement put onto all Rust code to not do unsound unsafe things.

@alercah
Copy link
Author

alercah commented Jul 18, 2022

Thanks. With this clarification I definitely wrote my comment up wrong. I will edit it to capture the distinction I was trying to make.

@alercah
Copy link
Author

alercah commented Jul 18, 2022

Now to actually respond...

It would be perfectly valid for definer to strengthen the guarantee of U and provide the API that consumer illustrates here, making producer clearly unsound.

This isn't an unspecified safety invariant. It is the baseline safety invariant, of "only use safe code." This is a vaguely operational rule rather than an axiomatic representational one, but it is the baseline for every Rust type.

I had to think about why this concept didn't quite sit right to me, and I've figured it out. This leads to me agreeing with CAD97 (and by extension RalfJung), but I'll take the time to explain why mostly for the benefit of others who thought similarly, and for the benefit of documentation writers trying to prevent it. I think it's largely because of the following intuitive heuristic:

  1. For primitives, the safe values are exactly those obtainable in safe Rust.
  2. For ADTs, the safe values are as specified by the documentation, but if not all pub fields are recursively safe, then a value is definitely unsafe, and the documentation cannot relax this requirement.

Trying to fit unions into this framework goes horribly. It's easy for people, starting from this intuition, end up on a logical journey. (For illustrative purposes only):

  1. Unions are ADTs.
  2. Wait, we can't require all fields to be safe. So at least one field has to be safe.
  3. `(3, 3). Oops. So that doesn't work either. But surely if there is only one field, it must be safe.
  4. Okay, that logic's bad too? So I guess unions are more like primitives then. So it must have a safely obtainable value.
    So I think

[Safety invariant (3.i)] would also create a risk that changes to the union definition would silently create new UB (e.g. by adding a new field to a struct member of the union, which is normally not a breaking change).

It wouldn't create new UB; any existing codepath would remain UB-free. What it does do is make code which was potentially sound previously now unsound (able to produce UB from safe inputs).

That's almost worse 😂. More chance of the problem going undetected for longer.

I'm obviously in favor of (1) + #[safe_transmute_union]. Specifically, that means

  • Validity of union types is trivial (any bytes).

  • Safety requirement of union before documentation is baseline (only use safe code).

  • A #[safe_transmute_union] attribute (or some other opt-in) which just makes it safe to read from union fields.

    • This adds a safety invariant of always being safe at all field types.

Making it safe to construct an uninitialized means that #[safe_transmute_union] cannot be sound (because an uninitialized union can be constructed safely) unless it also as a side effect of removing the default construction option.

There is no default construction option for a union at present. That was a spitball idea that we discussed briefly, but Ralf is opposed and I am not proposing it. Someone else can spend their time on that if they really want.

But more than anything else, I think a pub union with pub fields without any documentation is a disastrous API faux pax one way or another. (Implied documentation from being a translation of a C type counts as documentation for the purpose of this lint.)

A false peace? Who is the war between? 😛

But yes, it really is. Such a type is almost completely useless. Which is probably another thing that affects heuristic reasoning: if I have access to the field, surely I'm meant to do something with it?

@Lokathor
Copy link
Contributor

Lokathor commented Jul 18, 2022

[digs through the old code bin]

I dunno, I've got this thing here that's been sitting in a lib for ages and the API seems pretty clear to me:

#[allow(non_camel_case_types)]
#[repr(C, align(16))]
#[rustfmt::skip]
union ConstUnionHack128bit {
  f32a4: [f32; 4],
  f64a2: [f64; 2],
  i8a16: [i8; 16],
  i16a8: [i16; 8],
  i32a4: [i32; 4],
  i64a2: [i64; 2],
  u8a16: [u8; 16],
  u16a8: [u16; 8],
  u32a4: [u32; 4],
  u64a2: [u64; 2],
  f32x4: f32x4,
  f64x2: f64x2,
  i8x16: i8x16,
  i16x8: i16x8,
  i32x4: i32x4,
  i64x2: i64x2,
  u8x16: u8x16,
  u16x8: u16x8,
  u32x4: u32x4,
  u64x2: u64x2,
  u128:  u128,
}

The union and fields happen to not be pub, but they could be and it'd be just as useful.

@Ixrec
Copy link
Contributor

Ixrec commented Jul 18, 2022

True, but this also seems like a perfect example of a union that I wouldn't expect to see in an API boundary, and probably "shouldn't exist" anyway in a hypothetical future with safe transmutation.

(assuming it is as self-evident as we think it is, and we're both imagining similar usage for it)

@RalfJung
Copy link
Member

RalfJung commented Jul 18, 2022

This is a somewhat odd discussion because generally speaking, safety invariants are defined by the author of the respective type. That's, like, their entire purpose. On the UCG / lang-team side, we should not be in the business of defining other people's safety invariants.

That said, there is somewhat of a "default safety invariant" that justifies all the operations which are provided "by default" for such a type. For a struct, that is simply the safety invariant of all its fields. Crate authors can strengthen and/or weaken this as they see fit. However, as you noted, you have to be careful that the public API surface is still justified by the safety invariant -- so if you have a pub field, usually your safety invariant must (a) ensure that that field always satisfies its safety invariant, and (b) ensure that your safety invariant is maintained when the value in that field is replaced by an arbitrary other safe value for the field type. (Condition (b) can sometimes be skipped if you only ever hand out &T over the public API surface and there is no interior mutability.)

We never had to spell any of this out because it is somewhat obvious -- if you don't use unsafe, then the default safety invariant is fine, and when you do use unsafe it is fairly clear that pub fields easily lead to unsoundness and intuition will tell you what you are allowed to do. Nobody expects to be allowed to just transmute a value of a struct into existence, even if you exactly know the layout and type of all fields, unless there already is a safe way of doing exactly the same thing. (So the "producer" crate would be clearly ridiculous if this was a struct type. It is no better for a union.)

For unions, the only default safe operations we provide are

  • Constructing a union by giving a value for one of its fields.
  • Writing to any union field (that you have access to, in terms of its visibility).

That's not a lot, and hence the "default safety invariant" can indeed be trivial, i.e., True. And if you don't use any unsafe then this safety invariant will work just fine for you. But using unions always(*) implies using some unsafe, and then what? With struct the situation is that the user needs to document their invariant, period -- we cannot pick it for them. I was expecting we would do the same for unions.

However the general sentiment seems to be that we cannot do that? I think at this point we are entering the area of libs API guidance -- we are basically defining, if the crate author fails to document the safety invariant of their pub union, what may a crate consumer assume is the "default" invariant? I don't think we should allow them to assume anything. It's way too easy to get incoherent results that way. That's why I think in @alercah's example, both crate 2 and 3 (producer and consumer) are wrong, since they both made assumptions about the crate 1 (definer) pub union type that are not justified by that crate's documentation.

I could get behind a way to add an attribute that explicitly declares a safety invariant and also interacts with safe transmute things. I don't think anything like that should happen by default though. Basically, adding #[safe_transmute_between_fields] or whatever is a machine-readable equivalent to adding a doc comment that says "safety invariant of this union: all its fields satisfy the safety invariant", and since it is machine-readable we can exploit it to make some operations safe (assuming we can actually check that the fields all have equivalent safety invariants).

So, I think I am in full agreement with @CAD97. :) Option 1 in the original post, plus maybe an attribute for option 3iii, if it indeed has sufficient motivation.

@CAD97
Copy link

CAD97 commented Jul 18, 2022

@alercah

So I think [cuts off]

Now it's my turn to point out that your thought got cut off :) as well as it looks like markdown may have eaten some context in that section as well

alercah added a commit to alercah/unsafe-code-guidelines that referenced this issue Oct 8, 2022
I have tried to document everything I believe we have consensus on. I've
left some things open that I possibly could have closed, but because
this PR is very big, I would like to focus on getting it in as quickly
as possible and worrying about whatever's left aftwards.

I strongly encourage others to submit follow up PRs to close out the
other open issues.

Closes rust-lang#156.
Closes rust-lang#298.
Closes rust-lang#352.
alercah added a commit to alercah/unsafe-code-guidelines that referenced this issue Oct 8, 2022
I have tried to document everything I believe we have consensus on. I've
left some things open that I possibly could have closed, but because
this PR is very big, I would like to focus on getting it in as quickly
as possible and worrying about whatever's left aftwards.

I strongly encourage others to submit follow up PRs to close out the
other open issues.

Closes rust-lang#156.
Closes rust-lang#298.
Closes rust-lang#352.
alercah added a commit to alercah/unsafe-code-guidelines that referenced this issue Oct 15, 2022
@alercah alercah mentioned this issue Oct 15, 2022
alercah added a commit to alercah/unsafe-code-guidelines that referenced this issue Oct 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants