Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Never patterns #3719

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

Nadrieril
Copy link
Member

@Nadrieril Nadrieril commented Oct 27, 2024

Summary

A ! pattern indicates a type with no valid values. It is used to indicate an impossible case when matching an empty type in unsafe code.

enum Void {}
unsafe fn empty_tup<T>(tup: *const (T, Void)) -> ! {
    unsafe {
        match *tup { ! }
    }
}

Note: this RFC is purely about a new pattern syntax. It does not propose changes to exhaustiveness checking or to the operational semantics of uninhabited types.

Rendered

@programmerjake
Copy link
Member

it might be a good idea to instead require an empty block (or maybe ! => !, where ! is an expression -- a constructor for the ! type that the compiler ensures is never reachable) as the match arm, since that makes it work with macros a whole lot better since all macros currently expect that all match arms have bodies.

@traviscross traviscross added T-lang Relevant to the language team, which will review and decide on the RFC. I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. labels Oct 28, 2024
# Motivation
[motivation]: #motivation

Rust's unsafe semantics are access-based: the validity of data only matters when reading from/writing to a place. When pattern-matching, patterns only access the data (discriminants/values) they need to choose the right arm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's unsafe semantics are actually mostly not access-based. Constructing an invalid value is UB.

Only behind raw pointers and inside union fields do we use access-based semantics. So the text here is quite misleading as-is.

Copy link

@golddranks golddranks Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Niko's blog post from year 2018, linked in the RFC text, says that

While the details of our model around unsafe code are still being worked out (in part by this post!), there is a general consensus that we want an “access-based” model

...mentioning about &! being possibly a valid value. Just to clarify, I presume this is now stale information and having a type &! is considered UB? Also, the same article presents following code:

unsafe {
  let x: Uninit<(u32, T)> = Uninit { uninit: () };
  x.value.0 = 22; // initialize first part of the tuple
  ...
  match x.value {
    (v, _) => {
      // access only first part of the tuple
    }
  }
  ...
}

Is that accepted by modern standards? Seems a bit sketchy that the union field is already accessed in the match expression, so the fields are not matched against inside the match. By extension, one should be able to do the following, which, by modern standards, is surely UB, right? (To be sure, the RFC presents code that matches against the union fields inside the match.)

unsafe {
  let x: Uninit<(u32, T)> = Uninit { uninit: () };
  x.value.0 = 22; // initialize first part of the tuple
  ...
  let tmp = x.value; // move from x.value!
  match tmp {
    (v, _) => {
      // access only first part of the tuple
    }
  }
  ...
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A match expression works on places, not values. So the two examples you show are not equivalent! The first never turns x.value into a value, it only ever accesses its first field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But values don't exist in memory, they only exist while reading or writing, right? And matches don't access compound values wholesale, they only inspect memory piecewise. In particular I can mess up some bytes of a place and still match on it if the match only inspects the non-messed-up bytes.

Is that not "access-based"? Am I incorrect in the above understanding?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an access-based semantics, it is generally expected that e.g.

let b: bool = transmute(13u8);

is fine if we never "access" b. So I think it is misleading to call Rust's semantics access-based.

But what you say about places is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Is there a term of art for the particular flavor of "validity only matters when reading/writing" that rust has?

Otherwise I don't can do without this general statement. Would this be ok: "The semantics of pattern-matching around unsafe values are access-based: patterns only access the data (discriminants/values) they need to choose the right arm. The accessed data must be valid, but the rest may be invalid or uninitialized"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of a specific term. I would summarize it as "enforcing language invariants on every typed copy".

"The semantics of pattern-matching around unsafe values are access-based: patterns only access the data (discriminants/values) they need to choose the right arm. The accessed data must be valid, but the rest may be invalid or uninitialized"

"unsafe value" isn't a standard term. I've used "unsafe places" in the past but it is not standard either; the RFC should define it if it wants to use it.

I would say something like: Rust's pattern match semantics are access-based. Only the parts of the underlying value representation that actually matter for the pattern have to be valid.

}
```

Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.
Here the absence of an arm plays the role of a "read discriminant" kind of operation.
(If, instead, there was an arm with pattern `_`, then there would be no "read discriminant" operation.)
This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.

Comment on lines +44 to +46
_ => unreachable_unchecked(),
// or
(_, x) => match x {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => unreachable_unchecked(),
// or
(_, x) => match x {}
(_, x) => match x {}

The first example seems odd and orthogonal to the RFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe _ => unreachable!() is better? That's just what people commonly write in this case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be what people commonly write, but the intent of this PR is better described by match x {} so IMO we should have only that.


Here the absence of an arm plays the role of a "read discriminant" kind of operation. This is fine in this case, but around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern.

Today, when matching empty types inside places that may contain uninitialized data, rust requires a dummy match arm:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Today, when matching empty types inside places that may contain uninitialized data, rust requires a dummy match arm:
Today, when matching empty types inside places that may contain uninitialized data, rust requires a match arm that explicitly reads the empty type:

}
impl<T: Copy> MyUnion<T> {
unsafe fn assume_init(self) -> ! {
match self { MyUnion { value: (_, !) } }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
match self { MyUnion { value: (_, !) } }
match self { MyUnion { value: ! } }

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a binding requires a full read, a wildcard `_` accesses nothing etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a binding requires a full read, a wildcard `_` accesses nothing etc.
Patterns can be used on a partially uninitialized place; the basic rule is that a pattern only accesses data that is directly mentioned in the pattern: an `Ok(..)` pattern requires a discriminant read, a by-value binding requires a full read, a wildcard `_` accesses nothing etc.

There are also ref bindings, after all...

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

We add `!` to the syntax of patterns. A `!` pattern is accepted for any type that is visibly uninhabited (i.e. uninhabited taking into account private fields and `#[non_exhaustive]` annotations) from the item containing the match expression.
Copy link
Member

@RalfJung RalfJung Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many possible definitions of "uninhabited" so it'd be good to spell this out. E.g., is &! uninhabited?

What I would expect is that ! is allowed on all types whose library invariant ("safety invariant") cannot be satisfied (it is an explicit operation after all), so we can treat &! as uninhabited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, I had not considered the safety version of uninhabitedness. This does feel like the appropriate notion to use, I'll change that

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, if we use this wider definition of "uninhabited" for when the ! pattern can apply (rather than a definition such as that of "empty" in rust-lang/rust#119612 (comment)), does this mean the following would be acceptable?

Such syntax was used in this comment from Ralf so I assume this might be the intention?

fn abc(value: Option<&!>) {
   match value {
        None => {},
        !
   }
}

I rather like this, but just to flag that it is a change of behaviour from:

  • The current never_patterns feature which would require a more explicit Some(!) pattern (playground link)
  • The syntax in the auto-never blog post which I believe would have required a Some(&!) pattern to draw attention that it accesses the &! (and so that it asserts validity - if we replace references with pointers)

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, would the ! pattern be able to match multiple uninhabited types in a wildcard manner?

To create an explicit example, would the following compile, on the justification that:

  • MultiNever::A(&MultiNever::B(Empty)), MultiNever::A(&MultiNever::C(Empty)), MultiNever::B(Empty) and MultiNever::C(Empty) are all uninhabited based on the above rules.
  • We may allow ! to wildcard over all uninhabited types.
enum Empty {}
enum MultiNever<T> {
   A(T),
   B(Empty),
   C(Empty),
}
fn abc(value: MultiNever<&MultiNever<u32>>) -> u32 {
    match value {
        MultiNever::A(MultiNever::A(value)) => *value,
        !
    }
}

If this does compile, I'm not sure if this necessarily fits with the motivation that "accesses should be explicitly visible in the pattern." - but possibly I've got the wrong end of the stick?

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just revisiting my musing in the last comment, I missed that the motivation in the RFC actually says more fully that "around unsafe code that interacts with possibly-uninitialized data, accesses should be explicitly visible in the pattern" - which seems right to me. The motivation only cares about accesses which could cause UB being explicitly visible in the pattern, or slightly more loosely that "accesses which could cause UB without breaking a safety invariant should be explicit".

And so Ralf's justification is that, references have a safety invariant to point at valid data, and as such have a recursively-safe safety invariant. Therefore it should be okay to effectively view a never pattern as:

A never pattern acts as an explicit compile-time assertion that all unmatched types have an unsatisfiable safety invariant; and also as an explicit compile-time assertion that all those types are invalid at this position.

By contrast, e.g. pointers or union fields do not have such a safety invariant, so would not satisfy such a recursive safety-based definition of uninhabitedness, and would need to be handled explicitly.

Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer that the user write Some(!) explicitly because I find it more explicit and easier to specify/implement. This is open for discussion of course, that's the aim of the RFC.

Good point! Well, in the spirit of discussion, let me propose for debate the "wildcard !" or more precisely the "safety-uninhabitable binding" definition of the never pattern. Such a pattern would:

Apply against all unmatched branches (similarly to _ or a variable binding a), and, both:

  • Assert (on pain of a compile error) that the "leaf type" of each of these branches is safety-uninhabited. (Where by leaf type I'm being quite wooly, I mean the type you'd get if you re-interpret the leaf enum variant of the branch as a struct type)
  • Assert (on pain of UB) that all such branches are not reachable (i.e. all such leaf types are validity-uninhabited in this position)

If this is well defined (and, to be fair, that's a big if), it appears to me that it's a strictly more powerful extension to this proposal - in the sense that it is allowed in a superset of places.

I was also wondering it it were too powerful, but I think on balance that it's likely OK. A ! at the root of a nested match is a declaration that "all unspecified branches are not possible (on pain of UB)".

It has benefits that it can mostly solve the ergonomics issues in this issue without auto-never, by only requiring a user to add a ! to their matches. Macros could do this in a blanket fashion, and pair this with something like an #[allow(unneeded_never_pattern)].

Users in unsafe land could be more careful, and use ! more explicitly, in place of individual types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to clarify, I believe that under this RFC, the first 3 compile, but the fourth one does not:

correct!

If this is well defined (and, to be fair, that's a big if), it appears to me that it's a strictly more powerful extension to this proposal - in the sense that it is allowed in a superset of places.

I believe this is well-defined, and agree that this would be allowed in strictly more places than the RFC currently proposes.

It has benefits that it can mostly solve the ergonomics issues in rust-lang/rust#131452 without auto-never

If that's the main motivation I'd much rather have auto-never, because I'd much rather avoid safe users needing to add ! patterns in some matches and not others for hard-to-figure-out reasons (namely the presence of & on the path to an empty type).

I tried to separate this RFC from auto-never discussions but maybe they're too entangled for that 🤔.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the main motivation I'd much rather have auto-never, because I'd much rather avoid safe users needing to add ! patterns in some matches and not others for hard-to-figure-out reasons (namely the presence of & on the path to an empty type).

I agree that we need an auto-never equivalent... As you've said in the comment below, maybe we need to explore that first.

But I think a more flexible never pattern could theoretically be beneficial to people writing unsafe too. For example, in these quite hypothetical scenarios:

  • Maybe in their current circumstance they know that all branches that are unsafe are invalid, and want to avoid writing out lots of distinct branches. (e.g. they're using something like an unsafe Trees That Grow pattern, if such a thing would even make sense... in which case, they could use ! as a wildcard, like Ralf spelled out here. Although I'm not sure if that was intentional on Ralf's part, or accidental 🤷)
  • Or they know that some sub-enum should always be never, and want to assert that at compile time.

Whilst I'm naturally drawn to neat generalisations, I accept that the use case argument isn't super strong.

That said, (ignoring the implementation difficulty argument for now) I feel a good argument against a more general / flexible pattern should be motivated in why a restriction makes sense. For example:

  • Do we feel that allowing wildcard ! is a foot gun for a user to write the wrong thing? That they put ! as an easy fix because they don't understand the implications about it matching more variants than expected, or future new variants? (I'm not sure to what extent unsafe developers would benefit from such hand-holding)
  • Do we feel that allowing wildcard ! might be "less easy to understand" or less readable at code review time? Does a lone ! look weird or eat up too much strangeness budget?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I didn't realize that we'd need more than just this RFC for that case.

My immediate gut feeling is that the conservative choice here is to accept less code. It's east to accept more code in the future, it's impossible to take back accepting wildcard ! if we add it now. So I'd be inclined to first require the ! pattern to be used in a position where the type is actually uninhabited, and possibly change the "automatic never pattern" rules for matches on entirely safe places, and then revisit whether we really need wildcard !.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good argument. Start conservative, without precluding expanding in future if it's still useful 👍


In terms of both semantics and exhaustiveness checking, a `!` pattern behaves like a binding. E.g. a `Some(!),` arm (absent match ergonomics) is semantically equivalent to `Some(x) => match x {}`. Indeed, they both indicate a load at a place of an uninhabited type, followed by an unreachable body.

Never patterns interact with match ergonomics as expected: a `!` pattern is allowed on e.g. `&(T, !)`.
Copy link
Member

@RalfJung RalfJung Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected &(T, !) to be an uninhabited type (it is library-invariant-uninhabited, after all), so this sentence is odd.

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do go with the "safety invariant" version of inhabitedness then indeed this sentence isn't useful anymore. Will update when I get a moment.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Status quo alternative: we could do nothing, and consider that writing `Err(_) => unreachable!()` or `Err(x) => match x {}` is good enough. The drawback is more noticeable for or-patterns, e.g. `let (Ok(x) | Err(!)) = ...;` would have to become a `let .. else` or a full `match`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time the RFC mentions that let (Ok(x) | Err(!)) = ... is even possible. If that is part of the RFC it should be spelled out above. I guess never patters are exempt from the usual rule that all variants of an or-pattern need to bind the same variables?


The only other language I (Nadrieril) am aware of that has the intersection of features that would make this necessary is Zig. They are in the process of clarifying the semantics of their empty types (https://github.com/ziglang/zig/issues/15909); they may or may not end up encountering the same explicitness problem as us.

Ocaml and Adga have respectively [refutation cases](https://ocaml.org/manual/5.2/gadts-tutorial.html#s:gadt-refutation-cases) and [absurd patterns](https://agda.readthedocs.io/en/latest/language/function-definitions.html#absurd-patterns), both of which give a way to say "this pattern is impossible and I shouldn't have to write an arm for it". Their flavor of impossibility is related to types however, not runtime validity.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their flavor of impossibility is related to types however, not runtime validity.

I don't understand this remark. Ours is also all about types?

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yes this is badly phrased. I'm trying to express something like "in their case, the absurd pattern is purely a type-system hint, whereas in our case changing Err(_) to Err(!) can change the runtime semantics".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference is that Adga doesn't have unsafe code and OCaml doesn't talk about the unsafe code it has.

}
```

This RFC proposes a new `!` pattern that works as an explicit "access data" or "assert validity" operation for uninhabited types. The examples above become:
Copy link

@ijackson ijackson Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is the key point. It defines what the programmer means when they write !.

However, in safe code (eg, when the scrutinee is a safe reference), this fact - "this is actually sane" - is something that should always be true and shouldn't need to be explained by the programmer.

ISTM that this is also a pedagogical problem. If this is needed in safe Rust, auithors of that safe Rust will need an explanation what this thing is for and what it means, but we definitley don't want to burden them thoughts about validity/safety invariants etc. Nor do we want to suiggest to them in discussion of match that values might not be valid. But there is no way to explain this ! without supposing that &T might not reference a proper T. Authors of safe Rust are entitled to assume that &T always does reference a valid T.

Therefore, ! should not be required in safe Rust. What does that mean precisely? One possible answer:

  • ! is not required when dereferencing a safe reference, field, etc. (recursively)
  • Except, within an unsafe block, we emit a lint saying "something something uninhabited types validity something something requirement something"

Copy link
Member

@RalfJung RalfJung Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to make syntax depend on whether something is syntactically inside or outside an unsafe block. That kind of context-dependence can become quite surprising.

So if we do anything like this, IMO it should be fully based on the type of the scrutinee place (including the types of its subexpressions, e.g. whether a union field projection is involved).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But just to check, would you consider a possible path forward looking something like:

  • Never patterns match types with a certain property, independent of location (this property could be "safety-based uninhabitness").
  • When creating a match statement on a type with a different property, the compiler inserts a never pattern at the bottom of the match, independent of location (this property could be, e.g. "doesn't contain pointers or union field projections or partially uninhabited tuples" - roughly as per the auto-never blog post)).
  • We add a lint for when the auto-never might cause issues - which could be dependent on location?

I think such a lint being location-sensitive would be justifiable in that it doesn't change syntax, but just adds extra checks? Or at least Niko was happy with that in 2018.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW the path forward I intend to propose is simply to allow omitting arms for &! and similar types (more generally, if the type is safely uninhabited). My understanding is that making unsafe references is a massive footgun anyway (because of e.g. auto(de)ref).

Unsafe code authors that do that anyway can write never patterns for such cases (they will not be marked as unreachable), and we may add a lint that requires them if users find this useful.

I have moved away from explaining things in terms of "auto-never" out of personal preference, but you could explain this by saying that we always auto-never except inside pointers and unions.

Never patterns match types with a certain property, independent of location

This is indeed precisely the intent of this here RFC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given ptr: *const &!, I don't think match *ptr {} should ever build. That should require match *ptr { ! }.

IOW, this should depend on whether the matched-on place is a safe place or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I include this in the exception that "we don't auto-never inside pointers". By "inside pointers" I mean in terms of places.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given ptr: *const &!, I don't think match *ptr {} should ever build. That should require match *ptr { ! }.

IOW, this should depend on whether the matched-on place is a safe place or not.

Okay, understood - so in your view auto-never behaviour (if/when we have it or an equivalent) would need to vary be place, and shouldn't just be a static property of the matched type.

But never patterns, as syntax, should be place invariant. This makes sense.

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may now stop discussion of auto-never and similar considerations: I'd like to keep the discussions on-topic of the never pattern syntax proposal, and discuss changes to which arms can be omitted etc to some other place. For example rust-lang/rust#131452.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Does this cause problems with macros-by-example?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It also causes problems for derive macros. As I write,

[It] sets macro authors up for failure. Every macro-generated match statement needs to have a spurious ! arm, or the macro will fail to compile, sometimes, with an empty enum.

This isn't just a problem for macro_rules macros. Derive macros can suffer from it too. (Users of derive-deftly or other approaches for making derive macros more easily are quite likely to run into this.)

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two separate issues here: rust-lang/rust#131452 is entirely independent from this RFC, since this RFC is purely about adding a new pattern syntax (and not about changing anything else in the language). The other issue with macros is that a ! pattern doesn't take an arm; this is the point that's relevant to this RFC. That one I don't know how bad it is.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other issue with macros is that a ! pattern doesn't take an arm; this is the point that's relevant to this RFC. That one I don't know how bad it is.

I think it would be a breaking change for libraries like syn, in the sense that all procedural macro crates would need to update to the latest syn with a new AST to handle match patterns without arms - and this may require a major version bump of syn...

Given that syn is only on major version 2, it looks like such changes aren't very frequent - but maybe this isn't that much of a problem in practice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, they'd have to majorly change their Arm struct :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is already causing problems like:

fn h(a: Option<&!>) -> bool {
    matches!(a, Some(!))
    // error: a never pattern is always unreachable
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syn has to have some way of dealing with the Rust syntax evolving, I don't think its API choices should dictate our grammar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matches!(a, Some(!)) is kind of silly so arguably we should at the very least lint against it. Why not just write false instead?


Never patterns interact with match ergonomics as expected: a `!` pattern is allowed on e.g. `&(T, !)`.

A never pattern may be linted as "unreachable" if there is no well-behaved execution that can reach it. For example, if it was guaranteed that reading the discriminant of `Result<T, !>` could never return `Err`, the `Err(!)` pattern would be linted as unreachable. Never patterns are not otherwise linted as unreachable even when removing them would be accepted.
Copy link

@dhedey dhedey Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we view this as necessary? Under the psuedo-definition of ! in this comment, a never pattern against 0 types is trivially okay.

Some benefits of permitting an uneeded !:

  • It would be a win for macros (as per this comment), they could always include it without punishment.
  • It would make auto-never a trivial sugaring where we choose to accept it (just stick ! at the bottom of every match, which effectively operates as both a check that all remaining unmatched types have unsatisfiable safety invariants, and an assertion that they are all invalid)

A drawback could be that in this case, if we see !, we can't assume that there exists at least 1 leftover variant/type in the match with an unsatisfiable safety invariant. But I'm not sure to what extent that is actually useful information, worthy of linting for?

Copy link
Member Author

@Nadrieril Nadrieril Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is purely a convenient lint for users and helps learnability, exactly like today's "unreachable arm" lint. Macros can (and often do) silence lints to get more flexibility, so there'd be no win here. Similarly, the auto-never desugaring you have in mind would work the same; we wouldn't emit a lint there since that would make no sense.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

My comment was mostly applying to a world where we had wildcard !... Unfortunately without wildcard !, it's not valid to place a ! next to other variants, so a macro which outputs a match &self { /* ... */ } still needs to handle/branch on there being 0 variants, to decide whether to insert a ! or not. This is rather unergonomic - particularly in declarative macros.

All that said, I think it's fine for this RFC to not cover that, and to need to wait for something equivalent to auto-never for this to be a "nice" story for macro writers.

# Drawbacks
[drawbacks]: #drawbacks

The main drawback is that this new pattern is unlike other patterns (e.g. it doesn't require an arm body) which adds some maintenance burden. It's otherwise a simple feature.
Copy link
Member

@kennytm kennytm Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How bad would it be to this RFC if the "feature" of skipping the arm body be removed 🤷 Save some major changes to the syntax that messes with macros and syn etc.

Just type-check that the arm body always has type const ! to replace that "a never pattern is always unreachable" error.

match x {
    // note: using `const { unreachable!() }` instead of only `unreachable!()`
    // to tell the compile actually enforce that the arm is never invoked
    Some(!) => const { unreachable!() },
    None => "ok",
}

TBH it is very strange that only the match syntax has this omission feature. You can't do it in an if let for instance. Given how #![feature(never_patterns)] works today you need to write:

if let Some(!) = x {
    // you still need a body here! otherwise you get "E0308 mismatched types"!
    const { unreachable!() }
} else {
    "ok"
}

(I don't think the "empty block" proposal in #3719 (comment) works because an empty block does not indicate the block is actually unreachable. The ! expression is cute but I think it may conflict with unary-not !(false).)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other syntax that has an equivalent is or-patterns: Ok(x) | Err(!) is a valid pattern because the never case "doesn't count" for the purpose of ensuring that all alternatives have the same binding. Otherwise you're right, match arms are special.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't want the "bodiless arm" syntax, I'd propose that we allow this:

match x {
    Some(!) => {}
    None => "ok",
}

this would consider the whole arm body to diverge for the purposes of type-checking, similar to how we typecheck => { return; () } or => { panic!(); () }. In fact I already implemented something like this for function arguments:

enum Void {}
fn foo(!: Void) -> u32 {} // typechecks because the body diverges

There's one case where the "bodiless arm" is particularly nice though: the empty match. Today match *foo {} is allowed for foo: *const !, which violates the intent that matching on empty types inside pointers should be explicit. My intent after this RFC is to deprecate this case and eventually force users to write match *foo { ! } instead. This is less nice if they have to write match *foo { ! => {} }.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specialty of Ok(x) | Err(!) manifests only after parsing which you have built the AST which then you can check for "E0408 variable is not bound in all patterns". The impact is not at the same level as the proposed syntax match expr { pat(!) } that changes the grammar and also the resulting AST/HIR structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ! expression is cute but it think it may conflict with unary-not !(false).

how so? if you're trying to have a ! expression, you can just say it can't be followed by a function call or other ambiguous tokens, if you really want to call it, you'd have to write (!)(false) (not that that would type check...)

Copy link
Member

@programmerjake programmerjake Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for this rather obscure feature is not a place we should spend the weirdness budget on.

ok, since {} can be made to type check, I think using ! => {} as the match arm is good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the need for never-patterns is common enough to justify changing the match syntax just to save that => {} 😅

Fair. I was hoping we could allow it just for this special case but idk if that's worth it.

I suppose that means the if let example not working for {} is a bug?

Yep

Copy link

@dhedey dhedey Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My slight concern with a syntax like ! => {} is that it reads to me like "types which match ! are valid in this position, please ignore them and return ()" rather than "types matching ! are unreachable/UB".

I haven't thought too much about this, but would something like ! => ! type check? (if we temporarily ignore a dependency on the never type).

And would it possible to reject alternative arms somehow at compile time? e.g. require ! => ! or ! => { ! }, but prevent e.g. ! => { x = x + 1; panic!("Side effect!"); } which I would assume also has a type of !?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhedey => ! can certainly type-check but then we need to introduce the new "!" expression in additional to the pattern and type.

but prevent e.g. ! => { x = x + 1; panic!("Side effect!"); } which I would assume also has a type of !?

A non-empty block (or non-! value) should raise the unreachable_code lint like now:

// warning: unreachable statement
fn f(ptr: &mut u8, !: &!) {
//                 - any code following a never pattern is unreachable
    *ptr += 1;
//  ^^^^^^^^^^ unreachable statement
}

Copy link
Member

@RalfJung RalfJung Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to have an arm, IMO => ! makes most sense. It does still look quite funny, though...

EDIT: Oh, and it was already said above that it causes some parser issues. Hm.

@ijackson
Copy link

There are two separate issues here: rust-lang/rust#131452 is entirely independent from this RFC, since this RFC is purely about adding a new pattern syntax

I don't think I agree that these are separate. The RFC says that ! is needed "in unsafe contexts". This is entangled with #131452 and the discussion about the semantics of uninhabited types, as discussed here: rust-lang/rust#131452 (comment)

My own concern is primarily the possibility (which afaict is not explicitly excluded by this RFC text) that people writing safe code might be required to write ! arms. I think we must avoid that. That is one way this RFC is entangled with #131452.

Avoiding that means we need a distinction between safe and unsafe places. That distinction is also the thing that solves #131452 (which is about safe code). With that distinction this RFC will become clearer. It will also be more obvious what the alternatives are to never patterns (generally, I think, additional explicit dereferencing of the unsafe place?)

In any case, before we have fixed #131452, the handling of &Void in safe code (which is a prominent problem with Rust today) is going to keep turning up in the conversation, making it difficult to talk about even the aspects of the never patterns RFC that don't relate to safe code. Even if you consider the questions entirely unrelated, the apparent relationship between them makes the conversation confusing.

So I think we must deal with #131452 first.

@Nadrieril
Copy link
Member Author

I was trying to keep the conversations separate to make discussion easier, but it's becoming clear that this is not helping x) I'll go ahead with my plan for rust-lang/rust#131452 and come back to this afterwards.

@clarfonthey
Copy link
Contributor

The most confusing bit of this RFC for me is exactly how it interacts with safety.

Specifically, it doesn't clarify when the need would arise for match x { ! } versus match x {}. It doesn't even specify whether match x { ! } requires that the match be enclosed in an unsafe block, even though the examples in the RFC seem to imply this.

I get that the goal here is to just provide a syntax, but there are too many unanswered questions about the motivation to really feel like the syntax is justified. As folks have mentioned, the lack of an arm means that this now has to be covered separately in macros, although I think that never_pat as its own macro matcher could probably be fine.

Additionally, would you be expected to mix uninhabited patterns with inhabited patterns? What does this mean? How does this play into where the unsafe block is located? Doesn't this mean that the unsafe block is now placed farther away from the unsafety, not localised to the specific unsafe bit (the never pattern) but the entire match block?

I just don't really see why adding a specific never pattern is a good solution to any problem, and I also don't understand what problems require them.

@Nadrieril
Copy link
Member Author

It doesn't even specify whether match x { ! } requires that the match be enclosed in an unsafe block, even though the examples in the RFC seem to imply this.

A never pattern is not an unsafe operation, since you can't cause UB by using it on safe values. It's exactly as safe as the true pattern on booleans. So this requires no unsafe block.

The part that requires unsafe in these examples is the match scrutinee. We can't write match (unsafe { *ptr }) { .. } because that would have a different meaning: that would copy the value out of the pointed-to-place. To match on the place directly we have to write unsafe { match *ptr { .. }}. That's just how rust works today.

Specifically, it doesn't clarify when the need would arise for match x { ! } versus match x {}.

match <expr> { ! } is always allowed if the type allows. It's a pattern like any other. match <expr> {} is allowed when <expr> is not inside a pointer or union (that's not a change, that's how rust works today, though admittedly that changed last month). That's why the examples use unsafe: never patterns is mostly only useful around unsafe code.

Additionally, would you be expected to mix uninhabited patterns with inhabited patterns? What does this mean?

I don't know what you mean. You use patterns as usual, depending on the type you want to match on. It's just that for empty types there's now a new pattern you can use. You can mix and match as usual.

I just don't really see why adding a specific never pattern is a good solution to any problem, and I also don't understand what problems require them.

To be fair this is a rather subtle point of rust operational semantics. To rephrase the "motivation" section in the RFC: it solves the problem of having an explicit way to assert validity of data at a place of uninhabited type using a pattern.

@clarfonthey
Copy link
Contributor

clarfonthey commented Nov 16, 2024

I guess I should have clarified on the mixing: specifically, imagine the case like this:

match x {
    Ok(x) => /* ... */,
    Err(!),
}

and this:

match x {
    Err(!),
    Ok(x) => /* ... */,
}

And you can imagine more than two arms for more complicated examples. If we allow more than one never pattern, then our macro-matching situation gets more complicated, and you have to explicitly use tt-matchers to handle both cases, which can get a lot more complicated than just having multiple arm matchers. However, I can see value in allowing multiple never patterns separated by commas, compared to one never pattern separated by pipes, for convenience and to allow sorting patterns better.

Like, that is a rather sizeable complication to the syntax which needs to be explained, and should probably have style guidelines too.


Adding an additional example for clarity:

match x {
    Ok(None) => /* ... */,
    Ok(Some(!)),
    Err(None) => /* ... */,
    Err(Some(!)),
}

Note that since it's not just all inhabited patterns followed by an optional never pattern, and instead perfectly allowed mixing, you can't use a single-rule matcher for macros any more: it has to be a tt-matcher with separate rules for the different arm cases.

@clarfonthey
Copy link
Contributor

Adding a relevant bit here after discussing a bit in rust-lang/rust#131452: I really don't think that motivation is properly described in this RFC, at all. I understand the desire to remain brief, but there are a lot of very, very subtle details that do not feel clarified at all in the RFC. Even the "guide-level" explanation, which is supposed to be for people who don't understand these subtle details, uses very specific terminology like "place" that isn't properly described to the person reading.

Like, my understanding is that the opsem team would prefer to define pattern-matching such that reads happen on a per-arm basis, meaning that you can always look at the arms of a match to determine what reads are happening, and thus the points at which UB can occur if a reference were not well-formed. This avoids the need for a "discriminant read" operation, since you can rely on pattern-matching instead.

While syntactically matches of uninhabited types would generally be allowed to be left empty, with never patterns being added "automatically" by the compiler, semantically, the idea is that they would still be there, and still able to cause accesses to memory per the memory model. To help ensure that people don't get caught up by these "automatically added" arms, people can opt into using these patterns, either by denying a lint that checks for them or via a separate lint which will only fire when they are omitted in the presence of unsafe code, to be worked out later.

This RFC is attempting to sidestep the operational semantics and simply propose a new syntax but I, genuinely do not think that's a good idea. Because, well, the semantics do matter: match x {} or match x { ! } and match x { _ => unreachable!() } no longer have the same meaning, assuming that the implicit ! pattern is added to the end and not the beginning. And, with the proposed semantics, the unreachable!() branch is not dead code, and cannot be automatically removed by the compiler.

Like, now that I actually understand the motivation for this change, I really do get it. I just don't really know if the proposed solution is the correct one: since the semantics aren't really negotiable, that means either we add an explicit syntax for never patterns and recommend that they sometimes be used in unsafe code, or, we keep the semantics for these patterns being added implicitly and don't add any special syntax for them. I'm not going to argue whether the syntax is worth the downsides it brings, but I do think that the semantic side plays a lot into making that justification, and I don't think it's adequately stated here.

@Nadrieril Nadrieril removed the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants