RFC for a match based surface syntax to get pointer-to-field #2666
text/0000-pointer-match.md
Outdated
/// * be properly aligned.
unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
    let Weird { raw const a, ..} = w;
    match core::ptr::read(a) {
Why would this read an integer? Isn't `a` of type `*const bool`?
You're right, this needs a cast first: `core::ptr::read(a as *const u8)`.
Thank you for spotting, fixed it.
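To illustrate why the cast matters, here is a minimal sketch in today's Rust. It is not the RFC's code: the layout of `Weird` is an assumption (the real definition is not shown in this thread), `core::ptr::addr_of!` stands in for the proposed `raw const a` pattern, and `slot_with_first_byte` is a hypothetical helper used only to build test inputs. Reading the byte as `u8` avoids ever producing an invalid `bool`.

```rust
use core::mem::MaybeUninit;
use core::ptr;

// Assumed layout for illustration only.
#[repr(C)]
pub struct Weird {
    pub a: bool,
    pub b: u8,
}

/// # Safety
/// `w` must be non-null, properly aligned, point to memory valid for `'a`,
/// and every byte behind it must be initialized.
pub unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
    unsafe {
        // Address calculation only: no reference is created, nothing is read.
        let a: *const bool = ptr::addr_of!((*w).a);
        // Read the byte as `u8`; any value other than 0 or 1 is not a valid `bool`.
        match ptr::read(a as *const u8) {
            0 | 1 => Some(&*w),
            _ => None,
        }
    }
}

/// Hypothetical helper: a `Weird`-sized slot whose first byte is `first`.
pub fn slot_with_first_byte(first: u8) -> MaybeUninit<Weird> {
    let mut slot = MaybeUninit::<Weird>::uninit();
    let bytes = slot.as_mut_ptr() as *mut u8;
    unsafe {
        bytes.write(first); // the byte backing `a`
        bytes.add(1).write(0); // the byte backing `b`
    }
    slot
}
```

A slot whose first byte is 0 or 1 yields `Some`, while any other byte value yields `None` without a `bool` ever being read.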
}
```

Note that pointer binding mode and pointer pattern requires `unsafe`, even when
There have been multiple references to these unsafe pointer patterns, but no examples or further explanations. What are they and why are they necessary? Aren't `raw mut` or `raw const` patterns sufficient?
Pointer patterns are the counterpart of reference patterns, necessary for disambiguation in some special cases. They use `* <subpattern>`, paralleling `& <subpattern>`, and pointer binding mode automatically adds the top-level pointer pattern, the same as the reference pattern implied by reference binding mode. I'll add an example showing the necessity, but it boils down to being able to match both a pointer by value and its content with a `raw` pattern, the former being necessary for backwards compatibility.
@oli-obk New section at the beginning of https://github.com/HeroicKatora/rfcs/blob/pointer-match/text/0000-pointer-match.md#reference-level-explanation to explain all of this
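For readers less familiar with the reference side of this parallel, the following sketch shows what already happens in today's Rust: matching a `&Point` against a struct pattern implicitly adds the `&` pattern and switches to the reference binding mode. The `Point` type and `same_bindings` function are made up for illustration; the proposed pointer patterns are not valid Rust and are not shown.

```rust
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Returns true if the implicit and the explicit form bind the same addresses.
pub fn same_bindings(p: &Point) -> bool {
    // Implicit: the struct pattern is matched against a reference, so the
    // bindings `x` and `y` get type `&i32` (reference binding mode).
    let Point { x, y } = p;
    // Explicit: the same match with the reference pattern and `ref` written out.
    let &Point { x: ref ex, y: ref ey } = p;
    core::ptr::eq(x, ex) && core::ptr::eq(y, ey)
}
```

Both forms bind references to the same fields; the RFC proposes the analogous implicit wrapping for pointer patterns.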
text/0000-pointer-match.md
Outdated
* `raw (const|mut) identifier`; allowed for field bindings and identifier bindings.
  These are allowed in the grammar where `ref? mut? identifier` is allowed
  currently. For this purpose `raw` is a contextual keyword.
* `* <subpattern>`; to match a pointer not by value but to additionally use
Is there a reason to have this be a generic `*` pattern instead of separate `*const`, `*mut` patterns?
Not really, and it does seem better to stay consistent with `&`/`&mut`.
Using `*const`/`*mut` isn't really consistent with `&`/`&mut` in a meaningful way, AFAICS. Patterns follow their corresponding expression syntax, and while the way to construct a `&T`/`&mut T` is `&x`/`&mut x`, the way to construct a `*const T` is not `*const x`.
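As a point of reference for the expression side, this sketch lists ways a `*const T` can be constructed in today's Rust; none of them is spelled `*const x`. The function name `raw_pointers_to` is hypothetical.

```rust
use core::ptr;

/// Returns three raw pointers that all refer to `x`.
pub fn raw_pointers_to(x: &i32) -> [*const i32; 3] {
    let by_cast = x as *const i32; // explicit cast from a reference
    let by_coercion: *const i32 = x; // implicit coercion at a typed binding
    let by_macro = ptr::addr_of!(*x); // place-based, no new reference created
    [by_cast, by_coercion, by_macro]
}
```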
So.....
    match (0 as *mut usize) {
        &mut raw const z as *mut usize => (),
    }
(definitely not serious).
It occurs to me that I mostly think about patterns following the type syntax rather than expression syntax, which is likely why I made this comment. It just works out that those are the same most of the time because of struct expressions following the type syntax.
Agree with @Nemo157 here, I had always assumed it followed type syntax due to `struct`-patterns and `enum`-patterns. (Although that is kind of a weak argument, I really can't say for sure why that felt more natural. Maybe because it doesn't support any operators?) Also, enum variants:
    // current nightly makes this possible, with
    // #![feature(type_alias_enum_variants)]
    enum Foo {
        A,
    }
    impl Foo {
        fn test(self) {
            // gives various errors on different other rust versions.
            let Self::A = self;
        }
    }
Edit: But that's kind of beside the point.
It's more consistent with reference patterns that way, and there is no inherent upside to not having to specify it that I know of.
/// * point to memory valid for the chosen lifetime `'a`
/// * be properly aligned.
unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
    let Weird { raw const a, ..} = w;
Does this auto-deref, or how does this typecheck?
Your example below with `match (0 as *mut usize)` does not look like auto-deref happens, but then shouldn't this be `let *const Weird { raw const a, .. } = w`?
`match` doesn't auto-deref in the `ops::Deref` sense of expressions. I'd rather view it as only adding reference patterns automatically where required; the implementation kind of agrees. I only suggest automatically wrapping into pointer patterns if required as well (and maybe only in pointer binding mode). Which pattern to add should never be ambiguous (in a top-down pass of the pattern, after the rhs of the match is typed, we know the required pattern kind), but this probably deserves its own section anyway.
> Your example below with `match (0 as *mut usize)` does not look like auto-deref happens

Automatically adding such a pattern for pointers has the slightly unfortunate side effect of colliding with the ability to match pointers by const value (e.g. `core::ptr::null()`), which does not exist for references afaik. The example was intended to show that disambiguation should be backwards compatible, but it didn't show automatically adding the pattern.
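The by-value matching mentioned above can be sketched as follows. This assumes, as the comment states, that a raw pointer may be compared against a constant in a pattern; the names `NULL` and `classify` are made up for illustration. No memory is accessed: only the pointer value itself is compared.

```rust
// A pointer-typed constant derived from an integer value (null).
const NULL: *const usize = core::ptr::null();

/// Classifies a pointer by comparing its value against a constant pattern.
pub fn classify(p: *const usize) -> &'static str {
    match p {
        NULL => "null",
        _ => "non-null",
    }
}
```

Any automatically inserted pointer pattern would have to leave matches like this one unchanged, which is the backwards-compatibility concern raised here.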
> `match` doesn't auto-deref in the `ops::Deref` sense of expressions. I'd rather view it as only adding reference patterns automatically where required

I do not see how that makes any fundamental difference. The fact remains that you can write a pointer deref and incur a memory access without ever typing `*`, such as in
    fn foo(x: &bool) -> bool {
        match x {
            true => true,
            false => false,
        }
    }
For raw pointers, we want to avoid this because accessing memory through them is unsafe and should only be done explicitly. In this sense, auto-deref on raw pointers and auto-`&`-pattern in `match` have the same concern.
This is not about ambiguity; this is about calling out to whoever reads this code that a raw pointer is being dereferenced. I don't think this should compile:
    fn foo(x: *const bool) -> bool {
        unsafe { match x {
            true => true,
            false => false,
        } }
    }
That code would not work. While the pointer pattern is added automatically, it still does not allow any of the value-reading patterns to occur within it. So the value pattern `true` in that example would desugar to `*const true` and then be denied on the grounds that a value pattern occurred inside a pointer pattern. I believe that the wording correctly captures this part of the mechanic, but I could have misphrased it. Hence the distinction between auto-deref and adding a pattern: the latter does not circumvent the checks based on pattern structure.
That would indeed fix this concern. It wasn't clear to me from the RFC; there should be examples for this kind of rule.
I can see good arguments for having pattern syntax on top of explicit pointer-taking syntax, just like we do for references. However, it would be rather strange IMO to have only pattern syntax and not also a direct way to express this -- that would be very inconsistent with the handling of references. I also think that would hurt teachability, as you'd have no more direct way to express what happens in these patterns.
Pattern matching is already hugely complicated with guards and their evaluation order concerns, implicit derefs, handling of empty types, and so on. It is arguably the most complex language construct, and in fact it is the main reason why I pretty much refuse to do anything formal about surface Rust or the HIR -- I prefer working on the MIR where pattern matching has been almost entirely lowered away. At least most of the things it does can be explained as being equivalent to other language constructs: everything except for actually determining the discriminant of an enum. This RFC would add "create raw pointers" as the second such match-only operation. I do not think that is a good idea.
In that light I find myself disagreeing with the arguments in the "Rationale and alternatives" section:
There is nothing new about this, we already do not do auto-deref for raw pointers. The custom code entry points you mention all have to do with using, not creating pointers. Adding a new way to create (raw) pointers (not subject to auto-deref) does not interact with them in any interesting way, from what I can tell. Also, we do not auto-deref for raw pointers in Do you have examples for what you are worried about here?
This is an argument for a Also,
You can totally use Or maybe I am misunderstanding what your point is here. An example would help.
Maybe it should be pointed out specifically that it works for irrefutable bindings perfectly fine, and both pattern
Edit: I've updated the introductory example to make use of this and explained its validity in the reference level section.
Also in light of ¹ Short summary for the problems with pointer-from-place in expressions: In match, references are matched by reference patterns, and optionally inferred, for which they use the type sigils. The operation of creating one has a single-purpose pattern with a contextual keyword. In expressions, however, the type sigil for pointers is already in use for dereferencing, but the reference sigil is reused for reference taking. Also, since the reference wraps the place expression, safety and ergonomic concerns must be balanced when considering if the place expression must be valid regardless of such a wrapping, as taking out the place-subexpression results in stricter requirements on the validity of the involved memory than when it occurs inside an imagined raw-ptr-expression.
We discussed this in a @rust-lang/lang triage meeting today. The overall feeling was that this was potentially interesting, but a fairly large change to patterns, so we'd like to wait on more experience with
@rfcbot fcp postpone
Team member @scottmcm has proposed to postpone this. The next step is review by the rest of the tagged team members: No concerns currently listed. Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.
🔔 This is now entering its final comment period, as per the review above. 🔔
The final comment period, with a disposition to postpone, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. The RFC is now postponed.
Extend match syntax (`*`) and patterns (`raw const`, `raw mut`) with support for a limited set of pointer operations that involve only address calculation and never actually read through the pointer value. Make it possible to use these matches to calculate addresses of fields even for `repr(packed)` structs and possibly unaligned pointers, where an intermediate reference must not be created.
Rendered
This should be read as complementary surface syntax to the MIR changes proposed in #2582. Somewhat coincidentally, the author of that PR, @RalfJung, decided to rewrite it to also include a surface syntax; I was unaware of this parallel process. If desired, one could read the match/pattern syntax as complementary to the place expressions proposed there, but since the idea had been floating around for a while, I already address its downsides in comparison to match in Rationale and Alternatives.
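For context on the `repr(packed)` motivation in the description, here is a minimal sketch of the same address calculation in today's Rust, using `core::ptr::addr_of!` rather than the proposed pattern syntax. The `Packed` type and `read_value` function are assumptions made up for this illustration.

```rust
use core::ptr;

#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub value: u32, // at offset 1, so generally not aligned for `u32`
}

/// # Safety
/// `p` must point to an initialized `Packed`.
pub unsafe fn read_value(p: *const Packed) -> u32 {
    unsafe {
        // Taking `&(*p).value` would create a reference to a misaligned field;
        // `addr_of!` only computes the field's address.
        let field: *const u32 = ptr::addr_of!((*p).value);
        // The pointer may be misaligned, so it needs an unaligned read.
        field.read_unaligned()
    }
}
```

Under the RFC, the address calculation would presumably be written as a `raw const` binding inside a pattern instead of a macro call on a place expression.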