Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC for a match based surface syntax to get pointer-to-field #2666

Closed
wants to merge 10 commits into from

Conversation

HeroicKatora
Copy link

@HeroicKatora HeroicKatora commented Mar 22, 2019

Extend match syntax (*) and patterns (raw const, raw mut) by support for a limited set of operations for pointers, which involve only address calculation and not actually reading through the pointer value. Make it possible to use these matches to calculate addresses of fields even for repr(packed) structs and possibly unaligned pointers where an intermediate reference must not be created.

Rendered

This should be read as complementary surface syntax to the MIR changes proposed in #2582 . Now, somewhat coincidentally, the author @RalfJung decided to rewrite the other PR to also include a surface syntax. I was unaware of this parallel process . If desired, one could read the match/pattern syntax as complementary to the place expressions proposed there but since the idea was floating around for a while, I address the downsides in comparison to match in Rationale and Alternatives already.

/// * be properly aligned.
unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
let Weird { raw const a, ..} = w;
match core::ptr::read(a) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would this read a integer? Isn't a of type *const bool?

Copy link
Author

@HeroicKatora HeroicKatora Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right this needs a cast first, core::ptr::read(a as *const u8)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spotting, fixed it.

}
```

Note that pointer binding mode and pointer pattern requires `unsafe`, even when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There have been multiple references to these unsafe pointer patterns, but no examples or further explanations. What are they and why are they necessary? Aren't raw mut or raw const patterns sufficient?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointer patterns are the counter part of reference patterns, necessary for disambiguation in some special cases. They use * <subpattern>, paralleling & <subpattern> and pointer binding mode automatically adds the top-level pointer pattern, the same as the reference pattern implied by reference binding mode. I'll add an example showing the necessity but it boils down to being able to match a pointer by-value and its content with raw pattern, the former being necessary for backwards compatibility.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* `raw (const|mut) identifier`; allowed for field bindings and identifier bindings.
These are allowed in the grammar where `ref? mut? identifier` is allowed
currently. For this purpose `raw` is a contextual keyword.
* `* <subpattern>`; to match a pointer not by value but to additionally use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have this be a generic * pattern instead of separate *const, *mut patterns?

Copy link
Author

@HeroicKatora HeroicKatora Mar 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, and it does seem better to stay consistent with &/&mut.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using *const/*mut isn't really consistent with &/&mut in a meaningful way, AFAICS. Patterns follow their corresponding expression syntax, and while the way to construct a &T/&mut T is &x/&mut x, the way to construct a *const T is not *const x.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So.....

match (0 as *mut usize) {
    &mut raw const z as *mut usize => (),
}

(definitely not serious).


It occurs to me that I mostly think about patterns following the type syntax rather than expression syntax, which is likely why I made this comment. It just works out that those are the same most of the time because of struct expressions following the type syntax.

Copy link
Author

@HeroicKatora HeroicKatora Mar 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @Nemo157 here, I had always assumed it followed type syntax due to struct-patterns and enum-patterns. (Although that is kind of a weak argument, I really can't say for sure why that felt more natural. Maybe because it doesn't support any operators?). Also, enum variants:

// current nightly makes this possible, with
// #![feature(type_alias_enum_variants)]
enum Foo {
    A,
}

impl Foo {
    fn test(self) {
        // gives various errors on different other rust versions.
        let Self::A = self;
    }
}

Edit: But that's kind of besides the point.

It's more consistent with reference patterns that way and there is no
inherent upside to not having to specify it that I know of.
/// * point to memory valid for the chosen lifetime `'a`
/// * be properly aligned.
unsafe fn get_if_init<'a>(w: *const Weird) -> Option<&'a Weird> {
let Weird { raw const a, ..} = w;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this auto-deref, or how does this typecheck?

Your example below with match (0 as *mut usize) does not look like auto-deref happens, but then shouldn't this be let *const Weird { raw const a, .. } = w?

Copy link
Author

@HeroicKatora HeroicKatora Mar 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match doesn't auto-deref in the ops::Deref sense of expressions. I'd rather view it as only adding reference patterns automatically where required, the implementation kind of agrees. I only suggest automatically wrapping into pointer patterns if required as well (and maybe only in pointer binding mode). Which pattern to add should never be ambiguous–in a top-down pass of the pattern after the rhs of the match is typed we know the required pattern kind–but probably deserves its own section anyways.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your example below with match (0 as *mut usize) does not look like auto-deref happens

Automatically adding reference for pointers has the slightly unfortunate side-effect of colliding with the ability to match pointers by const value (i.e. core::ptr::null()), which does not exist for references afaik. The example was intended to show that disambiguation should be backwards compatible but didn't show automatically adding the pattern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match doesn't auto-deref in the ops::Deref sense of expressions. I'd rather view it as only adding reference patterns automatically where required

I do not see how that makes any fundamental difference. The fact remains taht you can write a pointer deref and incur a memory access without ever typing *, such as in

fn foo(x: &bool) -> bool {
    match x {
        true => true,
        false => false,
    }
}

For raw pointers, we want to avoid this because accessing memory through them is unsafe and should only be done explicitly. In this sense, auto-deref on raw pointers and auto-&-pattern in match have the same concern.

This is not about ambiguity, this is about calling out to whoeever reads this code that a raw pointer is being dereferenced. I don't think this should compile:

fn foo(x: *const bool) -> bool {
    unsafe { match x {
        true => true,
        false => false,
    } }
}

Copy link
Author

@HeroicKatora HeroicKatora Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code would not work. While the pointer pattern is added automatically, it still does not allow any of the value-reading patterns to occur within it. So the value pattern true within in that example would desugar to *const true and then be denied on the grounds that a value pattern occurred inside a pointer pattern. I believe that the wording correctly captures this part of the mechanic but I could have misphrased it. Hence the distinction of auto-deref vs. adding a pattern, because the latter does not circumvent the pattern structure based checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would indeed fix this concern. It wasn't clear to me from the RFC; there should be examples for this kind of rule.

@RalfJung
Copy link
Member

I can see good arguments for having pattern syntax on top of explicit pointer-taking syntax, just like we do for references. However, it would be rather strange IMO to have only pattern syntax and not also a direct way to express this -- that would be very inconsistent with the handling of references. I also think that would hurt teachability, as you'd have no more direct way to express what happens in these patterns.

Pattern matching is already hugely complicated with guards and their evaluation order concerns, implicit derefs, handling of empty types, and so on. It is arguably the most complex language construct, and in fact it is the main reason why I pretty much refuse to do anything formal about surface Rust or the HIR -- I prefer working on the MIR where pattern matching has been almost entirely lowered away. At least most of the things it does can be explained as being equivalent to other language constructs: everything except for actually determining the discriminant of an enum. This RFC would add "create raw pointers" as the second such match-only operation. I do not think that is a good idea.

In that light I find myself disagreeing with the arguments in the "Rationale and alternatives" section:

Place expressions are overloaded with auto-deref, custom indexing (core::ops::Index/core::ops::IndexMut), invoking arbitrary user code. A solution with place syntax needs to explicitely forbid these forms of place statements, both to disallow user code and avoid accidental reference intermediates. The new statements thus resembles a very different other statement.

There is nothing new about this, we already do not do auto-deref for raw pointers. The custom code entry points you mention all have to do with using, not creating pointers. Adding a new way to create (raw) pointers (not subject to auto-deref) does not interact with them in any interesting way, from what I can tell.

Also, we do not auto-deref for raw pointers in match either. Do you suggest we do? If yes, it will have to be restricted to raw patterns, and at that point you have the same almost-but-not-quite-the-same-as-references situation as place expressions do.

Do you have examples for what you are worried about here?

The initial dereferencing of the pointer necessary for a place expression (struct.field is implicitely (*struct).field for a reference argument struct) will not work with pointer arguments, which do no automatically dereference even in unsafe code (and arguably should not, outside &raw).

This is an argument for a ->-like operator, or maybe doing auto-deref in more situations (like when no actual access happens). I do not see how "limited raw pointer auto-deref" in match is any easier to understand than in place expressions.

Also, &raw mut (*x).field is still shorter and IMO more readable than match x { Struct { raw mut field, .. } => field }, even though the pattern uses auto-deref and the place expression does not. I expect in many cases these pointers will be used only once, so they should be ergonomic to use even without a let-binding.

It provides a clear pattern that extends to enum fields in packed structs, which are not absolutely not expressible in place syntax.

You can totally use &raw to take the address of a field of enum type. I assume you are talking about taking pointers into enums, but there a pattern matching syntax does not help either: you can only match on an enum after you initialized it, at which point you might as well create a reference and not just a raw pointer.

Or maybe I am misunderstanding what your point is here. An example would help.

@HeroicKatora
Copy link
Author

HeroicKatora commented Mar 31, 2019

Also, &raw mut (*x).field is still shorter and IMO more readable than match x { Struct { raw mut field, .. } => field }, even though the pattern uses auto-deref and the place expression does not. I expect in many cases these pointers will be used only once, so they should be ergonomic to use even without a let-binding.

Maybe it should be pointed out specifically that it works for irrefutable bindings perfectly fine, and both pattern let and match take a place-expression as their input, not a value expression. Although it does not retain the auto-deref clarity for the operand, it is fine to use a pattern binding let ref _ = <place> or a match expression as an alternative to & <place> for getting a reference as well.

// Effectively same as `let r = &x.field`.
let ref r = x.field;

// Proposed pointer variant, no reference to potentially packed-struct-field `field`
let raw const ptr_to_field = x.field;

Edit: I've updated the introductory example to make use of this and explained its validity in the reference level section.

@HeroicKatora
Copy link
Author

HeroicKatora commented Apr 1, 2019

I can see good arguments for having pattern syntax on top of explicit pointer-taking syntax, just like we do for references. However, it would be rather strange IMO to have only pattern syntax and not also a direct way to express this -- that would be very inconsistent with the handling of references.

Also in light of let ptr const field = , I generally agree with this and would also like an expression. However, I expect syntax questions to be much less controversial here¹. It is pretty clear which existing patterns to mirror in the design, mostly the keyword may be different. And, since you note that match and expr syntax should be complementary, I see no reason not to adapt the more straightforward one first when they achieve very similar things with the same basic means. Previously, tuples also had their expression extractors (with integer fields) added after pattern matching was possible.


¹ Short summary for the problems with pointer-from-place in expressions: In match, references are matched by reference patterns, and optionally inferred, for which they use the type sigils. The operation of creating one has a single purpose pattern with a contextual keyword. In expressions however, the type sigil for pointers is already in use for dereferencing but the reference sigil is reuse for reference taking. Also, since the reference wraps the place expression, safety and ergonomic concerns must be balanced when considering if the place expression must be valid regardless of such a wrapping, as taking out the place-subexpression results in stricter requirements on the validity of the involved memory than when it occurs inside an imagined raw-ptr-expression.

@dtolnay dtolnay added the T-lang Relevant to the language team, which will review and decide on the RFC. label Apr 4, 2019
@scottmcm
Copy link
Member

scottmcm commented Sep 2, 2020

We discussed this in a @rust-lang/lang triage meeting today. The overall feeling was that this was potentially interesting, but a fairly large change to patterns, so we'd like to wait on more experience with ptr::raw! before looking at this more closely.

@rfcbot fcp postpone

@rfcbot
Copy link
Collaborator

rfcbot commented Sep 2, 2020

Team member @scottmcm has proposed to postpone this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Sep 2, 2020
@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Oct 19, 2020
@rfcbot
Copy link
Collaborator

rfcbot commented Oct 19, 2020

🔔 This is now entering its final comment period, as per the review above. 🔔

@dlight dlight mentioned this pull request Oct 23, 2020
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label Oct 29, 2020
@rfcbot
Copy link
Collaborator

rfcbot commented Oct 29, 2020

The final comment period, with a disposition to postpone, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC is now postponed.

@rfcbot rfcbot added to-announce postponed RFCs that have been postponed and may be revisited at a later time. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. disposition-postpone This RFC is in PFCP or FCP with a disposition to postpone it. labels Oct 29, 2020
@rfcbot rfcbot closed this Oct 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
finished-final-comment-period The final comment period is finished for this RFC. postponed RFCs that have been postponed and may be revisited at a later time. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants