-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rustc performs auto-ref when a raw pointer would be enough #73987
Comments
This is specifically about field projection of a raw pointer, right? |
I think so, yes. It is key that |
Oh I just realized something, and I think that the issue title and description are misleading. They make it sound like we’re calling trait Foo {
fn bar(self);
}
impl Foo for *const [u8] {
fn bar(self) {}
}
pub struct Test {
data: [u8],
}
pub fn test_len(t: *const Test) -> usize {
unsafe { (*t).data.bar() }
} Errors: Compiling playground v0.0.1 (/playground)
error[E0599]: no method named `bar` found for slice `[u8]` in the current scope
--> src/lib.rs:14:24
|
14 | unsafe { (*t).data.bar() }
| ^^^ method not found in `[u8]`
|
= help: items from traits can only be used if the trait is implemented and in scope
note: `Foo` defines an item `bar`, perhaps you need to implement it
--> src/lib.rs:1:1
|
1 | trait Foo {
| ^^^^^^^^^ Another example without involving a struct field: fn ptr_after<T>(x: &T) -> *const T {
(x as *const T).offset(1) // Ok
}
fn ptr_after2<T>(x: &T) -> *const T {
x.offset(1)
} Errors: Compiling playground v0.0.1 (/playground)
error[E0599]: no method named `offset` found for reference `&T` in the current scope
--> src/lib.rs:6:7
|
6 | x.offset(1)
| ^^^^^^ method not found in `&T` So I’d be inclined to call this not a bug. |
Yes, that is the problem. We should be calling the raw ptr method, but instead the compiler chooses the call the other method. That is what this bug is about. I am happy for suggestions for how to word this better. :) Elsewhere you wrote:
The example code also doesn't involve a I am aware that the reason for this is auto-ref, and not having auto-raw-ptr. But that is IMO a big problem as it means it is actually very hard to call raw- |
Indeed. IMO we should not stabilize any raw ptr method where a reference method with the same name exists, until this bug is fixed. It's just too much of a footgun, with people accidentally calling the reference method instead of the raw ptr method. |
I’m not sure I agree that this is a bug in the first place. The language never had coercion from |
Would you agree that it is a footgun, though? I agree it is behavior as intended. I just don't think the intentions are fit for modern Rust any more -- after all, when this behavior was designed, there were no raw- |
In this particular example, the behaviour does not feel like a footgun to me: In my simplified mental model of the language, I would go as far as saying that having |
That is not the case though. (What you said is basically right when
Well, This is the same in C: |
Wearing an old hat, Wearing a newer hat, since |
Not sure what the aliasing rules are "protecting" here? Stacked Borrows is adding strictly more Undefined Behavior, i.e. you could describe it as purely existing to break your code, of course with the goal of having the compiler make other code (the one that is not broken) go faster. Creating a raw pointer makes strictly fewer assumptions. IOW, when |
What about |
Ah, I was only thinking of cases where we current do auto-ref. As in, this is code we currently accept, but we are possibly doing the wrong thing (by creating a reference instead of a raw pointer). I am less sure about code that currently would fail to compile; I am not proposing to accept such code. |
Not sure either. I guess I thought that the borrow checker would somehow complain about obvious aliasing but it doesn't. Also, I thought |
It's effectively a cast from a raw ptr to a (shared) reference. |
Yes, but I thought there was another effect. (2 hours pass) Actually, no, that may be a lie! |
It does cast to a reference. But if the context asks for a pointer, it can be implicitly coerced back to a pointer again. Given that coercion, I actually wonder if there is any good reason for |
As someone asked on Zulip, I'd like to push this a bit. In particular, I'd like to have the answers to the following questions:
|
@NeoRaider this depends on what you mean by "valid". ;) The
Long-term, I think |
That is my concern, yes: the existence of these raw ptr methods might make people think that |
This runs counter to my intuition of how operations compose. If |
Hi NeoRaider and RalfJung, I was directed here from Zulip and was following along this conversation. If I may, I'd like to contribute some thoughts. Rust 1.51.0 stabilized the addr_of! macro and I wonder if this macro takes away any confusion that may arise. If the programmer can expect that
The compiler currently agrees with your intuition :-) In the above example an unsafe block is needed in order to be able to call I hope this is useful feedback from a Rust end-user. To me the availability of the |
Fair, I need to refine this a little. Notice that the notion of "subexpression" is non-trivial here: to understand how this composes, you need to take into account the fact that Rust distinguishes places and values (called "lvalues" and "rvalues" in C). In particular, there are implicit That would be a way to realize my proposal while remaining compositional (in a sense). Another alternative might be to say that I should also clarify that none of this is official in any way. Officially, |
Note that the macro is relevant not just for alignment, but also for cases where the pointer points to uninitialized data. Your snippet nicely demonstrates my concern: if the programmer, instead of |
Actually, according to the current UB rules, it is UB to
So, We should probably change that... discussion on Zulip |
Is this still considered an issue and a blocker for #71146? I also share the sentiment that casting the field to a
* unless By this intuition, Uninitialized Memory: Unsafe Rust is Too Hard and especially Rust's Unsafe Pointer Types Need An Overhaul are relevant here. See also |
My feeling is that having to write However, I don't have a very strong opinion here; ultimately this is up to @rust-lang/libs-api. I would also think that having |
I might know of an additional relevant and fun footgun that would need be solved before then 😄 What if
|
I guess I wasn't very clear in my point. I think that ultimately, having With the current status quo, the answer is always yes. The
It would make little sense for
In this scenario, we'd have the same problem as we currently have with To prevent these issues, I believe that place expressions should never be implicitly converted to pointers, since currently, they always become references or are copied by value. To project a pointer into a field, we should require the use of something explicit such as This is why I mentioned the latter article, which has a proposal for lightweight "path offset syntax" using |
I just had my opinion that this is a pretty bad footgun confirmed via #99437. |
IMO the lesson to be learned here is that if a place originates from a raw pointer, we should never implicitly turn it into a reference. For the cases where we already do, we should warn about this situation.
IOW, I vehemently disagree with this. A place expression that comes from a raw pointer should only ever be converted to a raw pointer, never to a reference. Converting it to a reference implicitly is much too likely to introduce aliasing requirements that were probably not intended. This must therefore only happen explicitly. |
While it would be nice if place-to-reference conversions were always explicit, such a thing cannot be implemented without horribly breaking backward compatibility. To list the implicit place-to-ref conversions we currently allow in stable code:
All of these can take
With this imbalance in mind, my usual approach would (IMHO) seem quite practical: assume that a |
The only "small" problem with this approach is that it is a false assumption. I don't think programmers will typically assume this. And I can't blame them, even with all I know, I wouldn't have assumed this. I started at the example that triggered #99437 for quite a while until I realized what was happening. So, while there are indeed many syntactic positions where place-to-ref conversions happen, I am not as sure as you seem to be that this happens regularly in unsafe code that actually intends to do this. That's why I think we should have a lint, that would help us detect these situations. Of the things you list, I also don't think But I think |
Put differently: I don't think defining the problem away by making it the programmer's problem is a constructive solution to this issue. Sure, it means us compiler devs can wash their hands, but it means more bugs in Rust code, and that's a bad outcome. We should strive to prevent those bugs instead. Our goal here isn't to be "technically correct" when things like #99437 happen, our goal is to make it easier to write reliable software and to prevent such issues from occurring. |
Hmm, I didn't know that, and I don't think it's at all obvious to an intermediate Rust user. Looking at the Reference, it carefully words its way around slices and arrays using
The docs for
The Rustonomicon also suggests that slice index operations go through
But as you describe, the example does not produce Overall, we don't give any solid indication that indexing operations have semantics other than those of (As for why I included |
If it were a value expression, then the ptr would be actually loaded from, so that would only have more UB. Therefore this is not a footgun. The problem is writing code where you don't want a value expression, like |
Hmm, you're right about that, now that I think about it. At the very top of the borrow stack, it's only an issue when you hold on to the
So we agree that there's lots of ways to get implicit refs from place expressions, and this can cause issues with the aliasing and non-null restrictions. You seem to argue that we should lint on these, or change the semantics so they operate via pointer. But I think that we should keep place semantics as they are, and steer users away from writing Right now, the predominant case where If we could eliminate those two cases, then users could simply avoid |
I mean, that'd be great, but it is a big change to the language -- much bigger than what I have the time for. So I dearly hope someone will pursue this. :) But meanwhile, I think there are smaller steps we can take that will help improve the situation, and those are the steps I am proposing. And even once we reach that goal, many people will still write |
…ith slice This potential source of UB was discovered while upgrading the Rust toolchain, which upgrades us to a new version of Miri with stricter rules around raw pointers. Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately attempting to operate only on raw pointers while avoiding any intermediate references, since references have invariants that raw pointers do not. However, there is in fact an implicit reference here that is created as a result of the indexing operation. This is both surprising and not surprising, for interesting reasons. First, it should not be surprising because indexing is governed by the Index traits, whose methods function return references, so their presence here is natural. On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr` is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to avoid emitting an intermediate reference. The ideal solution here is for Rust to be smart enough to not introduce the intermediate reference here at all, which is tracked at rust-lang/rust#73987 . In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team, who saw fit to file rust-lang/rust#99437 as a more specific example of the potential perils of the current behavior.
…ith range This potential source of UB was discovered while upgrading the Rust toolchain, which upgrades us to a new version of Miri with stricter rules around raw pointers. Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately attempting to operate only on raw pointers while avoiding any intermediate references, since references have invariants that raw pointers do not. However, there is in fact an implicit reference here that is created as a result of the indexing operation. This is both surprising and not surprising, for interesting reasons. First, it should not be surprising because indexing is governed by the Index traits, whose methods function return references, so their presence here is natural. On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr` is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to avoid emitting an intermediate reference. The ideal solution here is for Rust to be smart enough to not introduce the intermediate reference here at all, which is tracked at rust-lang/rust#73987 . In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team, who saw fit to file rust-lang/rust#99437 as a more specific example of the potential perils of the current behavior. Signed-off-by: bstrie <[email protected]>
… with range This potential source of UB was discovered while upgrading the Rust toolchain, which upgrades us to a new version of Miri with stricter rules around raw pointers. Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately attempting to operate only on raw pointers while avoiding any intermediate references, since references have invariants that raw pointers do not. However, there is in fact an implicit reference here that is created as a result of the indexing operation. This is both surprising and not surprising, for interesting reasons. First, it should not be surprising because indexing is governed by the Index traits, whose methods function return references, so their presence here is natural. On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr` is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to avoid emitting an intermediate reference. The ideal solution here is for Rust to be smart enough to not introduce the intermediate reference here at all, which is tracked at rust-lang/rust#73987 . In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team, who saw fit to file rust-lang/rust#99437 as a more specific example of the potential perils of the current behavior. Signed-off-by: bstrie <[email protected]>
… with range This potential source of UB was discovered while upgrading the Rust toolchain, which upgrades us to a new version of Miri with stricter rules around raw pointers. Specifically, an expression like `addr_of_mut!((*(ptr))[offset..])` is deliberately attempting to operate only on raw pointers while avoiding any intermediate references, since references have invariants that raw pointers do not. However, there is in fact an implicit reference here that is created as a result of the indexing operation. This is both surprising and not surprising, for interesting reasons. First, it should not be surprising because indexing is governed by the Index traits, whose methods function return references, so their presence here is natural. On the other hand, it is surprising because Rust already special cases `(*ptr)[foo]` when `ptr` is a raw slice and `foo` is not a range to avoid the Index traits entirely, which allows it to avoid emitting an intermediate reference. The ideal solution here is for Rust to be smart enough to not introduce the intermediate reference here at all, which is tracked at rust-lang/rust#73987 . In addition, while investigating this issue I brought it up to the Unsafe Code Guidelines team, who saw fit to file rust-lang/rust#99437 as a more specific example of the potential perils of the current behavior. Signed-off-by: bstrie <[email protected]>
I agree it's a trap, but at the same time if let tmp = *ptr;
tmp.data.len(); I'm already spooked by So having a dedicated operator for a raw deref ( |
It's not inconsistent since these are completely different programs! Let's think more carefully about places and values to make sense of all this. We have to make place-to-value coercions explicit; I will use Your code then becomes let tmp = __load(*__load(ptr));
len(&tmp.data); whereas pub fn test_len(t: *const Test) -> usize = unsafe {
len(&(*__load(t)).data) // the & being inserted here is exactly the problem
} IOW, by storing
That is also entirely explained by a proper treatment of places and values. There's nothing special going on. MiniRust defines both I do agree that we could do a lot better teaching this place/value stuff. |
You're looking at this from a very low-level perspective — no doubt technically correct one, but I mean it from more layman perspective. It requires having a more low-level mental model of the language. For novice Rust users used to higher-level/GC languages it's already weird that So I don't think that having a special raw-pointer-temporary deref would solve the surprising behavior, it'd just move it around. |
If you are using raw pointers, then I don't think you can avoid learning about places and values. (Believe me, it can get a lot more low-level than that. ;) The comparison with high-level/GC languages makes little sense since those languages don't have the features we are discussing here. For better or worse, Rust (like C and C++ but unlike, e.g., Java) is a place-based language, and it makes little sense to try and hide that fact from people that want to use low-level Rust features such as raw pointers. |
The following code:
generates MIR like
This means that a reference to
data
gets created, even though a raw pointer would be enough. That is a problem because creating a reference makes aliasing and validity assumptions that could be avoided. It would be better if rustc would not implicitly introduce such assumptions.Cc @matthewjasper
The text was updated successfully, but these errors were encountered: