Manual does not define "data reached through a shared reference". #30424
/cc @rust-lang/lang |
Wow. This looks like a big issue in my model of aliasing, but maybe it is not so scary. The problem here is that we want locals to be essentially "always borrowed by the containing function", and IIRC we already perform optimizations that rely on that (so your code should probably be UB). On the other hand, we do need some way of converting mutable references to raw pointers and back again. I think that is the reason why @thestinger wanted an access-based, rather than an instant-death-based, aliasing model.

An access-based aliasing model basically says that a valid reference that is dereferenced and used must not be incompatibly accessed (by raw pointers, other mutable references, modification of the page table, DMA by random peripherals, etc.) for its entire live lifetime. An instant-death-based model is similar but does not include the "that is dereferenced and used" part. LLVM also has some fancy rules on pointer comparisons, but we can't feasibly support them, as we allow taking the addresses of arbitrary things in safe code. The problem with the access-based model is that it does not cope well with "spurious" accesses (e.g. lifting accesses out of a loop), so we probably need some synthesis of the models - maybe using instant-death in functions without

A few examples of the mutable-references-to-raw-pointers-and-back-again issue, for exposition:

fn f(&self) -> &Self { self }
fn f_mut_unsure<'s>(&'s mut self) -> &'s mut Self {
    let ret = unsafe { transmute::<&Self, &'s mut Self>(self.f()) };
    ret // β-expansion should have no effect
}
fn f_mut_should_work<'s>(&'s mut self) -> &'s mut Self {
    let ret = unsafe { &mut *(self.f() as *const Self as *mut Self) };
    ret // β-expansion should have no effect
}

There are basically two aliasing issues there:
Under "instant death" rules, both are UB, so we can't have that. Under the access-based rules, both are fine. We had a proposal flying around that |
Of course the reason we want some kind of softer rules for |
Technically, a more explicit version of the above would be:

fn f_mut_pedantic<'s>(&'s mut self) -> &'s mut Self {
    let captured_self = self as *mut Self;
    unsafe { &mut *((*captured_self).f() as *const Self as *mut Self) }
}

which inactivates |
cc @RalfJung |
Which optimizations would/could that be? I don't think I entirely understand the significance of choosing an "aliasing model", or what an "aliasing model" actually is. I suppose it is related to

Disclaimer: So far, the LLVM |
The aliasing rules basically say that the compiler can assume that nobody is modifying its memory behind its back (this also includes calls to external functions and the compiler's own pointer writes). This is important for optimizations like copy elimination and store-to-load forwarding. For example, if we have:

#[derive(Debug, Copy, Clone)]
struct Foo {
    bar: Bar,
}
#[derive(Debug, Copy, Clone)]
struct Bar {
    // lots of data
}
fn doit() -> u32 {
    let foo = Foo { bar: make_bar() };
    debug!("{:?}", &foo);
    match foo {
        Foo { mut bar } => frobnicate(&mut bar),
    }
}

In a case like this, we would like to make |
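A smaller illustration of the kind of optimization these aliasing rules enable (a sketch; `sum_around_call` and its callback parameter are hypothetical names, not from the thread): because `x` is a shared reference to an `i32` with no `UnsafeCell` inside, the compiler may assume that the opaque call `f()` cannot change `*x`, so it is allowed to reuse the first load instead of reading memory again.

```rust
/// Reads `*x`, calls an opaque callback, then reads `*x` again.
/// `x: &i32` points to data without interior mutability, so the
/// optimizer may assume `f` cannot modify `*x` while the borrow is live.
fn sum_around_call<F: Fn()>(x: &i32, f: F) -> i32 {
    let before = *x;
    f(); // may not legally write to `*x`
    let after = *x; // the optimizer may replace this load with `before`
    before + after
}
```

Whether the second load is actually elided depends on the optimizer; the point is only that the aliasing rules make eliding it legal.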
Thanks for the example! I suppose at some point I will have to ask for more examples of copy elimination and store-to-load forwarding, just to get an idea of what it would take to formally justify these transformations. In this case, however, I am puzzled. Please excuse me being so slow ;-) . |
Because we want to treat pointers as "just integers", they don't have a lifetime associated with them if you do things like convert them to a raw pointer and back. This is required for |
What is

The pointer passed to

My approach to checking validity of unsafe code is not at all about aliasing, so I don't have to think about how pointers are represented here. (Also, I am pretty sure that unsafe Rust cannot pretend pointers are just integers, because that's not what LLVM does. See https://internals.rust-lang.org/t/comparing-dangling-pointers/3019.) There is some relation to aliasing, of course. In particular, permissions are passed around in a linear way, i.e., they cannot be duplicated. The permission associated with a mutable reference is such that it is impossible for anyone else to have the same permission (or the "shared reference" permission) for the same location. From having the "mutable reference" permission for pointer |
I was quite convinced that all raw pointers to the same address are pretty much equivalent. |
Maybe a model with equal raw pointers being equivalent does not allow desirable optimizations so we need a different one. |
One of the functions from the previous example, say:

fn f_mut_pedantic<'s>(&'s mut self) -> &'s mut Self {
    let captured_self = self as *mut Self;
    unsafe { &mut *(Self::f(&*captured_self) as *const Self as *mut Self) }
}

We basically want them to be defined. This means we can't "precisely" track borrows, at least in functions with unsafe blocks (e.g. in the latter function,
We must be using different definitions of "aliasing". The definition I use (and, I think, LLVM uses) is that accessed addresses in an object pointed to by a

Rust's definition is pretty similar - an accessed field of a reference can't be incompatibly accessed while the reference is active (that is, live and not incompatibly borrowed). The problem with only having

Instead, in Rust I think we should say that locals are basically implicitly borrowed by the owning function (they are already treated as such by the borrow checker/safety proof), which would also allow for this kind of reasoning.
That is not how we decided to approach unsafe Rust. We try not to have a hardcoded notion of "permissions" - people come up with pretty exotic ways to split array indices. There is some system of permissions that gives safe Rust its safety proof, but unsafe code is free to use a different system without risking UB. And even if that system is incompatible with the real one (e.g. an unrestricted "write-what-where" primitive, or a safe |
I am not sure what you mean by "equivalent" here. Permissions are entirely independent of the actual values processed on the machine, so one function could work on a raw pointer with some permissions, and another function could work on the same raw pointer and have no or different permissions. Permissions are associated with types, and the raw pointer type carries no permission. Of course, unsafe functions can always declare that they want more (or less, or different) permissions than what the types promise.
There has to be some coherent notion of permissions associated with the basic Rust types that is common to all code. This is a necessary condition for soundness. After all, we'd like to take all of these safely encapsulated but unsafely implemented pieces of Rust and use them in the same program. This is only possible if they all agree on what the basic types mean and which permissions they grant. Of course, the permission associated with shared borrows will be a very flexible one, mirroring the fact that programs have lots of flexibility in what they do with their shared borrows - just think of
This function looks fairly harmless to me. None of these pointer casts actually does anything; in the end, all that happens here is that

Of course, the proof sketch above entirely ignores the fact that Rust adds

In particular, I noticed that
So, there actually is a clear point where "freezing" starts. (I think of there being explicit machine instructions to freeze and unfreeze a block of memory. They would be called whenever a shared borrow starts or ends, which should be statically known. Then whenever we write to a location, we can trigger UB if it is frozen. This explains why you can't just transmute a shared borrow to a mutable one, even in unsafe code. This freezing would of course not happen for

An interesting question here may be - and maybe that's what some of your other examples are about - is what would happen if there is no explicit
Well, unfortunately I have no idea how to formally capture the notion of aliasing you are using here. As far as I know, nobody has figured out what
How does the OP's code relate to Also, regarding |
Ah yes, now that I realize you may be talking about interior mutability and how to allow it only within

So, coming to your other two examples:

fn f_mut_unsure<'s>(&'s mut self) -> &'s mut Self {
    let ret = unsafe { transmute::<&Self, &'s mut Self>(self.f()) };
    ret // β-expansion should have no effect
}
fn f_mut_should_work<'s>(&'s mut self) -> &'s mut Self {
    let ret = unsafe { &mut *(self.f() as *const Self as *mut Self) };
    ret // β-expansion should have no effect
}

Putting my "freeze/unfreeze" glasses on (I have no idea if they make any sense, but it's the only operational take I was able to get on this so far), my question is where these (virtual) freeze and unfreeze instructions are emitted. I think both functions are kind of the same here. In both functions, the
This one, however, would not:

fn f_mut_bad<'s>(&'s mut self) -> &'s mut Self {
    let ret = unsafe { transmute::<&'s Self, &'s mut Self>(self.f()) };
    ret // β-expansion should have no effect
}

Notice that this forces the lifetime of

And then there are the functions that use casts to obtain the argument of

unsafe fn cast_mut_away<T>(x: &mut T) -> &T { transmute(x) } // make sure transmute does not change the lifetime
fn f_mut_dunno<'s>(&'s mut self) -> &'s mut Self {
    let local = &mut *self; // just to get a local re-borrow with a lifetime shorter than the function body
    unsafe { &mut *(cast_mut_away(local).f() as *const Self as *mut Self) }
}

The argument to

@arielb1, how do these functions look to you?

Coming back to the OP,

pub unsafe fn a() -> u8 {
    let mut x = 11;
    b(&x as *const u8 as *mut u8);
    x
}
unsafe fn b(x: *mut u8) {
    *x = 22;
}

My central question here would be, what is the lifetime of the borrow created at |
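For comparison, a variant of the OP's code in which the write is not in question (a sketch; `a_via_mut` is a hypothetical name): the raw pointer is derived from a mutable borrow `&mut x` rather than from `&x`, so `b` never writes to data reached through a shared reference.

```rust
/// Variant of the OP's `a`: the raw pointer comes from `&mut x`,
/// so writing through it does not mutate data reached through `&x`.
pub fn a_via_mut() -> u8 {
    let mut x = 11;
    // SAFETY: the pointer is derived from a mutable borrow of `x`
    // and is only used while `x` is still live.
    unsafe { b(&mut x as *mut u8) };
    x
}

/// Same as the OP's `b`: writes 22 through the given pointer.
unsafe fn b(x: *mut u8) {
    // SAFETY: the caller must pass a pointer that is valid for writes.
    unsafe { *x = 22 };
}
```

Because the only difference from the OP's `a` is `&mut x` versus `&x`, the OP's question reduces to whether the shared borrow taken in `&x as *const u8` still governs the later write.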
I think that you are missing the point about the difference between UB and the Safety Proof. These are quite different things. UB is the set of things that rustc/LLVM is allowed to assume not to happen. Invoking UB, even in unsafe code, gives rustc/LLVM carte blanche to rewrite your code - just the same as C. Unlike C, Rust (at least in theory) has a safety proof. The safety proof says that as long as all unsafe code in the program behaves according to some rules, any additional safe code can't cause UB. It is perfectly fine to have a program that plays loose with the safety proof - for example, if you wrap a buggy C library with a safe interface, then calling it with the triggering arguments will cause it to execute some rather arbitrary code, which is obviously UB. However, if, at run-time, the library is not called with these malformed arguments, no UB is involved. The rules the Safety Proof demands of unsafe code are purely behavioural, which means that unsafe code may rely on arbitrarily complex predicates in order to satisfy them (including the body of some "trusted" safe functions). The rules of UB are basically "syntax-directed", but are evaluated at run-time. The question of whether a program execution trace invokes UB should be easily decidable (and require no proof or "ghost" permissions not apparent in the code).
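A minimal sketch of the distinction (`checked_get` is a hypothetical function, not from the thread): `get_unchecked` is UB for an out-of-bounds index, and that is a run-time property of a particular execution. The bounds check is the behavioural argument that makes the wrapper safe: no safe caller can produce an execution that reaches the unsafe call with a bad index.

```rust
/// Safe wrapper around an unsafe operation. UB would only occur in an
/// execution that calls `get_unchecked` with `i >= v.len()`; the check
/// below guarantees no such execution exists, whatever the safe caller does.
fn checked_get(v: &[u8], i: usize) -> Option<u8> {
    if i < v.len() {
        // SAFETY: `i < v.len()` was checked just above.
        Some(unsafe { *v.get_unchecked(i) })
    } else {
        None
    }
}
```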
That is basically correct, but I am quite sure that LLVM does not emit |
Rust's lifetime inference always infers the minimum lifetime possible (of course, the semantics only "run" on fully elaborated code), so we should be fine. We agree on

The problem with
About |
I see. There is of course a relation between the two, since the safety proof has to show that there is no UB. But I was probably too focused on how to prove absence of UB - indeed for the "no mutation through a shared reference" part, even defining UB is an open problem. (The same goes for "mutable references and pointers derived from them do not alias", btw.) Btw, you mentioned the LLVM notion of a "derived pointer" above. I tried to find a definition of that for LLVM, but could not find anything like it. Does a pointer loaded from memory through another pointer count as "derived"? The LLVM docs http://llvm.org/docs/LangRef.html#noalias-and-alias-scope-metadata only explain
I don't think this can be done without adding any ghost state - though that ghost state would be somewhat different from the kind of permissions I mentioned earlier. For example, some of the C rules concerning address arithmetic also apply in unsafe Rust; to explain them, our model of the memory cannot be just "mapping numbers to bytes". And I think the same goes for "no mutation through a shared reference, except for

That's what I had in mind with my "freezing" proposal: Imagine every location in memory has an additional bit, "frozen", such that writing to a frozen location is UB. We could then imagine having explicit operations that freeze and unfreeze memory, which are called whenever a shared borrow starts or ends. When a function has arguments that are shared references, it would start by asserting that they are frozen - again invoking UB if they are not. This would then justify re-ordering reads from that location.

I think ultimately we will need a model that actually uses some state in the memory to track things like this. The trouble with the aliasing-based models I have seen so far is that they involve somehow looking at all the pointers that exist in all functions and that are not borrowed away. (Notice that this, too, requires some "operational effect" associated with borrows starting and ending, visible on the machine that detects UB.) I don't like looking at other functions' state, or somehow having a global register of "all pointers that exist", and I think that would be rather hard to formally define.

You mention "access-based" vs. "instant-death-based" models above; I definitely think that access-based should be preferred. Having UB just from certain unused pointer values lying around is fairly surprising - and an access-based model lends itself much more naturally to some extra tracking happening in the global memory, for every location, with checks being performed on every access.
More checks could be easily added if we want certain guarantees even for unused pointers, like the "assert this memory is frozen" I suggested above.
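A toy sketch of the "frozen bit" proposal, to make the operational reading concrete. Every name here (`Memory`, `Ub`, `freeze`, `unfreeze`, `assert_frozen`) is hypothetical; it models the idea described above, not what rustc, LLVM, or Miri do. One design choice is my own: a per-location freeze count instead of a single bit, so that two overlapping shared borrows do not unfreeze each other.

```rust
/// Reasons a step of the toy machine is undefined behavior.
#[derive(Debug, PartialEq)]
enum Ub {
    /// A write hit a location frozen by a live shared borrow.
    WriteToFrozen(usize),
    /// A function expected its shared-reference argument to be frozen.
    NotFrozen(usize),
}

/// Toy memory: one byte per location plus a freeze count per location.
struct Memory {
    bytes: Vec<u8>,
    frozen: Vec<u32>,
}

impl Memory {
    fn new(size: usize) -> Self {
        Memory { bytes: vec![0; size], frozen: vec![0; size] }
    }

    /// Emitted when a shared borrow of `addr` starts.
    fn freeze(&mut self, addr: usize) {
        self.frozen[addr] += 1;
    }

    /// Emitted when a shared borrow of `addr` ends.
    fn unfreeze(&mut self, addr: usize) {
        self.frozen[addr] -= 1;
    }

    /// Checked on entry to a function taking a shared reference to `addr`.
    fn assert_frozen(&self, addr: usize) -> Result<(), Ub> {
        if self.frozen[addr] > 0 {
            Ok(())
        } else {
            Err(Ub::NotFrozen(addr))
        }
    }

    fn read(&self, addr: usize) -> u8 {
        self.bytes[addr]
    }

    /// Writing to a frozen location is UB in this model.
    fn write(&mut self, addr: usize, val: u8) -> Result<(), Ub> {
        if self.frozen[addr] > 0 {
            return Err(Ub::WriteToFrozen(addr));
        }
        self.bytes[addr] = val;
        Ok(())
    }
}
```

Since a frozen location cannot change until it is unfrozen, two reads of it with no intervening `unfreeze` must return the same value, which is what would justify merging or reordering those reads.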
You mean, having a

More generally, though... maybe the "freezing with a version number" can help. We could check, when a function starts, that

Btw, I noticed that Rust only emits

static zz: i32 = 13;
fn foo(x: &i32, y: &i32) -> i32 { let z = &zz; *z + *x + *y }

I could not find any annotation for |
Permissions on pointers are not really going to work with FFI, so they are a non-starter. I suppose that the "freezing" of
Anyway, in my view, an active (I need to decide on terminology: live vs. active)

One useful consequence of this is that if some

Maybe this can be modeled as (1)

fn traverse(l: &mut MyList) {
    let mut cur = l;
    loop {
        if let &mut MyList::Cons(42, _) = cur {
            break;
        }
        cur = match {cur} {
            &mut MyList::Nil => return,
            &mut MyList::Cons(_, ref mut l) => &mut *l
        };
    }
    // here `l` is active, except for some unknown child of `cur`
    if let &mut MyList::Cons(ref mut a, _) = cur {
        *a += 1;
    }
} |
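To make the traversal above compile on its own, here is a sketch that assumes a definition of `MyList` (the thread does not give one; `Cons(i32, Box<MyList>)` is a guess). It uses default binding modes instead of explicit `&mut` / `ref mut` patterns, and with this definition the step to the next node needs `&mut **next` to get from `&mut Box<MyList>` to `&mut MyList`. The `{ cur }` block moves `cur` into the match, so it can be reassigned with a reborrow of its own referent.

```rust
/// Assumed list definition (hypothetical; not given in the thread).
#[derive(Debug, PartialEq)]
enum MyList {
    Nil,
    Cons(i32, Box<MyList>),
}

/// Walks to the first node holding 42 and increments it.
fn traverse(l: &mut MyList) {
    let mut cur = l;
    loop {
        if let MyList::Cons(42, _) = cur {
            break;
        }
        cur = match { cur } {
            MyList::Nil => return,
            MyList::Cons(_, next) => &mut **next,
        };
    }
    // Here `l` is usable again except for the node `cur` points into.
    if let MyList::Cons(a, _) = cur {
        *a += 1;
    }
}
```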
For what lifetime will it be considered reborrowed, when it is transmuted?

fn bar(x: &mut i32) { ... }
fn foo(x: &mut i32) { bar(x); bar(x); }

it infers the lifetime of the calls to
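A runnable version of the `bar`/`foo` snippet (the body of `bar` is a hypothetical stand-in for the `...`): passing `x` where a `&mut i32` is expected does not move it but implicitly reborrows it, so the first call behaves like `bar(&mut *x)` with a borrow that lasts only for that call, and `x` can be passed again afterwards.

```rust
/// Hypothetical body: adds one to the referenced integer.
fn bar(x: &mut i32) {
    *x += 1;
}

fn foo(x: &mut i32) {
    // Implicit reborrow: behaves like `bar(&mut *x)`, borrowing `*x`
    // only for the duration of the call, so `x` is not moved.
    bar(x);
    bar(&mut *x); // the same reborrow, written out explicitly
}
```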
Why should |
We probably need some way to make rustc not reborrow like that. |
I think that whatever rules we settle on when it comes to |
The problem is that one of the points of |
But @nikomatsakis I am not entirely following you... but if I understand it correctly that the compiler assigns minimal lifetimes and always reborrows, then this should be sound (and rustc says it is... which leaves me slightly shocked^^):
In other words, the compiler would actually not see any ownership transfer happening when casting to a raw pointer. This is somewhat consistent (after all, the compiler assigns no ownership to mutable borrows), but I think it is also dangerous and usually not what people "mean". Above, you'd have to be very careful what you do with
Now the reborrow happens in the assignment to

However, I think there are not just "linting" problems with the current behavior:
and I assume this code is intended to have defined behavior. But a model that actually ensures that mutable borrows are the only way to write to the data they point to would assign UB to the program above: At the beginning of the

This code is probably okay for LLVM because it will consider |
My point is that I believe it should, in some cases, be legal to transmute/cast an

Put another way, I don't want the results of what is legal to have anything to do with what is inferred by the compiler within a fn definition. I want them to be based on the formal types of the arguments (which are explicitly written) and some kind of "common sense" reasoning. This may require the compiler to be more conservative than it would otherwise be in functions whose bodies contain coercions from

For example, in this snippet from @RalfJung:

fn foo<T>(x: &mut T) {
    let y = x as *mut _; // reborrows x for the shortest possible lifetime, which is this single statement
    let z = x as *mut _; // does the same again
}

And another related example that @alexcrichton and I discussed a long time back:

fn caller<T>(x: &mut T, y: &mut T) {
    callee(x, y); // what lifetime is inferred for the (implicit) coercion? does it cover the call?
}
fn callee<T>(x: *mut T, y: *mut T) {
}

In both these cases, as @RalfJung observed, the compiler will infer an awfully (and unrealistically) short borrow. But clearly the user expects to use the pointers

One can imagine then a rule that says something like this: if the fn body contains a coercion or cast to

Anyway, I have to run, so I have no time to clean up this text just now, and I know that is not a "memory model". It's also coming from a mildly different perspective (what the compiler will do), which in principle ought to be "derived from" the memory model. But of course in practice most memory models are aimed at formalizing what the compiler just does. Hopefully this comment nonetheless helps to elucidate my thinking a bit. |
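A small sketch of the borrow-checker behavior described above (the names mirror the snippets, but the bodies are hypothetical): each `x as *mut i32` reborrows `x` only for that one expression, so rustc accepts casting twice and then using `x` directly again. The sketch only compares the raw pointers and never dereferences them, so it stays clear of the aliasing question being debated; it only shows that the inferred borrows give the compiler no reason to reject such code.

```rust
/// Only compares addresses; never dereferences either pointer.
fn callee(x: *mut i32, y: *mut i32) -> bool {
    x == y
}

fn caller(x: &mut i32) -> bool {
    let y = x as *mut i32; // the reborrow of `x` ends with this statement
    let z = x as *mut i32; // so `x` may be cast again...
    *x += 1; // ...and used directly again
    callee(y, z) // raw pointers hold no borrow, so this compiles
}
```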
Luckily, the |
I am slightly worried about the part where it seems like we have to detect whether a function performs a borrow-to-ptr-coercion, and then let the entire function behave differently. I think this should be more local to the pointer, like - the moment you coerce a mutable borrow to a raw pointer, until , all aliasing for that particular pointer is fine. But I do agree that this is probably the behavior most people would expect. Semantically speaking, I don't think "until the end of the function" would be a good boundary. I would prefer something that is more tied to, e.g., the scope of the variable that has been coerced. Incidentally, if the variable is consumed by the coercion, this would pretty much match the "infer the maximal lifetime when coercing".

EDIT: Hm, but you were worried about inferred lifetimes having an effect on the validity of code, which I agree is Bad (TM). It's just that the function boundary is pretty arbitrary and entirely blurred away by inlining. Plus, there may be mutable borrows that I cast to a raw pointer that genuinely do not live until the end of the function. "Aliasing is now legal" would have to be communicated to whoever obtains the borrow when its lifetime ends. As in:

unsafe fn foobar(x: &mut i32) -> i32 {
    let zz: *mut _;
    {
        let y = &mut *x; // reborrow for the scope of y
        let z = y as *mut _;
        *z = 15;
        zz = z;
    }
    // How is the compiler supposed to know that `x` can now alias? Or is this UB?
    *x = 14;
    *zz = 13;
    *x
}

(Update: fixed typo) |
Of course, only the mutable reference immediately being cast would stay "semi-active" - if it was a reborrow of some other thing, its parent would return to being active as soon as the reborrow will go out. This also preserves meaning in the presence of inlining. Though the SEME rules would make it hard to reborrow anything that is not just a parameter. Maybe just take a function being unsafe to fudge the activity of mutable pointers in it, and preserve that kind of fudging during inlining. I am not sure what is |
Sorry, I had a typo around the
In other words, the cast would completely consume its origin and be a valid pointer - with arbitrary aliasing - for the lifetime of the origin?
Hu? SEME?
Sorry for that^^. I think about this stuff in terms of entirely untyped code that doesn't even want to type-check (because we will eventually prove that it behaves semantically well-typed), and then translate it back to Rust, and sometimes I forget that in Rust I have to explicitly mark some pieces of this ;-) |
On Fri, Jan 15, 2016 at 01:30:15AM -0800, Ralf Jung wrote:
I am open to more restrictive definitions, but I definitely want to find something that is relatively easy for people to grok. I also want to avoid TBAA and other sensitive definitions. I think people should be able to cast freely around between types.
Seems plausible. The bottom line for me is that it should be some clear, syntactic boundary -- not the results of lifetime inference! |
Currently, region inference can only pick out fairly coarse-grained regions. SEME is an RFC for more fine-grained regions, which will make destroying a mutable pointer more annoying.
That sounds like a good solution. It will be a little problematic when
Not really. The "dead areas" of mutable pointers are inferred once, before inlining, and then preserved during inlining. This is no more difficult than preserving type inference. |
Interestingly enough, if we run this code in MIRI: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0d8479bc8e9eeb656f08f3b763e80989
So, there's one answer at least. I'm not going to close this issue just yet, in case someone finds this discussion useful, but I imagine this thread is obsoleted by the unsafe code guidelines, or at least this question should be raised over there. |
The manual uses the expression "data reached through a shared reference" but does not define it. In the following code, is x reached through a shared reference?
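For contrast, the sanctioned way to mutate data reached through a shared reference is interior mutability: the language's rule against mutating data behind a shared reference makes an exception for data inside an `UnsafeCell`. A minimal sketch (the function name `bump` is hypothetical) using `Cell`, which is built on `UnsafeCell`:

```rust
use std::cell::Cell;

/// Mutates data reached through a shared reference; this is allowed only
/// because the `u8` lives inside a `Cell` (which wraps an `UnsafeCell`).
fn bump(counter: &Cell<u8>) {
    counter.set(counter.get() + 1);
}
```

Replacing `&Cell<u8>` with `&u8` and writing through a pointer cast from it, as the OP's code does with `&x`, is exactly the case the manual's phrase is meant to cover.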