-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provenance #52
Comments
In particular, are we OK with having code which has defined behaviour in Rust but is UB in C? For example, in C IIRC, fn foo() {
let x = false;
let y = true;
if (&x as usize + 1) == (&y as usize) {
let z: &[bool; 2] = unsafe { &*(&x as *const bool as *const [bool;2]); }
bar(z);
}
} If Rust doesn't track provenance, then it'll be safe in Rust (possibly unless |
Yes! C has type-based alias analysis, we do not. However, I think we will have provenance. I tried defining a model for Rust's references without provenance last year, and it wasn't very compatible with how unsafe code treats references. Moreover, I have since then come to believe that having a model where lifetimes are irrelevant for whether a program has UB is even more important, and I do not know how to define a model that both ignores lifetimes and has no provenance. That said, I strongly believe that pointers should be the only values that have provenance. In particular, integers do not have provenance. (And no, pointers are not just integers.) I think |
@RalfJung do raw pointers carry provenance, or just references? (So, is ref-to-raw-ptr-to-ref a no-op, or is raw-ptr-to-int-to-raw-ptr?) |
It is always difficult to fairly to evaluate how much optimization power a given UB actually enables and which concrete instances of optimizations could be salvage in specific circumstances by a different (often more involved) analysis, especially when you put no upper bound on the amount of engineering you're willing to invest into that. However, this particular UB is so fundamental to alias analysis of C-like languages that I have very little hope of ever encountering a real world optimizer that doesn't have this assumption (or an even stronger one, if it doesn't support C in the first place) hard-coded. Even when there's some escape hatch, optimizations will not be built to handle it as gracefully as possible, so even independent of LLVM, trying to allow cute little programs like this will effectively kneecap any optimizers. So there needs to be really, really, really good rationale for allowing this sort of shenanigans. By my current knowledge, even allowing reading of uninitalized variables has better justification (in that there's at least a genuinely-novel and potentially-useful data structure that relies on it). |
@asajeffrey That's a hard question. Raw pointers certainly carry some provenance about which allocation they point to, i.e., your example is UB even with raw pointers. Beyond that, Stacked Borrows currently does not assign them any provenance, but that is known to not match LLVM's noalias. The path forward here is not clear yet. |
Is there a way to use the stacked borrows model to make pointers just integers by keeping the provenance information in the memory rather than the pointer value, then on converting a let mut x = false;
let mut y = false;
if (&x as usize + 1) == (&y as usize) {
let z: *mut bool = (&mut x as *mut bool as usize + 1) as *mut bool;
let z: &mut bool = unsafe { &mut *z };
*z = true;
}
y // can be optimized to false since there's an implicit freezing of Er, lots of hand-waving there about what an "appropriate" borrow is. |
Stacked Borrows inherently makes it so that two pointers to the same location are not equivalent to use: let x = &mut 0;
let y = &mut *(x as *mut _);
*y = 4; // allowed
*x = 2; // allowed, but invalidates y
*y = 1; // NOT ALLOWED A model without provenance inherently cannot distinguish |
Hmm, so values of type |
A variant example: let mut p = (false, false);
let x = &mut p.0;
let z = (&x as *mut bool as usize + 1) as *mut bool;
let z = unsafe { &mut *z };
*z = true;
&p.1; // can be optimized to false??? I think this optimization is allowed, because I think this program is UB. |
I don't understand the distinction between provenance and "lifetime metadata". Also, people What happens, basically, is that we have two kinds of values: bitstrings, and pointers. Pointers carry provenance, meaning there are pointers that "point to the same thing" and yet are different. This is part of the abstract machine on which Rust runs and completely independent of any type system. We can then ask which effects various operations such as casting and loading from/storing to memory can have on each of these values. This is a long and complicated discussion that you can write entire papers about and still be far from done. In Stacked Borrows, I am trying to restrict provenance in raw pointers to a minimum. For example, casting ref-to-raw will "strip" the tag that Stacked Borrows uses to identify where a reference comes from. (However, it will still track which allocation the pointer points to. I see no possibility of allowing cross-allocation reasoning even for raw pointers. I also see very little motivation for that.) However, you can still transmute a reference to a raw pointer and then use that, and we have to say what happens (and people don't like it when we make that UB). |
I agree, and so does miri (select "Tools - Miri" to see it report this as UB). |
@RalfJung ah, we may be using the word "provenance" differently. I'm meaning the C defn. As far as I'm aware the best discussion of it is https://www.cl.cam.ac.uk/~km569/exploring_provenance.pdf When you say "pointers" you are meaning references and raw pointers? You are arguing that raw pointers need to carry metadata? |
I don't think we do.
I was talking about values in the abstract machines. At this point, there are no types. Just like the bitstring The odd thing to get used to here is that a bit in this machine cannot be just 0 or 1, but also uninitialized or "the n-th bit of some abstract pointer value". See this blog post for further details on that stuff (that post handles things byte-per-byte instead of bit-per-bit, but that's not a fundamental difference). We can then ask which values of the abstract machine can occur for variables of which types. For example, we would like So when I say "pointers" I mean all pointer values, no matter the type they occur at.
Yes. They need to carry the allocation they refer to, at the least (to rule out cross-allocation arithmetic). This is well-established even in Rust. |
Not 100% sure if this is the right issue, but wanted to toss this out here: CHERI reported compatibility issues related to provenance when porting C software to its platform. (See section 5.3) We have largely ignored CHERI in the design of the rust platform on the assumption that it's vaporware, but it seems that ARM has just announced that they are implementing native CHERI support. Note also that CHERI requires size_t and intptr_t be different (as discussed in the paper), as intptr_t is 128-bit and size_t is still 64-bit (iiuc). Also intptr_t needs to be codegen'd as if it is a pointer in order to maintain the hardware-level integrity of the tag bits it stores. |
I suspect that if ARM does add CHERI support, they will do it without changing the size of a pointer, instead using something similar to their memory tagging extension which uses the top 8 bits of a pointer that are ignored by the hardware. Another thing to look at is hwasan which also uses the top 8 bits of a pointer on AArch64 to implement a faster version of ASAN. There might be implications for code generation since the pointer tags will need to be preserved. |
FWIW, the glossary now contains a definition of provenance. |
That is not, and never can be CHERI. 8 bits is nowhere near enough to encode all of:
CHERI is always going to be using capabilities that are two times the native word size (there was an uncompressed 256-bit version originally, but that is clearly not practical). Fitting everything into 128 bits is already a significant technical accomplishment. Plus MTE potentially has other uses in a world with CHERI in order to accelerate temporal safety schemes. |
I found a C WG14 draft N2676 that discusses provenance (And N2624 "Introduction for discussion of N2577" (N2577 is N2676 without a small patch) ). I'm sharing it here since it seems relevant/interesting prior art. At one point, to support these desireable behaviours: Is the "copying pointer values with memcpy" usecase similar/equivalent to transmuting pointers/references? |
A couple of notes:
had better work and behave as if you had assigned rather than memcpy'd (see also the realloc example, which is basically the same thing, except that the memcpy is an implicit part of the realloc interface when it has to move your allocation). |
I have no idea where to properly note this, but I wanted to note it down: There's at least some desire for Personally, my opinion is just "yikes." By this time I'm firmly on the side that traditional ptrtoint needs to leak provenance such that act of god divining the address elsewhere in the process is valid. (But hopefully there's a way for the compiler to ptrtoint without leaking provenance, so you can e.g. display an address without considering the provenance leaked.) |
My opinion is "no way, if you do crazy things like that you get to keep the pieces".^^ It'd be much more sane to use something like union MyJavaScriptThing { float: f64, ptr: *mut () } and then use provenance-preserving operations to manipulate the bit pattern of |
You can implement Nanboxing using a This is one of the cases I'm thinking of when I've commented that ptr->int is often used to avoid duplicate code though. |
Closing, provenance exists and the questions here are answered |
Do we need to discuss C's provenance model?
The text was updated successfully, but these errors were encountered: