Tracking Issue for strict_provenance #95228

Closed
1 of 13 tasks
Tracked by #20
Gankra opened this issue Mar 23, 2022 · 164 comments
Labels
A-strict-provenance Area: Strict provenance for raw pointers C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@Gankra
Contributor

Gankra commented Mar 23, 2022

Feature gate: #![feature(strict_provenance)]

read the docs

get the stable polyfill

subtasks

This is a tracking issue for the strict_provenance feature. This is a standard library feature that governs the following APIs:

IMPORTANT: This is purely a set of library APIs to make your code more clear/reliable, so that we can better understand what Rust code is actually trying to do and what it actually needs help with. It is overwhelmingly framed as a memory model because we are doing a bit of Roleplay here. We are roleplaying that this is a real memory model and seeing what code doesn't conform to it already. Then we are seeing how trivial it is to make that code "conform".

This cannot and will not "break your code" because the lang and compiler teams are wholly uninvolved with this. Your code cannot be "run under strict provenance" because there isn't a compiler flag for "enabling" it. Although it would be nice to have a lint to make it easier to quickly migrate code that wants to play along.

This is an unofficial experiment to see How Bad it would be if Rust had extremely strict pointer provenance rules that require you to always dynamically preserve provenance information. Which is to say if you ever want to treat something as a Real Pointer that can be Offset and Dereferenced, there must be an unbroken chain of custody from that pointer to the original allocation you are trying to access using only pointer->pointer operations. If at any point you turn a pointer into an integer, that integer cannot be turned back into a pointer. This includes usize as ptr, transmute, type punning with raw pointer reads/writes, whatever. Just assume the memory "knows" it contains a pointer and that writing to it as a non-pointer makes it forget (because this is quite literally true on CHERI and miri, which are immediate beneficiaries of doing this).

A secondary goal of this project is to try to disambiguate the many meanings of ptr as usize, in the hopes that it might make it plausible/tolerable to allow usize to be redefined to be an address-sized integer instead of a pointer-sized integer. This would allow for Rust to more natively support platforms where sizeof(size_t) < sizeof(intptr_t), and effectively redefine usize from intptr_t to size_t/ptrdiff_t/ptraddr_t (it would still generally conflate those concepts, absent a motivation to do otherwise). To the best of my knowledge this would not have a practical effect on any currently supported platforms, and just allow for more platforms to be supported (certainly true for our tier 1 platforms).

A tertiary goal of this project is to more clearly answer the question "hey what's the deal with Rust on architectures that are pretty harvard-y like AVR and WASM (platforms which treat function pointers and data pointers non-uniformly)". There is... weirdness in the language because it's difficult to talk about "some" function pointer generically/opaquely and that encourages you to turn them into data pointers and then maybe that does Wrong Things.

The mission statement of this experiment is: assume it will and must work, try to make code conform to it, smash face-first into really nasty problems that need special consideration, and try to actually figure out how to handle those situations. We want the evil shit you do with pointers to work but the current situation leads to incredibly broken results, so something has to give.

Public API

This design is roughly based on the article Rust's Unsafe Pointer Types Need An Overhaul, which is itself based on the APIs that CHERI exposes for dynamically maintaining provenance information even under Fun Bit Tricks.

The core piece that makes this at all plausible is pointer::with_addr(self, usize) -> Self which dynamically re-establishes the provenance chain of custody. Everything else introduced is sugar or alternatives to as casts that better express intent.

More APIs may be introduced as we explore the feature space.

// core::ptr
pub fn invalid<T>(addr: usize) -> *const T;
pub fn invalid_mut<T>(addr: usize) -> *mut T;

// core::pointer
pub fn addr(self) -> usize;
pub fn with_addr(self, addr: usize) -> Self;
pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
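
As a rough illustration of how these methods are meant to be combined, here is a minimal sketch of low-bit tagging that never turns an integer back into a pointer. The helper names `set_tag` and `split_tag` are made up for this example, and it assumes a toolchain where `addr` and `map_addr` are callable (behind the feature gate on older nightlies, without it on recent stable releases).

```rust
// Hypothetical helpers, not part of core: stash a one-bit tag in the low bit
// of a pointer whose pointee has an alignment of at least 2.
const TAG_MASK: usize = 0b1;

fn set_tag<T>(ptr: *mut T, tag: bool) -> *mut T {
    // map_addr changes only the address; the provenance is inherited from `ptr`.
    ptr.map_addr(|addr| (addr & !TAG_MASK) | (tag as usize))
}

fn split_tag<T>(ptr: *mut T) -> (*mut T, bool) {
    // addr() is a plain integer view, used here only to inspect the tag bit.
    let tag = (ptr.addr() & TAG_MASK) != 0;
    (ptr.map_addr(|addr| addr & !TAG_MASK), tag)
}

fn tagged_roundtrip() -> (u32, bool) {
    let mut value: u32 = 7;
    let raw: *mut u32 = &mut value;
    let tagged = set_tag(raw, true);
    let (clean, tag) = split_tag(tagged);
    // `clean` was derived from `raw` by pointer->pointer operations only,
    // so the chain of custody back to `value` is unbroken.
    unsafe { *clean += 1 };
    (value, tag)
}

fn main() {
    assert_eq!(tagged_roundtrip(), (8, true));
}
```

The tagged pointer must not be dereferenced while the tag bit is set; only the untagged pointer returned by `split_tag` is used for the access.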

Steps / History

Unresolved Questions

  • How Bad Is This?

  • How Good Is This?

  • What's Problematic (And Should Work)?

    • Hardcoded MMIO address stuff
      • We should define a platform-specific way to do this, possibly requiring that you only use volatile access
    • Opaque Function Pointers - architectures like AVR and WASM treat function pointers specially; they are not normal data pointers.
      • We should really define a #[repr(transparent)] OpaqueFnPtr(fn() -> ()) type in std, need a way to talk about e.g. dlopen.
    • libc interop for bad APIs that pun integers and pointers
      • Use a union to make the pun explicit?
    • passing shared pointers over IPC?
      • At worst you can rederive from your SHMEM?
    • downcasting to subclasses?
      • Would be nice if you could create a reference without shrinking its provenance to allow for ergonomic references to a baseclass that can be (unsafely) cast to a reference to a subclass.
    • memcpy operations conceptually say "all this memory is just u8's" which would trash provenance
      • it's pretty standard to carve out exceptions for memcpy, but it would be good to know if this can be done more rigorously
        with something like llvm's proposed byte type
    • AtomicPtr - AtomicPtr has a very limited API, so lots of people use AtomicUsize to do the equivalent of wrapping_add
      • Morally this is fine, unclear if the right compiler intrinsics exist to express this without "dropping" provenance.
  • What's Problematic (And Might Be Impossible)?

    • High-bit Tagging - rustc::ty does this because it makes common addressing modes Free Untagging Realestate
      • Technically this is "fine" but CHERI might get upset about it, needs investigation.
    • Pointer Compression - V8 and JVM like compressing pointers, involving massive truncations.
      • Can a Sufficiently Smart Union handle this?
    • Unrestricted XOR-list - XORing pointers to make an even more jacked up linked list
      • You must allocate all your nodes in a Vec/Arena to be able to reconstitute ptrs. At that point, use indices.
  • APIs We Want To Add/Change?

    • A lot of uses of .addr() are for alignment checks, .is_aligned(), .is_aligned_to(usize)?
    • An API to make ZST alloc forging explicit, exists_zst(usize)?
    • .addr() should arguably work on a DST, if you use .addr() you are ostensibly saying "I know this doesn't roundtrip"
    • Explicit conveniences for low-bit tagging? .with_tag(TAG)?
    • expose_addr/from_exposed_addr are slightly unfortunate names since it's not the address that gets exposed, it's the provenance. What would be better names? Please discuss on Zulip.
    • It is somewhat unfortunate that addr is the short and easy name for the operation that programmers likely expect less. (Many will expect expose_addr semantics.) Maybe it should have a different name. But which name?
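
For the AtomicPtr item in the list above, one way to avoid falling back to AtomicUsize is a compare-and-swap loop that only ever manipulates pointers. This is a sketch under assumptions: `mark_low_bit` is a hypothetical helper, it relies on `AtomicPtr::fetch_update` from the standard library plus `map_addr`, and it is likely slower than a dedicated atomic bit-or intrinsic would be.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical helper: atomically set the low bit of the stored pointer and
// return the previous pointer. The new value is computed with map_addr, so it
// keeps the provenance of whatever pointer was in the slot.
fn mark_low_bit<T>(slot: &AtomicPtr<T>) -> *mut T {
    slot.fetch_update(Ordering::AcqRel, Ordering::Acquire, |p| {
        Some(p.map_addr(|addr| addr | 1))
    })
    // The closure always returns Some, so fetch_update always returns Ok.
    .unwrap()
}

fn demo() -> (bool, usize) {
    let mut value = 0u64;
    let raw: *mut u64 = &mut value;
    let slot = AtomicPtr::new(raw);
    let old = mark_low_bit(&slot);
    let now = slot.load(Ordering::Acquire);
    // The old value is the original pointer; the new one differs by the tag bit.
    (old == raw, now.addr() - raw.addr())
}

fn main() {
    assert_eq!(demo(), (true, 1));
}
```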
@Gankra
Contributor Author

Gankra commented Mar 23, 2022

FAQ

Why Is Rust So Broken? This Clearly Isn't A Problem For C!

It is absolutely a problem for C. Rust, C, C++, Swift, etc. are all fundamentally built on the same tools and principles once you start dropping down to the level of memory models (i.e. Rust literally just punts on atomics by saying "it's the C11 model" because that's the model for atomics). Compiler backends currently do not consistently model pointers in the face of things like "but pointers are just integers right"?

Why Doesn't Rust Use C's Solution?

Folks who work on C's semantics are still trying to solve this issue, with the leading solution being PNVI-ae-udi. It's an interesting and reasonable approach under the assumption of "we can't possibly get people to fix their code, and we can't completely change compiler backends". At a high-level, the proposal is to (in the abstract machine's semantics):

  • Maintain a global list of all "exposed" ("maybe aliased") allocations.
  • Allocations come into the world un-exposed (unaliased).
  • Whenever a pointer is cast to an integer (or otherwise escapes opaquely), mark its allocation as "exposed".
  • Whenever an integer is cast to a pointer do a global hittest with that address on all exposed allocations.
  • If it hits an exposed allocation, great, it gets that allocation's provenance.
  • If it hits two exposed allocations (due to "one-past-the-end" shenanigans) be sad and just tell the programmer to "be consistent" and only access one of the two.
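
The bookkeeping in the list above can be sketched as a toy model. This is not how any compiler or checker implements PNVI-ae-udi; every type and method name below is invented for illustration, and the ambiguous "two hits" case is simplified to "first hit wins".

```rust
// Toy abstract machine: every name here is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AllocId(usize);

struct Allocation {
    base: usize,
    len: usize,
    exposed: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Ptr {
    addr: usize,
    prov: Option<AllocId>,
}

#[derive(Default)]
struct Machine {
    allocs: Vec<Allocation>,
}

impl Machine {
    // Allocations come into the world un-exposed.
    fn alloc(&mut self, base: usize, len: usize) -> Ptr {
        self.allocs.push(Allocation { base, len, exposed: false });
        Ptr { addr: base, prov: Some(AllocId(self.allocs.len() - 1)) }
    }

    // A ptr-to-int cast has a side effect: it marks the allocation as exposed.
    fn ptr_to_int(&mut self, p: Ptr) -> usize {
        if let Some(AllocId(i)) = p.prov {
            self.allocs[i].exposed = true;
        }
        p.addr
    }

    // An int-to-ptr cast hit-tests the address against exposed allocations only.
    // The range is inclusive of one-past-the-end; a real model has to deal with
    // two allocations matching, this toy just takes the first.
    fn int_to_ptr(&self, addr: usize) -> Ptr {
        let prov = self
            .allocs
            .iter()
            .enumerate()
            .find(|(_, a)| a.exposed && addr >= a.base && addr <= a.base + a.len)
            .map(|(i, _)| AllocId(i));
        Ptr { addr, prov }
    }
}

fn main() {
    let mut m = Machine::default();
    let a = m.alloc(0x1000, 16);
    let _b = m.alloc(0x2000, 16);
    // Nothing is exposed yet, so the cast finds no provenance.
    assert_eq!(m.int_to_ptr(0x1004).prov, None);
    let addr = m.ptr_to_int(a);
    assert_eq!(m.int_to_ptr(addr + 4).prov, Some(AllocId(0)));
    // The second allocation was never exposed.
    assert_eq!(m.int_to_ptr(0x2004).prov, None);
}
```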

The strength of this model is that from a compiler's perspective this can mostly just be understood as "business as usual" because you can still let programmers do ~whatever and reason locally along the lines of: "I have kept perfect track of these pointers, and therefore know exactly how they are/aren't aliased. These other pointers over here have escaped or come from something I don't know about, so I will assume they all alias each other and compile their accesses very conservatively."

What they do need to change under this model is to admit that a ptr-to-int cast has a "side-effect" (exposing the allocation) and that optimizing it away is unsound, because then you will forget the pointer was exposed and do bad things.

The weakness of this model is that it essentially implies that an int-to-ptr cast has permission to access whatever memory you want (all "exposed" memory). This makes it very hard for dynamic checkers to be useful, because anytime ptr-int-ptr things happen, even implicitly/transiently, the checker has to throw up its hands and say "I guess you know what you're doing" and won't actually be able to help you catch bugs. For instance if you use-after-free, the checker cannot notice if you get "lucky" and start accessing a new allocation in the same place if the pointer was cast from an integer after the reallocation. This is Sad.

It is perhaps an inevitable fate that Rust will adopt C's model, or at least have to interoperate with it, but it would be nice if we could do better given how much time and energy we put into having more rigorous and safe concepts for borrows and lifetimes! Rust is uniquely positioned to explore stricter semantics, and has established precedent for just saying "hey actually this idea we've had since time immemorial is busted, let's migrate everyone off of it so the language can make sense".

I am not saying we are going to break the world right now, but we should explore how bad breaking the world is, and at worst, making code conform to strict provenance in more places will make our existing tools work more reliably, which is just Good.

But CHERI Runs C Code Fine?

Yes! Sort of.

Pointer-integer shenanigans are not actually that common, and so most code actually already trivially has dynamic/strict provenance in both Rust and C. Most of the time people are just comparing addresses, checking alignment, or doing tagged-pointer shenanigans. These things are totally fine under strict provenance!
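
A minimal sketch of the "totally fine" cases, assuming `addr` is available on your toolchain. Both helpers are hypothetical names for this example; they only ever go from pointer to integer, so no provenance has to be recovered.

```rust
// Hypothetical helper: alignment check on the address alone.
fn addr_is_aligned_to<T>(ptr: *const T, align: usize) -> bool {
    assert!(align.is_power_of_two());
    (ptr.addr() & (align - 1)) == 0
}

// Hypothetical helper: byte distance between two addresses, assuming lo <= hi.
fn distance_in_bytes<T>(lo: *const T, hi: *const T) -> usize {
    hi.addr() - lo.addr()
}

fn main() {
    let xs = [0u64; 4];
    let base = xs.as_ptr();
    assert!(addr_is_aligned_to(base, std::mem::align_of::<u64>()));
    // One byte past an aligned address is odd, so it cannot be 2-aligned.
    let odd = base.cast::<u8>().wrapping_add(1);
    assert!(!addr_is_aligned_to(odd, 2));
    assert_eq!(distance_in_bytes(&xs[0], &xs[3]), 24);
}
```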

The remaining places where people are actually committing pointer-integer crimes are handled by a hack that mostly works: defining intptr_t to just actually be a pointer still and having the compiler handle/codegen it as such. This is one of the genuine successes of C's model of "define a million different integer types that sure sound like they should be the same thing but get to be different if an implementation says they are". In particular, making this tolerable requires CHERI to say intptr_t is 128-bit and size_t/ptrdiff_t and friends are 64-bit. That said this hack has its limitations and Sufficiently Evil code will still break and just needs to be reworked. Or the code needs to be compiled to a significantly less strict model that looks A Lot like PNVI-ae-udi and removes most of the value of the checker.

The intptr_t hack was explored for Rust but it doesn't work very well, because Rust doesn't make a distinction between size_t and intptr_t. This meant every array size and index was 128-bit and handled by CHERI's more expensive pointer registers/instructions on the paranoid assumption that it could all be Secret Pointers We Need To Track. For Rust to properly support CHERI it needs to decouple the notions of size_t and intptr_t, which means we need everyone to be more clear on what they mean when they convert between pointers and integers.

It would also be nice if Rust, the systems language that Really Cares About Memory-Safety was a first-class citizen on CHERI, the architecture that Really Cares About Memory-Safety. We are pursuing the same goals, and Rust's design is seemingly very friendly to CHERI... except pointer<->integer shenanigans make everything into a muddy mess.

Isn't It A Big Deal That CHERI "Breaks" wrapping_offset?

CHERI's pointer compression scheme falls over if you wrapping_offset too far out-of-bounds (like, kilobytes out of bounds), and will silently mark your pointer as "corrupt" (while still faithfully preserving the actual address). Once this happens, offsetting that pointer back in bounds and trying to load/store will fault.

It's annoying but it's not really a problem. It's a system limitation, and if you run afoul of it you will get a deterministic fault. This is much the same as targeting Rust to some little embedded system, having to disable all of libstd, and then still crashing because you were too sloppy with your memory footprint. Or how random parts of std have to be cfg'd out when targeting WASM. Some code just isn't as portable as you'd like, because anything more exotic than x64 has quirky little limitations. Rust is not the intersection of all platform limitations, because that intersection is terrible.

Also I should clarify something because it seems to have been lost to history: wrapping_offset was never the "good" offset. At least in my time as a standard library team member, it was always intended that all Rust code should always be attempting to use offset. This is what the rustonomicon advocates, and how libstd was written. offset is the semantics the language uses for things like borrowing fields. If you access the contents of a slice or collection or anything else in std, that will overwhelmingly be done with offset, because that's The Right Way To Do It.

wrapping_offset is for "I am doing something really bad and can't do things Right". It's useful! I have many times wanted to use it to avoid dealing with some weird case in thin-vec or whatever other horrible unsafe code I'm writing. I generally don't because years of working on std burned Offset Is Right, Do It Right into my brain. It's ok for you to want a bit of "slop" to simplify some nasty unsafe code, but in this case that slop comes with the possibility of the code crashing on a relatively exotic platform.
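
To make the difference concrete, here is a small sketch with made-up function names. The first function stays in bounds at every step and uses `add` (the unsigned sibling of `offset`). The second takes a large out-of-bounds detour with `wrapping_add`/`wrapping_sub` before coming back; Rust permits this, but it is the kind of "slop" that would fault on CHERI per the description above.

```rust
// Every intermediate pointer is inside the slice, so `add` is the right tool.
fn sum_in_bounds(xs: &[u32]) -> u32 {
    let base = xs.as_ptr();
    let mut total = 0;
    for i in 0..xs.len() {
        // SAFETY: i < xs.len(), so base.add(i) is in bounds of the slice.
        total += unsafe { *base.add(i) };
    }
    total
}

// Wanders far out of bounds and back. Computing `far` is allowed with the
// wrapping methods; only the final, in-bounds pointer is dereferenced.
fn detour(xs: &[u32; 4]) -> u32 {
    let base = xs.as_ptr();
    let far = base.wrapping_add(1 << 20);
    let back = far.wrapping_sub((1 << 20) - 1); // base + 1 element
    // SAFETY: `back` points at xs[1] and was derived from `base`.
    unsafe { *back }
}

fn main() {
    assert_eq!(sum_in_bounds(&[1, 2, 3, 4]), 10);
    assert_eq!(detour(&[10, 20, 30, 40]), 20);
}
```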

Yes I know offset says:

Consider using wrapping_offset instead if these constraints are difficult to satisfy. The only advantage of this method is that it enables more aggressive compiler optimizations.

I wrote those docs, so this is my fault, mea culpa. The fact that you "wanted" to use offset was completely burned into my brain, so I didn't even think about mentioning/clarifying that. Like whenever I look at this line my brain is implicitly putting "☠️ IF YOUR CODE IS TERRIBLE, AND YOU ABSOLUTELY MUST ☠️" in front of it, because that was the conventional understanding of these two methods when this was written.

Your code isn't terrible for using wrapping_offset, I just should have made it more clear that it should be regarded as a Last Resort and not "the chill offset for everyone". 😿

Isn't This Model WAY Too Strict?

Probably! My goal is not to define The One True Memory Model for Rust, but to define a painfully simple one that is easy for Rust programmers to understand and make their code conform to by default. The "idea" with strict-provenance is that it's so strict that any coherent model must surely be a weakening of it (i.e. says strictly more code is allowed). In this way, code that happens to conform to strict-provenance (which is most code, as CHERI has demonstrated!) is essentially guaranteed to be correct under any model and to not be miscompiled (barring compiler bugs).

One can imagine ending up with this "tower of weakenings":

  • strictest: (stacked-borrows-with-)strict-provenance, a model Rust programmers try their best to conform to.
  • strict-ish: "real" stacked-borrows, an actually functional memory-model for the crimes Rust Programmers Crave.
  • shrug-emoji: the actual primitives that compiler backends emit, and the optimizations they perform with them.

If your code works higher up the tower, it will definitely work against anything lower down the tower, and the bottom of the tower is the one that "matters", because that's the thing that actually compiles your code.

It is frustrating as a programmer to know that there is this vague memory-model stuff going on, and that compilers are vaguely broken because they don't really have coherent models. By making it easier for code to conform to strict-provenance, we are making it more robust in the face of inconsistent and buggy semantics AND future-proofing that code against any possible "real" model.

Mega Link Pile

The Work On This API:

Prior Art For This API:

CHERI Resources:

Provenance Resources:

Strict Provenance Zulip Threads:

@Gankra
Contributor Author

Gankra commented Mar 23, 2022

The proposed lints in #95199 should be something a user messing with these APIs can opt into to quickly find sketchy places in their code. What's the "right" way to expose an unstable lint? Is it sufficient to make it allow and users can opt in with normal linting stuff, or do we also need a special feature/-Z to opt into the lint existing at all?

@workingjubilee
Member

workingjubilee commented Mar 24, 2022

Hardcoded MMIO address stuff

We should define a platform-specific way to do this, possibly requiring that you only use volatile access

This should probably be more like core::arch::asm!: it's under arch in module terms but it actually is platform-generic, because it has the same factors in play: it's "architecturally specific in terms of invocation but almost universal because it appears almost everywhere".

The proposed lints in #95199 should be something a user messing with these APIs can opt into to quickly find sketchy places in their code. What's the "right" way to expose an unstable lint? Is it sufficient to make it allow and users can opt in with normal linting stuff, or do we also need a special feature/-Z to opt into the lint existing at all?

I... hm. I think the allow-by-default lint is conservative enough? It shouldn't need a -Z feature unless we get something wild going on like turning miri on to find misbehaving pointers.

@Gankra
Contributor Author

Gankra commented Mar 24, 2022

At least in the bootstrap, the compiler will complain if you allow() a lint in your code that doesn't exist. This potentially just means:

  • We need to keep the experimental lint around forever even when the experiment is over
  • Users can only "safely" invoke it from the command line manually, which is slightly unfortunate for anything like what I did where I used it as a FIXME/WONTFIX marker for the file.

Also due to the "Opaque Function Pointers" / "Harvard Architecture" / "AVR is cursed" issue

// HACK: The intermediate cast as usize is required for AVR

I think we want the lint broken up into parts:

  • #[fuzzy_provenance_casts] - int-to-ptr, totally evil
  • #[lossy_provenance_casts] - ptr-to-int, sketchy but valid as long as you actually want .addr() semantics
  • #[oxford_casts] - casts that make harvard architectures sad -- fn<->ptr (name is a joke... unless...)

I can't justify discouraging fn <-> int, absent better ways to talk about fn ptrs properly.

@workingjubilee
Member

At least in the bootstrap, the compiler will complain if you allow() a lint in your code that doesn't exist. This potentially just means:

Hm... I think we can make a lint conditional on #![feature(strict_provenance)] being enabled? I remember seeing at least one lint that is like that.

@Diggsey
Contributor

Diggsey commented Mar 26, 2022

Overall I like this idea for Rust, but it seems incompatible for C interop. It's pretty common in C APIs to use an integer as a "user-data" field intended to store arbitrary values (primarily pointers). For example, the winapi SetWindowLongPtr function accepts a pointer-sized integer.

I could imagine having some kind of built-in "pointer map" type, which behaves like a HashMap<usize, *mut T>: this would allow storing and then later "recovering" the provenance of a pointer based on its address. On CHERI it could be an actual map, but on other architectures it could just be a no-op (at least at the machine level, not in the abstract machine).
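
A user-level approximation of that idea might look like the sketch below. `PointerMap` and its methods are hypothetical names, this version is always a real map (nothing here is a no-op on any target), and it assumes `addr` is available on your toolchain.

```rust
use std::collections::HashMap;

// Hypothetical "pointer map": hands out plain integer keys and later returns
// the original pointer, provenance included, for a given key.
struct PointerMap<T> {
    by_addr: HashMap<usize, *mut T>,
}

impl<T> PointerMap<T> {
    fn new() -> Self {
        PointerMap { by_addr: HashMap::new() }
    }

    // Remember the pointer and return its address as the integer key.
    fn register(&mut self, ptr: *mut T) -> usize {
        let key = ptr.addr();
        self.by_addr.insert(key, ptr);
        key
    }

    // Look the integer up instead of casting it back to a pointer.
    fn recover(&self, key: usize) -> Option<*mut T> {
        self.by_addr.get(&key).copied()
    }

    fn forget(&mut self, key: usize) -> Option<*mut T> {
        self.by_addr.remove(&key)
    }
}

fn demo() -> Option<u32> {
    let mut map = PointerMap::new();
    let raw = Box::into_raw(Box::new(41u32));
    // `key` is what would be stored in an integer "user-data" field.
    let key = map.register(raw);
    let back = map.recover(key)?;
    // SAFETY: `back` is the pointer produced by Box::into_raw above.
    unsafe {
        *back += 1;
        let value = *back;
        map.forget(key);
        drop(Box::from_raw(back));
        Some(value)
    }
}

fn main() {
    assert_eq!(demo(), Some(42));
}
```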

@Gankra
Contributor Author

Gankra commented Mar 26, 2022

I feel like it's very plausible to define some sort of pointer-int union without messing with ABI for "this is a pointer, the API is lying" since in general, afaict, it's always sound to say something that is actually just an integer is a pointer "for fun" (as ptr::invalid shows) as long as you only deref/offset it when it's a Real Pointer.
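
A sketch of that direction, using a pointer-typed slot rather than a literal union. `UserData` is a hypothetical wrapper, and the example assumes a recent toolchain where the function this thread calls `ptr::invalid_mut` is spelled `ptr::without_provenance_mut`.

```rust
use std::ffi::c_void;
use std::ptr;

// Hypothetical wrapper for a C "user data" field that is documented as an
// integer but may really carry a pointer. It is always stored as a pointer.
#[derive(Clone, Copy)]
struct UserData(*mut c_void);

impl UserData {
    // A plain integer becomes a pointer with no provenance: fine to store
    // and compare, never to dereference or offset.
    fn from_int(value: usize) -> Self {
        UserData(ptr::without_provenance_mut(value))
    }

    fn from_ptr<T>(p: *mut T) -> Self {
        UserData(p.cast())
    }

    fn as_int(self) -> usize {
        self.0.addr()
    }

    // Only meaningful if the slot was built with from_ptr::<T>.
    fn as_ptr<T>(self) -> *mut T {
        self.0.cast()
    }
}

fn demo() -> (usize, u32) {
    let as_number = UserData::from_int(42).as_int();
    let mut state = 1u32;
    let slot = UserData::from_ptr(&mut state as *mut u32);
    // SAFETY: the slot holds a Real Pointer to `state`.
    unsafe { *slot.as_ptr::<u32>() += 1 };
    (as_number, state)
}

fn main() {
    assert_eq!(demo(), (42, 2));
}
```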

@RalfJung
Member

it's always sound to say something that is actually just an integer is a pointer "for fun"

Yes indeed, pointers are a superset of (equally-sized) integers.

@Diggsey
Contributor

Diggsey commented Mar 27, 2022

I feel like it's very plausible to define some sort of pointer-int union without messing with ABI for "this is a pointer, the API is lying"

But then you can't mechanically map C signatures to Rust, because you can't know whether an integer should be treated as a pointer or not.

Today, basically everything you can do in C, you can also do in unsafe Rust, which makes it possible to call APIs designed to be called from C, but if Rust can't just "do what C does", then it makes that interop a lot harder.

@Gankra
Contributor Author

Gankra commented Mar 27, 2022

I mean, yes, but if you actually use the API and it expects you to pass it a dereferenceable pointer where it says the arg is an integer, then at that point you can go "ah, this API is lying" and do whatever needs to be done about that. Blindly charging forward, slamming into problems, and forcing ourselves to figure out what "do whatever needs to be done" should look like is the primary mission statement of this experiment.

@Lokathor
Contributor

Lokathor commented Mar 27, 2022

So, for that specific API, SetWindowLongPtrW, the official signature looks like this:

LONG_PTR SetWindowLongPtrW(
  [in] HWND     hWnd,
  [in] int      nIndex,
  [in] LONG_PTR dwNewLong
);

where HWND is a *mut c_void (or other opaque pointer of choice), and LONG_PTR is isize (more specifically documented as "an integer the size of a pointer").

So in "real programs" you're expected to:

  • during window creation: allocate your user data on the heap, then store that pointer on your HWND (SetWindowLongPtrW)
  • during painting or whatever: you can get your userdata pointer (GetWindowLongPtrW) and change some stuff in that allocation. This is largely necessary because the window event handler system is callback based, and the callback fn doesn't otherwise have access to any of your "main program" data.
  • during window destruction: you get the pointer and then free that allocation

Also, while it's sometimes UB to declare the wrong foreign signature, that's because of cross-lang LTO, and we never normally compile user32, so we can just declare the wrong signature and type the function as:

use core::ffi::{c_int, c_void};

extern "system" {
  pub fn SetWindowLongPtrW(hwnd: *mut c_void, index: c_int, new_long: *mut c_void) -> *mut c_void;
}

And now we "don't have to" perform pointer to int casting ourselves, it just silently happens during the foreign interfacing.

All that said, when we get the pointer back from GetWindowLongPtrW, we've still lost our provenance info. So we're still hosed. We still need to be able to send a pointer to foreign-land, get it back from foreign-land later, and have the pointer continue to be usable by rust.

@Gankra
Contributor Author

Gankra commented Mar 27, 2022

Could you elaborate on why provenance "must" be lost?

If you're operating on a system that actually dynamically maintains provenance, the information must be maintained by the callee anyway or the OS literally doesn't function.

If you're operating under a model where the rust abstract machine just needs to be self-consistent, then presumably having the API signature reflect "pointer goes in, pointer goes out" is sufficient? Like yes the compiler doesn't "know" where those pointers go or come from, but the compiler also doesn't "do" global analysis and therefore must be able to cope with calling a native rust function with a (*mut) -> *mut signature, right?
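
The "pointer goes in, pointer goes out" shape can be sketched without any real Windows API. Everything below is a stand-in: `FakeWindow`, `set_user_data` and `get_user_data` are hypothetical and are written in Rust only so that the example is self-contained; in real code the two functions would be foreign declarations.

```rust
use std::ffi::c_void;
use std::ptr;

// Stand-in for an opaque window object owned by "the other side".
struct FakeWindow {
    user_data: *mut c_void,
}

// Stand-ins for the foreign setter/getter, typed with pointers on both ends.
unsafe extern "C" fn set_user_data(win: *mut FakeWindow, new: *mut c_void) -> *mut c_void {
    // Returns the previous value, like the API discussed above.
    unsafe { std::mem::replace(&mut (*win).user_data, new) }
}

unsafe extern "C" fn get_user_data(win: *mut FakeWindow) -> *mut c_void {
    unsafe { (*win).user_data }
}

fn callback_roundtrip() -> u32 {
    let mut win = FakeWindow { user_data: ptr::null_mut() };
    // Window creation: allocate state and hand the pointer over.
    let state: *mut u32 = Box::into_raw(Box::new(0u32));
    unsafe {
        set_user_data(&mut win, state.cast());
        // Later, in a callback: get the pointer back and use it as a pointer.
        let back = get_user_data(&mut win).cast::<u32>();
        *back += 1;
        let result = *back;
        // Window destruction: free the allocation.
        drop(Box::from_raw(back));
        result
    }
}

fn main() {
    assert_eq!(callback_roundtrip(), 1);
}
```

No pointer-to-integer cast appears on the Rust side; whether the callee stores the value as an integer internally is outside what this code can observe.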

@Lokathor
Contributor

Well if there's no global analysis then that's fine, sure.

@RalfJung
Member

RalfJung commented Mar 27, 2022

Agreed with @Gankra.

  • Either those FFI calls go to 'outside' the Abstract Machine, in which case their effect on the state of the Abstract Machine basically has to be axiomatized (similar to how Miri implements 'shims' for calling system functions). We can just axiomatize that the provenance of the user data pointer is preserved. The compiler doesn't know what the right axiom is for this function so it has to be correct no matter what.

    (This assumes GetWindowLongPtrW returns a type that can carry provenance, like *mut c_void.)

  • Or the other side of the call is conceptually "inside" the Abstract Machine (think: cross-lang inlining); then that code has to be written to preserve provenance.

@Diggsey
Contributor

Diggsey commented Mar 27, 2022

(This assumes GetWindowLongPtrW returns a type that can carry provenance, like *mut c_void.)

It does not, it returns an integer, but presumably you are suggesting redefining the function such that the return type preserves provenance.

@RalfJung
Member

Yes, @Lokathor suggested above to adjust the signature of SetWindowLongPtrW so I assume the same is done to the other related functions.

@arichardson

So, for that specific API, SetWindowLongPtrW, the official signature looks like this:

LONG_PTR SetWindowLongPtrW(
  [in] HWND     hWnd,
  [in] int      nIndex,
  [in] LONG_PTR dwNewLong
);

extern "system" {
  pub fn SetWindowLongPtrW(hwnd: *mut c_void, index: c_int, new_long: *mut c_void) -> *mut c_void;
}

Just as a note, this is in fact what the LLVM IR function signature looks like when building C/C++ with CHERI LLVM (https://cheri-compiler-explorer.cl.cam.ac.uk/z/KG5oqq).

intptr_t is lowered to i8 addrspace(200)*, so this retains provenance information and the signature would also be correct for cross-language LTO in the CHERI case.

@Diggsey
Contributor

Diggsey commented Mar 28, 2022

@arichardson presumably that's because intptr_t is typedef'd to some special intrinsic type that is lowered to a pointer only for CHERI? That's an additional incompatibility then, because Rust's isize is how we map intptr_t, but isize does not store provenance.

@Gankra
Contributor Author

Gankra commented Mar 28, 2022

@Diggsey yes, see "A secondary goal" in the top comment and "But CHERI Runs C Code Fine?" in the FAQ.

@Diggsey
Contributor

Diggsey commented Mar 28, 2022

I think my concerns with C interop basically boil down to this: Rust might need a way to interoperate with the PNVI-ae-udi model (assuming that is what C goes with) and I'm not convinced that the boundary between Rust's provenance model and the C provenance model can lie precisely on the FFI boundary. I think there will always be cases where we need unsafe Rust to be able to "act like C" for that interop to be practical.

If that concern turns out to be well-founded, it doesn't mean we can't still do better than C though. For example, we could use the same model, but have the "act of exposing a pointer" be an explicit compiler intrinsic intended for unsafe FFI code, rather than happening on all pointer-to-int casts.
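
An explicit "expose" operation of roughly this shape can be sketched as follows. This assumes a recent toolchain where the operations the top comment calls expose_addr/from_exposed_addr are spelled `expose_provenance` and `ptr::with_exposed_provenance_mut`; the function names `through_integer` and `demo` are made up for the example.

```rust
use std::ptr;

// Round-trip a pointer through a bare integer, with the exposure written out
// explicitly instead of happening implicitly in an `as` cast.
fn through_integer(p: *mut u32) -> *mut u32 {
    // Marks the provenance of `p` as exposed and returns the address.
    let addr: usize = p.expose_provenance();
    // ... `addr` could be handed to code that only understands integers ...
    // Rebuilds a pointer from the address, picking up previously exposed provenance.
    ptr::with_exposed_provenance_mut(addr)
}

fn demo() -> u32 {
    let mut value = 5u32;
    let raw: *mut u32 = &mut value;
    let back = through_integer(raw);
    // SAFETY: `back` has the address of `value`, whose provenance was exposed above.
    unsafe { *back += 1 };
    value
}

fn main() {
    assert_eq!(demo(), 6);
}
```

Code written this way gives up the guarantees of the strict model at exactly the marked points, which is the trade-off the comment above describes.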

@Gankra
Contributor Author

Gankra commented Mar 28, 2022

Yes it's probable there will need to be a way to say "I give up" and use a Ptr16/Ptr32/Ptr64/Ptr128 "integer type" that is exactly like how CHERI handles intptr_t. Solutions like this will be considered more seriously as we attempt to "migrate" more code to stricter semantics and run into limitations with the MVP.

In terms of things like interoperating with PNVI-ae-udi at the level of cross-language LTO -- it is wholly premature to at all think about that. Like, 5-10 years premature, realistically. Having a proposed experimental memory model is in a completely different galaxy from actually emitting aliasing metadata into LLVM.

@RalfJung
Member

RalfJung commented Mar 28, 2022

If we don't consider cross-language LTO, then the FFI boundary is where such interop happens and I don't think we need to worry much about PNVI. Interactions all happen on the 'target machine' level, where most of the time there is no provenance (and when there is, like on CHERI, it is very explicit and propagates through integer and pointer types equally).

If we do consider cross-language LTO, then really the interactions are described by a "shared abstract machine" that the optimizations happen on -- probably the LLVM IR Abstract Machine. It is anyone's guess how that one will account for PNVI, but given that the trade-offs are very different between surface languages and IRs (and given that LLVM IR does not enforce TBAA on all accesses) I doubt they will just copy PNVI. So it is rather futile IMO to try and prepare for this future. (And @Gankra wrote basically the same thing at the same time.^^)

@tmandry tmandry moved this to Idea in Lang Edition 2024 Nov 3, 2023
xv-ian-c added a commit to expressvpn/wolfssl-rs that referenced this issue Nov 9, 2023
We have to jump through a few hoops but it is possible for the callback to work
only in terms of raw pointers and references without ever instantiating an
actual `Box` or `Arc` and therefore having to worry about drop.

In the absence of [strict provenance][] (nightly only unstable feature) we do
still need the `Box` in order to make a thin pointer to the fat `dyn
Tls13SecretCallbacks` which we need to access.

(Aside: I think making `Session` generic over a `CB: Tls13SecretCallbacks`
would avoid the box, as we do with `IOCB`, however for the keylogger debug
facility we don't need to be quite so efficient and the generics get
everywhere)

However the `Box` should be part of `Self` so that the required lifetime is
established. From that we can obtain a (thin) reference to the allocation
within the `Box` which contains the `Arc<dyn...>` which contains a fat
reference to the actual callback object. That thin reference from the `Box` can
then be turned into a raw pointer (which a fat pointer cannot without strict
provenance).

The `&mut **self.secret_cb.as_mut().unwrap()` construct is tricky:

- `self.secret_cb.as_mut().unwrap()` has type `&mut Box<Arc<dyn...>>`;
- therefore `*...` is the `Box<Arc<dyn...>>` itself;
- therefore `**...` is the allocation within the box i.e. `Arc<dyn...>`;
- finally `&mut **...` is a reference to that allocation i.e. the `&mut Arc<dyn...>`
  which we need.

[strict provenance]: rust-lang/rust#95228
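The dereference chain described in this commit message can be sketched in isolation. This is a minimal illustration, not the wolfssl-rs code: `SecretCallbacks`, `Logger`, `Session`, `callback_ptr`, `invoke`, and `demo` are hypothetical names, and the field is assumed to be an `Option<Box<Arc<dyn ...>>>` as the message suggests.

```rust
use std::ffi::c_void;
use std::sync::Arc;

// Hypothetical stand-in for the callback trait named in the commit message.
trait SecretCallbacks {
    fn on_secret(&self, secret: &[u8]) -> usize;
}

// Hypothetical callback implementation that just reports the secret length.
struct Logger;

impl SecretCallbacks for Logger {
    fn on_secret(&self, secret: &[u8]) -> usize {
        secret.len()
    }
}

// Hypothetical session type; the field layout is an assumption.
struct Session {
    secret_cb: Option<Box<Arc<dyn SecretCallbacks>>>,
}

impl Session {
    /// Returns a thin raw pointer to the `Arc` stored inside the `Box`.
    fn callback_ptr(&mut self) -> *mut Arc<dyn SecretCallbacks> {
        // as_mut().unwrap() : &mut Box<Arc<dyn ..>>
        // *...              : Box<Arc<dyn ..>>
        // **...             : Arc<dyn ..> (the allocation owned by the box)
        // &mut **...        : &mut Arc<dyn ..>, a thin reference
        let arc_ref: &mut Arc<dyn SecretCallbacks> =
            &mut **self.secret_cb.as_mut().unwrap();
        arc_ref as *mut Arc<dyn SecretCallbacks>
    }
}

/// What a C-style callback might do with the opaque context pointer.
///
/// # Safety
/// `ctx` must come from `Session::callback_ptr` on a session that is still alive.
unsafe fn invoke(ctx: *mut c_void, secret: &[u8]) -> usize {
    // Only borrow the Arc; never rebuild a Box or Arc, so nothing is dropped here.
    let arc = unsafe { &*(ctx as *const Arc<dyn SecretCallbacks>) };
    arc.on_secret(secret)
}

/// Runs the whole round trip and returns the callback's result.
fn demo() -> usize {
    let cb: Arc<dyn SecretCallbacks> = Arc::new(Logger);
    let mut session = Session { secret_cb: Some(Box::new(cb)) };
    let ctx = session.callback_ptr().cast::<c_void>();
    // SAFETY: `session` outlives this call, so the Box allocation is still valid.
    unsafe { invoke(ctx, b"abc") }
}

fn main() {
    println!("callback saw {} bytes", demo());
}
```

The pointer handed out is thin because `Arc<dyn ...>` is itself a sized value holding the fat pointer, and the `Box` stays owned by the session, which is what keeps the allocation alive.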
xv-ian-c added a commit to expressvpn/wolfssl-rs that referenced this issue Nov 9, 2023
facebook-github-bot pushed a commit to facebook/sapling that referenced this issue Feb 20, 2024
Summary:
`1.76.0` release with fixes addressing the following:
* Release notes ([link](https://releases.rs/docs/1.76.0/))
  * Most notable is [#118054](rust-lang/rust#118054) manifesting as:
```
error: unused implementer of `futures::Future` that must be used
   --> fbcode/mlx/metalearner/housekeeper/housekeeper.rs:213:13
    |
213 |             self.ping_oncall(&oncall, usecases);
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: futures do nothing unless you `.await` or poll them
    = note: requested on the command line with `-D unused-must-use`
```
* Changes in `search_index.js` spec for `rustdoc` ([link](https://github.com/rust-lang/rust/pull/118910/files#diff-3ac57789ddcd2856a3b4f0c444f2813315179bdbe55bb945fe64fcb27b53fee5L491))
* Split of `#![feature(exposed_provenance)]` ([link](rust-lang/rust#118487)) from [#95228](rust-lang/rust#95228)
* `buck2` OSS toolchain bump to `nightly-2023-12-11` just before [#11878](rust-lang/rust-clippy#11878) and a bunch of other clippy lint renames.

Reviewed By: dtolnay

Differential Revision: D53776867

fbshipit-source-id: 78db83d8cdd6b0abae2b94ed1075e67b501fcd73
facebook-github-bot pushed a commit to facebook/buck2 that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit to facebookincubator/reindeer that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit to facebookexperimental/reverie that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit to facebook/starlark-rust that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit to facebookexperimental/rust-shed that referenced this issue Feb 20, 2024
facebook-github-bot pushed a commit to facebook/errpy that referenced this issue Feb 20, 2024
@Rua
Contributor

Rua commented May 18, 2024

There is a ptr::without_provenance constructor, but no NonNull::without_provenance (and _mut). Is that an omission?

Also, in general it seems that the number of APIs being covered by this feature has grown by quite a bit since the original post, and it's now rather out of date. ptr::without_provenance for example is missing.
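Independent of whether a dedicated `NonNull` constructor is available, a provenance-free `NonNull` can be composed from the free function mentioned above. The following is a sketch that assumes Rust 1.84 or later, where these APIs are stable; the helper names `non_null_without_provenance` and `dangling_is_aligned` are made up for illustration.

```rust
use std::mem::align_of;
use std::ptr::{self, NonNull};

/// Builds a `NonNull<T>` that carries an address but no provenance.
/// Returns `None` when `addr` is 0, since `NonNull` cannot be null.
fn non_null_without_provenance<T>(addr: usize) -> Option<NonNull<T>> {
    NonNull::new(ptr::without_provenance_mut::<T>(addr))
}

/// Checks that `ptr::dangling` yields a non-null, well-aligned pointer.
fn dangling_is_aligned<T>() -> bool {
    let d: *const T = ptr::dangling();
    !d.is_null() && d.addr() % align_of::<T>() == 0
}

fn main() {
    // The resulting pointer may be compared or stored, but must not be
    // dereferenced for a non-zero-sized access: it has no provenance.
    let p = non_null_without_provenance::<u64>(0x1000).unwrap();
    println!("address: {:#x}", p.addr().get());
    println!("dangling ok: {}", dangling_is_aligned::<u64>());
}
```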

@RalfJung
Member

I made a PR to stabilize this. :)

#130350

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 21, 2024
stabilize Strict Provenance and Exposed Provenance APIs

Given that [RFC 3559](https://rust-lang.github.io/rfcs/3559-rust-has-provenance.html) has been accepted, t-lang has approved the concept of provenance to exist in the language. So I think it's time that we stabilize the strict provenance and exposed provenance APIs, and discuss provenance explicitly in the docs:
```rust
// core::ptr
pub const fn without_provenance<T>(addr: usize) -> *const T;
pub const fn dangling<T>() -> *const T;
pub const fn without_provenance_mut<T>(addr: usize) -> *mut T;
pub const fn dangling_mut<T>() -> *mut T;
pub fn with_exposed_provenance<T>(addr: usize) -> *const T;
pub fn with_exposed_provenance_mut<T>(addr: usize) -> *mut T;

impl<T: ?Sized> *const T {
    pub fn addr(self) -> usize;
    pub fn expose_provenance(self) -> usize;
    pub fn with_addr(self, addr: usize) -> Self;
    pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
}

impl<T: ?Sized> *mut T {
    pub fn addr(self) -> usize;
    pub fn expose_provenance(self) -> usize;
    pub fn with_addr(self, addr: usize) -> Self;
    pub fn map_addr(self, f: impl FnOnce(usize) -> usize) -> Self;
}

impl<T: ?Sized> NonNull<T> {
    pub fn addr(self) -> NonZero<usize>;
    pub fn with_addr(self, addr: NonZero<usize>) -> Self;
    pub fn map_addr(self, f: impl FnOnce(NonZero<usize>) -> NonZero<usize>) -> Self;
}
```
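As an illustration of how the pointer methods listed above fit together, here is a small sketch of low-bit pointer tagging with `addr` and `map_addr`. It assumes Rust 1.84 or later; the `TAG` constant and the helpers `tag`, `untag`, `is_tagged`, and `roundtrip` are hypothetical names, not part of the standard library.

```rust
/// Low bit used as a tag. This is only sound for types with alignment >= 2,
/// because the bit must be zero in every valid address.
const TAG: usize = 0b1;

/// Sets the tag bit while keeping the pointer's provenance.
fn tag<T>(p: *mut T) -> *mut T {
    p.map_addr(|a| a | TAG)
}

/// Clears the tag bit, restoring the original address.
fn untag<T>(p: *mut T) -> *mut T {
    p.map_addr(|a| a & !TAG)
}

/// Reads the tag bit from the address alone; no provenance is exposed.
fn is_tagged<T>(p: *mut T) -> bool {
    (p.addr() & TAG) != 0
}

/// Tags a pointer to a local, then untags it and reads through it.
fn roundtrip(value: u32) -> (bool, u32) {
    let mut slot = value;
    let raw: *mut u32 = &mut slot;
    let tagged = tag(raw);
    let flagged = is_tagged(tagged);
    // SAFETY: `untag` restores the original address, and `map_addr` carried
    // the provenance of `raw` through both steps, so the read is in bounds.
    let read = unsafe { *untag(tagged) };
    (flagged, read)
}

fn main() {
    let (flagged, read) = roundtrip(7);
    println!("tagged: {flagged}, value: {read}");
}
```

No integer is ever converted back into a pointer here, which is the point of the Strict Provenance family.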

I also did a pass over the docs to adjust them, because this is no longer an "experiment". The `ptr` docs now discuss the concept of provenance in general, and then they go into the two families of APIs for dealing with provenance: Strict Provenance and Exposed Provenance. I removed the discussion of how pointers also have an associated "address space" -- that is not actually tracked in the pointer value, it is tracked in the type, so IMO it just distracts from the core point of provenance. I also adjusted the docs for `with_exposed_provenance` to make it clear that we cannot guarantee much about this function, it's all best-effort.
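For contrast with the Strict Provenance family, the Exposed Provenance family mentioned above can be sketched as an integer round trip. This assumes Rust 1.84 or later; `roundtrip_exposed` and `address_only` are hypothetical helper names, and the example relies on the best-effort semantics described for `with_exposed_provenance`.

```rust
use std::ptr;

/// Converts a pointer to an integer and back via the Exposed Provenance APIs,
/// then writes and reads through the reconstructed pointer.
fn roundtrip_exposed(value: i32) -> i32 {
    let mut slot = value;
    let p: *mut i32 = &mut slot;
    // Exposing marks the provenance of `p` as available for later
    // integer-to-pointer conversions.
    let addr: usize = p.expose_provenance();
    // The reconstructed pointer picks up a previously exposed provenance.
    let q: *mut i32 = ptr::with_exposed_provenance_mut(addr);
    // SAFETY: `slot` is still live and its provenance was exposed above.
    unsafe {
        *q += 1;
        *q
    }
}

/// Builds a pointer with the same address but no provenance at all.
/// Such a pointer can be compared or inspected, not dereferenced.
fn address_only(addr: usize) -> usize {
    ptr::without_provenance::<i32>(addr).addr()
}

fn main() {
    println!("after round trip: {}", roundtrip_exposed(41));
    println!("address only: {:#x}", address_only(0x40));
}
```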

There are two unstable lints associated with the strict_provenance feature gate; I moved them to a new [strict_provenance_lints](rust-lang#130351) feature since I didn't want this PR to have an even bigger FCP. ;)

`@rust-lang/opsem` Would be great to get some feedback on the docs here. :)
Nominating for `@rust-lang/libs-api`.

Part of rust-lang#95228.

[FCP comment](rust-lang#130350 (comment))
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 21, 2024
Rollup merge of rust-lang#130350 - RalfJung:strict-provenance, r=dtolnay

github-actions bot pushed a commit to rust-lang/miri that referenced this issue Oct 22, 2024
@kornelski
Contributor

This proposal may be relevant here:

https://internals.rust-lang.org/t/pre-rfc-core-simulate-realloc/21745

@RalfJung
Member

#130350 landed so this can be closed. :)
