RFC: No (opsem) Magic Boxes #3712
Clarify the constraint on the invariant in footnote Co-authored-by: Jacob Lifshay
It feels odd that one of the clear options is left out: why not expose a Like, I definitely agree that I agree that whatever happens shouldn't be specific to the
I, at least, fully expect us to eventually have some way of writing alignment-obeying raw pointers in Rust in some way. If nothing else, (Transmuting between EDIT: added a word to try to communicate that I wasn't expecting this RFC to include such a type.
That's my hope for the future as well, but to avoid the RFC becoming too cluttered, I am refraining from defining such a type in this RFC.
Is there a list of optimisations that depend on noalias being emitted for Box’es?
The RFC seems pretty clear that noalias hasn't really provided many benefits compared to being an extra burden to uphold for implementers, but maybe it is worth seeing if there are any sources that can provide a bit more detail on that.
text/3712-box-yesalias.md
* A pointer with an address that is not well-aligned for `T` (or in the case of a DST, the `align_of_val_raw` of the value), or
* A pointer with an address such that offsetting it (as though by `.wrapping_byte_offset`) by `size_of_val_raw` bytes would wrap around the address space

The [`alloc::boxed::Box<T>`] type shall be laid out as though a `repr(transparent)` struct containing a field of type `WellFormed<T>`. The behaviour of doing a typed copy as type [`alloc::boxed::Box<T>`] shall be the same as though a typed copy of the struct `#[repr(transparent)] struct Box<T>(WellFormed<T>);`.
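As a rough illustration, the value-only conditions in this hunk can be stated positively as a predicate on the address alone. This is a sketch under assumptions: the name `is_well_formed_addr` and its plain-integer parameters are hypothetical and not part of the RFC, the non-null condition is inferred from the `NonNull` field of `WellFormed<T>`, and an end address that lands exactly on the top of the address space is treated as wrapping.

```rust
/// Hypothetical helper (not defined by the RFC): returns true when an address
/// satisfies the value-only conditions sketched for `WellFormed<T>`.
/// `size` and `align` stand in for `size_of_val_raw` and `align_of_val_raw`.
fn is_well_formed_addr(addr: usize, size: usize, align: usize) -> bool {
    // Alignments are always powers of two in Rust.
    debug_assert!(align.is_power_of_two());

    // Non-null (assumed from the `NonNull` field).
    addr != 0
        // Well-aligned for the pointee.
        && addr % align == 0
        // Offsetting by the size must not wrap around the address space.
        && addr.checked_add(size).is_some()
}
```

Nothing in the check reads memory, which matches the footnote's point that these invariants depend only on the value of the `Box` itself.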
The urlo definition is probably good.
It's defined in the opsem, but I don't know if we have a very good written record of that other than spread around Zulip threads and GitHub issues.
I'm a fan of this. I think that people moving from `Vec<T>` to `Box<[T]>` having to deal with drastically-different soundness rules is a giant footgun, and getting rid of the special [ST]B behaviour here sounds good to me.
My general take: The two "endpoints" here are
From what I can tell, we currently orient If I could go back in time, I think I would favor end-user abstractions and offer different types (e.g.,
The RFC would benefit from some attempt to quantify the impact on performance, though our lack of standardized runtime benchmarks makes that hard.
@rust-lang/opsem: We were curious in our discussions, does this RFC represent an existing T-opsem consensus?
It does not represent any FCP done by T-opsem, which is why I've included them here. The claims I make, including those about the impact on the operational semantics, are included in the request for comment and consensus.
I recall some perf PRs (using the default rustc-perf suite) being done to determine the impact, which showed negligible impact. I can probably pull them up at some point in the RFC's lifecycle.
(Note that we do not define this type in the public standard library interface, though an implementation of the standard library could define the type locally)

The following are not valid values of type `WellFormed<T>`, and a typed copy that would produce such a value is undefined behaviour:
The Reference was adjusted a while ago to state validity invariants positively, i.e. by listing what must be true, instead of listing what must not be false. IMO that's more understandable, and the RFC should be updated to also do that.
There are patterns of using a custom per-
It's "every LLVM optimization that looks at alias information". The question is how much that matters in practice, which is hard to evaluate.
As Connor said, not in any formal sense. Several opsem members have expressed repeatedly that they want to see My own position is that I love how this simplifies the model and Miri, I am just slightly concerned about this being an irreversible loss of optimization potential that we might regret later. Absence of evidence of optimization benefit is not evidence of absence. Our benchmarks likely just don't have a lot of functions that take Is there a way we can query the ecosystem for functions taking
- While the easiest alternative is to do nothing and maintain the status quo, as mentioned this has surprising implications both for the operational semantics of Rust
- Alternative 2: Introduce a new type `AliasableBox<T>` which has the same interface as `Box<T>` but lacks the opsem implications that `Box<T>` has.
  - This also does not help remove the impact on the opsem model that the current `Box<T>` has, though it provides an ergonomically equivalent option for `unsafe` code.
- Alternative 3: We maintain the behaviour only for the unparameterized `Box<T>` type using the `Global` allocator, and remove it for `Box<T, A>` (for any allocator `A` other than `Global`), allowing unsafe code to use `Box<T, CustomAllocator>`
This is actually the status quo, since rust-lang/rust#122018
Just to follow up on some of the discussion, it wasn't immediately clear to me that types similar to I would still love if there's more data showing the lack of returns on
…C++ counterpointer `std::unique_ptr`, in the prior art section
FTR, speaking right now as one of the main developers of lccc, my opinion is that the best way to mitigate any loss of future optimization potential is to just be far more granular with
I mentioned
And more than them just not having them, IIRC someone tried to implement
would still love if there's more data showing the lack of returns on noalias optimisations, since it feels wrong that something with a lot of history and usage isn't helping that much
It helps for references. I suspect people added it for Box because "why not".
There’s always an option of having a
That would have to be a weaker
FTR, I don't like this argument - whether or not its true, it has no bearing on what is undefined behaviour in Rust. None of the proposed aliasing models for Rust that I've seen are "Exactly Part of the point of this RFC is removing special cases in the memory model, so IMO it's completely against the proposal to add even more special-cased rules to SB or TB to handle something closer to what
Rust doesn't yet have an aliasing model, only several WIP proposals. If there's some good benefit from having
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

For the remainder of this section, let `WellFormed<T>` designate a type for *exposition purposes only* defined as follows:
```rust
#[repr(transparent)]
struct WellFormed<T: ?Sized>(core::ptr::NonNull<T>);
```
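To make the layout consequences of this exposition-only definition concrete, here is a small sketch that checks a few size relationships on current compilers. The function `layout_facts` is a hypothetical name introduced only for this illustration, and the checks describe observable layout today rather than anything the RFC newly guarantees.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Exposition-only type, copied from the definition in the RFC text.
#[allow(dead_code)]
#[repr(transparent)]
struct WellFormed<T: ?Sized>(NonNull<T>);

/// Hypothetical helper: each entry is one layout observation.
fn layout_facts() -> [bool; 4] {
    [
        // A thin `WellFormed` is pointer-sized.
        size_of::<WellFormed<u8>>() == size_of::<*const u8>(),
        // The null niche of `NonNull` is available to `Option`.
        size_of::<Option<WellFormed<u8>>>() == size_of::<*const u8>(),
        // A `WellFormed` of a slice is as wide as a raw slice pointer.
        size_of::<WellFormed<[u8]>>() == size_of::<*const [u8]>(),
        // `Box<u8>` with the default allocator has the same size.
        size_of::<Box<u8>>() == size_of::<WellFormed<u8>>(),
    ]
}
```

All four observations are expected to hold, which is consistent with describing `Box<T>` as laid out like a transparent wrapper around `WellFormed<T>`.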
Are there any differences semantically between what is proposed for `Box<_>` in this RFC and what would be true of `MaybeDangling<Box<_>>` today?
That is, since we already accepted RFC 3336, if there is no daylight between these, then it should say that normatively, and it perhaps could even lean into that for defining the semantics.
cc @RalfJung
We haven't spelled out whether `MaybeDangling<Box<_>>` would allow pointers too close to the end of the address space, but it seems reasonable to say "no" to that. In that case the answer to your question is yes, this RFC proposes to weaken `Box` so that its validity requirements are equivalent to `MaybeDangling<Box<_>>`.
Are there any differences semantically between what is proposed for `Box<_>` in this RFC and what would be true of `MaybeDangling<Box<_>>` today?
Restating Ralf's answer to this, `MaybeDangling` only removes aliasing invariants, but preserves the normal validity invariants of the type. This RFC proposes to remove the aliasing invariants from `Box<_>` as a whole, so it would naturally leave it in an identical state to `MaybeDangling<Box<_>>`.
However, this also spells out that validity invariant in full, which we cannot rely on existing types yet to do.
In the case of allocators[^3], without special handling of them in the language as well, the protectors assigned to `Box<T>` were violated by (almost) any non-trivial allocator that provides the storage itself (without delegating to something like `malloc` or `alloc::alloc::alloc`). This is because the allocators access the same memory that the `Box` stores to mark it as deallocated and available again. In an extreme example, the same memory could even be yielded back to another `Allocator::allocate` call. Solving this requires special casing `Allocator`s, which is a heavily unresolved discussion, only applying the special opsem behaviour to `Global`, which is opaque via the global allocator functions, or forgoing custom allocators for `Box` entirely (thus depriving anyone needing to use a custom allocator from the user-visible language features `Box` alone provides). With the exception of the former, which is desired for other optimization reasons though [heavily debated and not resolved](https://github.com/rust-lang/unsafe-code-guidelines/issues/442), these solutions are merely solving the symptom, not the problem.

Any `unsafe` code that may want to temporarily maintain aliased `Box<T>`s for various reasons (such as low-level copy operations), or may want to use something like `ManuallyDrop<Box<T>>`, is put into an interesting position: While they can use a user-defined smart pointer, this requires both care on the part of the smart pointer implementor, but also affects the ergonomics and expressiveness of that code, as `Box<T>` has many special language features that surface at the syntax level, which cannot be replicated today by a user-defined type.
The problems described in these motivation items are also solved by `MaybeDangling`, no?
Either way, the motivation should be adjusted to describe this. That is, it's confusing for the motivation to be written as though we did not already cover this ground and accept RFC 3336. If that RFC did solve the problem, but the idea is that the present proposal solves it better for `Box<_>` somehow, then that should be described here. Or alternatively, if there's some way in which RFC 3336 did not solve the problem, then that should be detailed specifically.
Only the last one, unless we say that in order to use a custom allocator with `Box`, you always have to wrap the box in `MaybeDangling` (in which case, I'm not sure how to create one in the first instance).
In the case of `ManuallyDrop<T>`, because `Box<T>` asserts aliasing validity on a typed copy, and is invalidated on drop, it introduces unique behaviour - `ManuallyDrop<Box<T>>` *cannot* be moved after calling `ManuallyDrop::drop` *even* to trusted code known not to access or double-drop the `Box`. No other type in the language or library has the same behaviour[^2], as primitive references do not have any behaviour on drop (let alone behaviour that includes invalidating themselves), and only `Box<T>`, references, and aggregates containing references are retagged on move. There are proposed solutions on the `ManuallyDrop<T>` type, such as treating it specially by suppressing retags on move, but this is a novel idea (as `ManuallyDrop<T>` asserts non-aliasing validity invariants on move), and it would interfere with retags of references without justification. The proposed complexity is only required because of `Box<T>`.
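For contrast, here is a minimal sketch of the same drop-then-move pattern with a user-defined smart pointer built on `NonNull<T>`. The names `RawBox` and `drop_then_move` are hypothetical and exist only for this illustration. Because the wrapper holds a raw pointer, moving it after `ManuallyDrop::drop` copies an address and nothing more, so the pattern is not a problem for this type; the dangling pointer is never dereferenced.

```rust
use std::mem::ManuallyDrop;
use std::ptr::NonNull;

/// Hypothetical user-defined owning pointer; it carries no aliasing requirements.
struct RawBox<T>(NonNull<T>);

impl<T> RawBox<T> {
    fn new(value: T) -> Self {
        // `Box::into_raw` never returns a null pointer.
        let raw = Box::into_raw(Box::new(value));
        RawBox(unsafe { NonNull::new_unchecked(raw) })
    }

    /// Reads only the pointer value, never the pointee.
    fn addr(&self) -> usize {
        self.0.as_ptr() as usize
    }
}

impl<T> Drop for RawBox<T> {
    fn drop(&mut self) {
        // Rebuild the `Box` so the pointee is dropped and the memory is freed.
        unsafe { drop(Box::from_raw(self.0.as_ptr())) }
    }
}

/// Drops the pointee, then moves the now-dangling wrapper to a new binding.
fn drop_then_move(value: u32) -> usize {
    let mut slot = ManuallyDrop::new(RawBox::new(value));
    let before = slot.addr();
    // SAFETY: called exactly once, and the pointee is never accessed afterwards.
    unsafe { ManuallyDrop::drop(&mut slot) };
    // The move copies a raw pointer; no retag or dereference is involved.
    let moved = slot;
    assert_eq!(before, moved.addr());
    moved.addr()
}
```

The paragraph above describes why the equivalent code with `ManuallyDrop<Box<T>>` is treated differently under the current models.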
In the case of allocators[^3], without special handling of them in the language as well, the protectors assigned to `Box<T>` were violated by (almost) any non-trivial allocator that provides the storage itself (without delegating to something like `malloc` or `alloc::alloc::alloc`). This is because the allocators access the same memory that the `Box` stores to mark it as deallocated and available again. In an extreme example, the same memory could even be yielded back to another `Allocator::allocate` call. Solving this requires special casing `Allocator`s, which is a heavily unresolved discussion, only applying the special opsem behaviour to `Global`, which is opaque via the global allocator functions, or forgoing custom allocators for `Box` entirely (thus depriving anyone needing to use a custom allocator from the user-visible language features `Box` alone provides). With the exception of the former, which is desired for other optimization reasons though [heavily debated and not resolved](https://github.com/rust-lang/unsafe-code-guidelines/issues/442), these solutions are merely solving the symptom, not the problem.
This doesn't properly explain why "(almost) any non-trivial allocator" violates `noalias`.
There is a class of non-trivial custom allocators that violates Stacked Borrows, specifically if it accesses "metadata memory" that is stored outside the region returned by the allocator, using the pointer that was passed in to `deallocate`. If that's what you mean, it should be stated explicitly. Is that really "almost any non-trivial allocator"? That seems like a strong claim. It took a while for Miri to run into this issue.
With Tree Borrows, at least some of these cases are not UB any more, since Tree Borrows supports the `&Header` pattern.
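For readers following along, the "metadata memory" pattern can be sketched as below. This is an illustrative sketch using only raw pointers, so it is fine as written; the concern in this thread arises when the pointer handed to the deallocation path is derived from a `Box<T>` whose permissions cover only the `T` itself. The names `Header`, `layout_for`, `alloc_with_header` and `dealloc_with_header` are hypothetical.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};

/// Hypothetical per-allocation metadata, stored directly before the user region.
#[repr(C)]
struct Header {
    size: usize,
}

/// Layout of the header followed by `size` user bytes.
fn layout_for(size: usize) -> Layout {
    Layout::from_size_align(size_of::<Header>() + size, align_of::<Header>())
        .expect("requested size exceeds layout limits")
}

/// Allocates `size` user bytes and returns a pointer just past the header.
fn alloc_with_header(size: usize) -> *mut u8 {
    let layout = layout_for(size);
    // SAFETY: the layout is never zero-sized because it always includes the header.
    unsafe {
        let base = alloc(layout);
        if base.is_null() {
            handle_alloc_error(layout);
        }
        base.cast::<Header>().write(Header { size });
        base.add(size_of::<Header>())
    }
}

/// Frees a region from `alloc_with_header` and returns the recorded size.
///
/// # Safety
/// `user` must come from `alloc_with_header` and must not have been freed.
unsafe fn dealloc_with_header(user: *mut u8) -> usize {
    unsafe {
        // Step back *outside* the user region to reach the metadata.
        let base = user.sub(size_of::<Header>());
        let size = base.cast::<Header>().read().size;
        dealloc(base, layout_for(size));
        size
    }
}
```

The access through `user.sub(...)` is the step that reaches memory outside the region returned to the caller.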
It took a while for Miri to run into this issue.
Most people likely don't use custom allocators with miri.
Note that this is talking about applying So if we want to avoid surprising inconsistencies between
To explain a bit more what I mean: there's a huge set of possible aliasing requirements we could attach to various types. For instance:
One can't mix and match arbitrary combinations of such models in the same language, but Tree Borrows and something The first two are definitely not an option for
I'm not a fan of 1b, it allows code that IMO we want to forbid. 1a has the downside that we have two non-trivial alias requirements mixed into our model, which is a cost that should be justified -- and currently we are lacking the data to provide solid justification here. That said, it is entirely possible that in 3 years time, examples justifying this will show up... it is really hard to get a convincing negative result in this space.
I still maintain that much of the benefits we'll get will arise from more granular handling of &mut T and &T in codegen, rather than further special-casing the memory model.
The combination of aliasing models is not as simple as you present, as currently references have at least noalias guarantees (the exact requirements are of course undecided). I would like to say that unlike I am broadly in favor of unifying
Interesting. I would like to repeat this analysis, especially now that we do have a (small) runtime test suite. Subject to those results, I'm in favor of this change. There are measurable costs to this annotation, like unicode-org/icu4x#2095 (comment) and preventing use with C interop, in addition to the cognitive overhead. If, after reasonably trying, we can't come up with evidence of a real benefit, we shouldn't keep carrying the cost indefinitely just because that evidence could arise at some point in the future. I do think some of the comments on the RFC thread need to be addressed though.
I don't know what you mean by this. We gave some thought to a weaker form of aliasing, more
It hasn't said that for very long though, see https://doc.rust-lang.org/1.65.0/std/boxed/struct.Box.html. If we can bugfix the
Given that we don't even know of an example of real-world code that uses
It's perhaps worth noting that this particular case is accepted under Tree Borrows:
```rust
// MIRIFLAGS="-Zmiri-tree-borrows" cargo miri run
fn main() {
    let b = Box::new(0);
    let p = &raw const *b;
    let _b = b;
    _ = unsafe { *p };
}
```
Also, we separately accepted a fix for this in RFC 3336 ("MaybeDangling"):
```rust
// cargo miri run
fn main() {
    let b = MaybeDangling::new(Box::new(42));
    let p = &raw const *b;
    let _b = b;
    _ = unsafe { **p };
}
```
That is good to know. Does it not violate I think the answer should be "if it doesn't violate Tree Borrows it definitely doesn't violate any backend-specific annotations we place during codegen", but I also want to understand why that's true.
Just an observation: Despite reviewing that RFC, it took me quite some time to remember why it makes any sense that (The answer, for those following along, is that
No.
Indeed,
Mostly it's true because Tree Borrows was carefully designed to satisfy this property, making some reasonable guesses for what these
Of note,
That frankly seems like an incredible footgun, given that the latter is intended to be turned into a pointer.
It'd be interesting to see whether it is an actual hazard or not in practice, and if so, what those patterns are. We could of course, if needed, backward-compatibly represent it internally as
Yes, but this bug fix was widely understood as making the current status of the docs reflect reality, and not as an introduction of
This statement does not seem to jive with the fact that The main point of my comment is that making In simpler words, the status quo is
This statement does not seem to jive with the fact that Vec has well documented as_mut_ptr() methods that do not materialize references.
I don't see what that has to do with the current discussion. Even if we add noalias, code using the as_mut_ptr in the way covered by the docs will keep being sound. If anything, the fact that we have explicit methods to express the intent of working with a vector "raw" (also includes into_raw_parts) makes it more clear that we do not make any promises beyond that.
I think we should add
This is a significant leap to make from that comment, which acknowledges that its benchmarks are based on slices not Box. The point has been made repeatedly in this RFC that if users want to recover If people are very confident they can make a better language for a specific domain, that sounds great. Rust cannot be the best language for every application, and I would be quite worried if programming language innovation halted with Rust.
Summary
Currently, the operational semantics of the type `alloc::boxed::Box<T>` is in dispute, but the compiler adds LLVM `noalias` to it. To support it, the current operational semantics models have the type use a special form of the `Unique` (Stacked Borrows) or `Active` (Tree Borrows) tag, which has aliasing implications, validity implications, and also presents some unique complications in the model and in improvements to the type (e.g. Custom Allocators). We propose that, for the purposes of the runtime semantics of Rust, `Box` is treated as no more special than a user-defined smart pointer you can write today[^1]. In particular, it is given similar behaviour on a typed copy to a raw pointer.

[^1]: We maintain some trivial validity invariants (such as alignment and address space limits) that a user cannot define, but these invariants only depend upon the value of the `Box` itself, rather than on memory.