-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deterministic (but undefined) layout #35
Comments
Key comments from #11:
@gnzlbg writes (in response to @asajeffrey):
|
Sorry if I missed any. |
So I would like to pose the counter-question: Why should this be guaranteed? |
I am +1 with @the8472 . My experience of C is that these kinds of limitations get annoying as time goes by, because newer ideas and compilers are limited by the old restrictions. I'd actually be in favour of the compiler gaining randomisation, because (in my experience) even if something is technically allowed to vary, if it doesn't tend to change then people start to expect it being fixed. Also randomisation seems (to me) a possibly cheap and effective anti-hacking technique -- if I can compile a program with a randomised order for every struct, it will be much harder for someone to make a repeatable hack, even if they find a suitable bug, as they won't know the layout of anything. Of course it isn't perfect by any means, but it's another layer of defense and I would want to see a good reason to throw it away. If someone wants a well-defined fixed ABI, they can always use repr(C). |
Deterministic layout seems appealing,especially if it is determined by the fields,not the generic parameters, Even if layout is not deterministic,there should be ways to recover back some determinism through attributes. One example attribute would be '#[phantomtype(C)]' which would guarantee that the generic parameters it lists cannot affect the layout of the type,only allowing PhantomData/types with the same attribute to mention those generic parameters.
The
The
|
That's quite a special case that does not generalize well to the myriad of heterogeneous structs with multiple non-ZST fields. It would fall under #34 already. |
One example where it doesn't apply? |
Well, it would still be determined by some of its generic parameters, |
I stand by my exact request; I'm fine with field re-ordering being the exact thing allowed. |
@gankro couldn't that prevent some optimizations, e.g. reducing alignment (and thus size) when there are no references are taken on a private member? |
@the8472 that's legal under as-if |
@gankro We are discussing here whether we can say that layout is deterministic (with some list of inputs that may affect layout, and nothing else may). You are talking about constraining what layout may do, irrespective of its inputs. That's an entirely orthogonal discussion. @ubsan "as-if"? @rkruppe also writes in the PR:
Even with PGO, we should be able to guarantee that given the same PGO profile data, we compute the same layout. I do not know how PGO works, but I assume there's a "collection" phase spitting out some data into a file and then that file is used by the next compilation? So that file would be part of the input to the deterministic function. |
@RalfJung even with the same input couldn't slight changes to the program plus optimization fuel change which structs get optimized and which don't? |
I'm a big fan of randomized layout personally. I'd much more happily assert
that layouts can be randomized rather than try to offer any guarantees
about determinism besides those that we view are required for
interoperability.
…On Sat, 13 Oct 2018 at 08:04 the8472 ***@***.***> wrote:
@RalfJung <https://github.com/RalfJung> even with the same input couldn't
slight changes to the program plus optimization fuel change which structs
get optimized and which don't?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#35 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AT4HVShyH7idIV5u8KaR7A2yKuVTT1rfks5ukdbHgaJpZM4XYCOx>
.
|
We might not want to think of the source code "written by the user" as part of the input to this deterministic function, but only think about the Rust that's left after macro expansion for this. If build scripts, proc macros, etc. have non-determinism, then that just produces a different "program".
It doesn't have to be either or. I sometimes, during debugging, want fully deterministic builds to reliably reproduce an issue, and other times, also during debugging, want fully randomized layouts to see if one bug is caused by some layout assumption. This screams "implementation-defined", but that does not mean that the spec cannot say anything more about it. The spec could say that: "layout is implementation defined but must be defined in one of the following ways: ..." leaving it to the implementation to pick one (e.g. depending on build flags), but not allowing implementations to invent new ones. In any case, it might be interesting to come up with "definitions" for the relevant cases that we might want to have, and then reconsider what to do here. If we decide to make it defined to one of these, or implementation defined, then we just pick one or some of the definitions, and if we leave it as unspecified, then we drop them. |
I guess @RalfJung is aiming for reproducible builds, correct? Maybe the solution is to make struct layout conditionally deterministic. Preconditions: no optimization fuel limit, no randomization, ??? Some Inputs can also be shifted to preconditions. I.e. we can simply say it's only deterministic if certain optimization flags are disabled. |
Builds should be reproducible, yes (not sure if that is a sufficient condition though). I got nothing against a flag to randomize stuff for debugging, but we shouldn't do that in normal release builds.
So make fuel part of the input. Fuel changes the game in annoying ways anyway, similar to PGO actually, in taht adding new structs can change the layout of the existing ones. So if our definition is ready for PGO then fuel shouldn't cause extra problems, and if we decide we don't want to support PGO then our rule should say it only holds when fuel is not used. Both seem like reasonable options to me. |
Another option for determinism is hashing the fully qualified name of the struct and using that as input for layout. That way a struct could retain a its layout between builds but two different structs with the same members would end up with different layouts, thus discouraging punning between |
@RalfJung no I'm saying I am fine with it being non-deterministic as long as the non-determinism is constrained to field-reordering, with an otherwise simple and deterministic (same as repr(C)) alignment/padding algorithm. |
Something which is worth separating is the behaviour of the compiler under default / "sensible" settings vs what is EVER allowed. I can understand someone not wanting field randomisation in their build, so they can tune ordering and achieve greater performance. However, do we want to ban field randomisation from ever being valid in a Rust implementation? Banning it from ever being valid feels like it would need an extremely good reason. |
The goal is establishing what guarantees can unsafe code rely on - these guarantees must always hold; compiler settings cannot break these guarantee. There does not seem to be a consensus about whether we want to allow Rust implementations to randomize the field order of |
With regards alignment, when using SSE instructions I often use the macro Restricting the size to be no bigger than |
@gnzlbg we have said a lot about why the compiler might want to change layout and why a dev might want stable layout for debuggability. But very little has been said why unsafe code would need some specific guarantee. @rodrimati1992 brought one example, punning between the same generic type with different phantom generic arguments. Without an annotation signalling that intent this would be ok. Without such an annotation it would prevent the compiler from making different optimization choices for different uses of monomorphized types. What other uses do we have? @gankro never stated how unsafe code would benefit from his requests for example. |
An observation. If we did the following:
but also included compiler settings in that equation, then we could still randomize layout, but we would be restricted that two structs with the same definition must still wind up with the same layout (which might be in any order etc). In any case, I am getting more attracted to the idea of pulling out certain special cases (e.g., single-field structs in #34) and leaving the broader question for later. |
Rather than a |
Do you know |
I saw the stream, and in @jonhoo 's situation |
That depends whether the standard library wants to make that promise. I don't know if it's willing to commit to never tweaking how it's stored. Perhaps it will, and the only things that will need the method are things like O(1) |
@scottmcm my |
Another example of a potentially useful transmute is between |
This was already discussed in previous comments. That transmute is incorrect in general (for arbitrary T) due to |
This is trying to address the unsoundness that arises from the current version of `ShallowCopy` (see #74). In the process, it also deals with the fact that casting between `Inner<.., T, ..>` and `Inner<.., ManuallyDrop<T>, ..>` is likely not sound (rust-lang/unsafe-code-guidelines#35 (comment)). It does not yet work.
This is trying to address the unsoundness that arises from the current version of `ShallowCopy` (see #74). In the process, it also deals with the fact that casting between `Inner<.., T, ..>` and `Inner<.., ManuallyDrop<T>, ..>` is likely not sound (rust-lang/unsafe-code-guidelines#35 (comment)). It does not yet work.
This is trying to address the unsoundness that arises from the current version of `ShallowCopy` (see #74). In the process, it also deals with the fact that casting between `Inner<.., T, ..>` and `Inner<.., ManuallyDrop<T>, ..>` is likely not sound (rust-lang/unsafe-code-guidelines#35 (comment)). It does not yet work.
This is trying to address the unsoundness that arises from the current version of `ShallowCopy` (see #74). In the process, it also deals with the fact that casting between `Inner<.., T, ..>` and `Inner<.., ManuallyDrop<T>, ..>` is likely not sound (rust-lang/unsafe-code-guidelines#35 (comment)). It does not yet work.
This implementation gets rid of the unsound `ShallowCopy` trait (#74 & rust-lang/unsafe-code-guidelines#35 (comment)), and replaces it with a wrapper type around aliased values. The core mechanism is that the wrapper type holds a `MaybeUninit<T>`, and aliases it by doing a `ptr::read` of the whole `MaybeUninit<T>` to alias. It then takes care to only give out `&T`s, which is allowed to alias. To drop, the implementation sets a thread-local that the wrapper uses to determine if it should truly drop the inner `T` (i.e., to indicate that this is the last copy, and the `T` is no longer aliased). The removal of `ShallowCopy` makes some of the API nicer, and obviates the need for `evmap-derive`. Unfortunately the introduction of the wrapper also complicates certain bounds which now need to know about the wrapping. Overall though, an ergonomic win. The implementation passes all tests, and I _think_ it is sound (if you think it's not, please let me know!). The one bit that I'm not sure about is the thread-local if a user nests `evmap`s (since they'll share that thread local). Note that this does _not_ take care of #78. Fixes #74. Fixes #55 since `ShallowCopy` is no longer neeeded.
Continuing down this path, what about a transmute from SomeWrapper<MyType<T, MyPrivateTypeA>> to SomeWrapper<MyType<T, MyPrivateTypeB>> where I could be mistaken, but I think there is now no way to have an impl outside the current crate that distinguishes between the two types, and therefore the transmute should be safe? |
While |
Hmm, that's unfortunate indeed. And it doesn't sound like you think there's any chance of that being the case subject to the discussion on this issue either without placing restrictions on the wrapper type? Essentially, I'm trying to find a way to use the type system to change the drop behavior of a type that is nested inside another data structure, but without being able to safely transmute the type "inside" the data structure, I cannot find a way to do so. Which leads me down the path of horrible hacks like https://github.com/jonhoo/rust-evmap/blob/4e83da000cfee94810eb790f25c1dc159def1b12/evmap/src/aliasing.rs#L79-L94. I was hoping there might be a way to construct the inner types in such a way that the layout of the wrapper type is guaranteed to stay the same (making the transmute safe), but it sounds like that's not the case.. hmm hmm hmm what to do. EDIT: I tried to summarize my understanding of the discussion here over in jonhoo/left-right#83 (comment) in case it helps anyone who's trying to understand this issue. EDIT2: A sketch of what I want to do in my implementation is jonhoo/left-right@sound-aliasing...drop-by-generic-type (the commit message has more details). |
This patch avoids the thread-local to choose dropping behavior by instead typecasting among private generic types for the inner value type. It's not clear if that's actually sound: rust-lang/unsafe-code-guidelines#35 (comment) The current implementation does not compile due to a conflict between the need for the concrete drop behavior types to be private and the need for the (public) bound: Aliased<T, crate::aliasing::NoDrop>: Borrow<_> I don't quite know how to work around that yet. What we really want is to write: for<U: DropBehavior> Aliased<T, U>: Borrow<_> But I don't think that's current expressible? Maybe through some more trait magic?
Can you reach into the data structure to touch the values? If so you could try exploiting the niche optimization instead of using the type system. It would come at a runtime cost but at least you wouldn't be suffering size overhead to carry that state as long as types have niches. E.g. enum DropToggle<T> {
On(T),
Off(ManuallyDrop<T>)
} |
In this case the wrapper type is a collection type (worse yet, a collection of collections), so while I could walk every value to toggle the value, that operation is quite expensive as it is proportional to the size of the collection, even if only one item actually ends up being dropped. Think of something like calling |
Prior art related to the current discussion: Haskell's type roles and the Data.Coerce library (referenced in the safe transmute RFC). |
So, you are asking for some amount of "parametricity" on the type level. That sounds like a very reasonable request. To make this work, we need to at least ensure that the two types also behave identically for every trait resolution query, to avoid associated type shenanigans. More precisely speaking, for any trait resolution query, replacing one of the types by the other anywhere in the query (including in nested positions, where clauses, whatever) must not affect the outcome of the query. Your approach of using private types that implement no traits seems to be a nice step in that direction. Unfortunately I don't have enough intuition for specialization to really say if that is enough. Outside of the trait system, I think our current layout algorithm is indeed parametric in that sense -- it cares about a few extensional properties of the type (size, alignment, niches), but as long as those remain the same, changing the type will have no effect. The main question here is if we want to make this a guarantee that will be maintained in the future. (That's what this issue was originally about.) |
So, my haskell is weak, but if I understand type roles document correctly then it is primarily about representing the difference between exact type equality vs. layout equality based on generic parameters in the type system and then goes on to suggest that exact (norminal) type equality should be the default for all types and layout equivalence should require explicit annotations. In other words they suggest that by default there should be no promises about layout equality.
Technically the initial comment only asks whether it is a deterministic function of a fixed set of inputs. Those inputs could contain the exact type (nominal equality in the type roles document) or a random input. Questions that I think need to be answered
|
This implementation gets rid of the unsound `ShallowCopy` trait (#74 & rust-lang/unsafe-code-guidelines#35 (comment)), and replaces it with a wrapper type around aliased values. The core mechanism is that the wrapper type holds a `MaybeUninit<T>`, and aliases it by doing a `ptr::read` of the whole `MaybeUninit<T>` to alias. It then takes care to only give out `&T`s, which is allowed to alias. To drop, the implementation casts between two different generic arguments of the new `Aliased` type, where one type causes the `Aliased<T>` to drop the inner `T` and the other does not. The documentation goes into more detail. The resulting cast _should_ be safe (unlike the old `ManuallyDrop` cast). The removal of `ShallowCopy` makes some of the API nicer by removing trait bounds, and obviates the need for `evmap-derive`. While I was going through, I also took the liberty of tidying up the external API of `evmap` a bit. The implementation passes all tests, and I _think_ it is sound (if you think it's not, please let me know!). Note that this does _not_ take care of #78. Fixes #74. Fixes #55 since `ShallowCopy` is no longer neeeded. Also fixes #72 for the same reason.
This implementation gets rid of the unsound `ShallowCopy` trait (#74 & rust-lang/unsafe-code-guidelines#35 (comment)), and replaces it with a wrapper type around aliased values. The core mechanism is that the wrapper type holds a `MaybeUninit<T>`, and aliases it by doing a `ptr::read` of the whole `MaybeUninit<T>` to alias. It then takes care to only give out `&T`s, which is allowed to alias. To drop, the implementation casts between two different generic arguments of the new `Aliased` type, where one type causes the `Aliased<T>` to drop the inner `T` and the other does not. The documentation goes into more detail. The resulting cast _should_ be safe (unlike the old `ManuallyDrop` cast). The removal of `ShallowCopy` makes some of the API nicer by removing trait bounds, and obviates the need for `evmap-derive`. While I was going through, I also took the liberty of tidying up the external API of `evmap` a bit. The implementation passes all tests, and I _think_ it is sound (if you think it's not, please let me know!). Note that this does _not_ take care of #78. Fixes #74. Fixes #55 since `ShallowCopy` is no longer neeeded. Also fixes #72 for the same reason.
From #31: Can we say that layout is some deterministic function of a certain, fixed set of inputs? This would allow you to be sure that if you do not alter those inputs, your struct layout would not change, even if it meant that you can't predict precisely what it will be. For example, we might say that struct layout is a function of the struct's generic types and its substitutions, full stop -- this would imply that any two structs with the same definition are laid out the same. This might interfere with our ability to do profile-guided layout or to analyze how a struct is used and optimize based on that. (Some would call that a feature.)
Also, this presumably applies to enums as well as other types.
The text was updated successfully, but these errors were encountered: