-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Packing pointers into double-word width atomics #517
Comments
Yeah this is unfortunately not supported currently. A similar situation arises with types like IMO the best solution is to have some way to do atomic operations on |
That's possible. It's also a stability hazard for user-defined types so idk if that is desirable 🤔 you could have an explicit |
And then a OTOH it's not a new stability hazard, it is exactly the same hazard as |
Whilst I agree that there is a broader problem of atomically manipulating types that can't transmute to the containing atomic due to padding etc, my point was more about maintaining pointer provenance: something that When packing a pointer into a larger atomic, there is no equivalent to |
@eggyal There is a hierarchy: usize -> *mut -> MaybeUninit Where stuff to the right can store everything that stuff to the left can. @RalfJung is proposing tackling the atomic manipulation of generic types, which would include MaybeUninit, and so would automatically cover all the other use-cases, whereas an AtomicDoublePtr type would solve the provenance issue, but wouldn't allow you to atomically manipulate eg. types with padding, since padding may be uninitialized. |
Ah, my misunderstanding. I hadn't appreciated that provenance is propagated through uninitialized bytes! |
hmm, would that work? Given that the tuple is larger than |
Uninitialized bytes do not have provenance. However,
That's why I said "if it has the right size and both fields are I don't know which rules you had in mind, since this is not a structural property. But clearly if the concern is future compatibility, then to use |
There is however a classic issue with atomic operations on types with padding -- how should I am sure we already discussed this at some point, but I am not sure if we have an issue for that... @chorman0773 do you remember? So in that sense the situation of having two pointers is simpler, since there are no padding bytes. OTOH, by virtue of this being pointers, the issue in #480 would come up but now there'd be two provenances to worry about... |
IMO there's only two sane options, either it should be UB to do the comparison, or it should be specified that input values are frozen before being compared, with spurious failures being the logical consequence of that. |
That wouldn't have a forward progress guarantee as the compiler can always freeze it to the wrong value if the freeze happens as part of the compare_exchange. Freeze only requires the output of the freeze to be consistent between uses of the same freeze, not for it to be consistent between freezes. https://llvm.org/docs/LangRef.html#freeze-instruction
|
I think that's unavoidable if the compare/exchange loop happens in user code, since I don't think there's any guarantee that padding bytes are preserved? Plausibly |
If the user freezes themself once before the compare exchange loop rather than delegate it to compare_exchange there is no forward progress problem, but if compare_exchange does it, the compiler is allowed to produce an infinite loop. |
Depends on how its done.
If the value is frozen into an aligned [u8;N] before being
stored/compare_exchanged, then it will be frozen in memory.
…On Thu, Jul 18, 2024 at 10:18 bjorn3 ***@***.***> wrote:
I think that's unavoidable if the compare/exchange loop happens in user
code, since I don't think there's any guarantee that padding bytes are
preserved?
If the user freezes themself once before the compare exchange loop rather
than delegate it to compare_exchange there is no forward progress problem,
but if compare_exchange does it, the compiler is allowed to produce an
infinite loop.
—
Reply to this email directly, view it on GitHub
<#517 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABGLD243JNHTNGFBCKWX5STZM7FD5AVCNFSM6AAAAABLBGWXCSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZWGY2TMNZWGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
If the |
You'd get a new `current` value though, so you could repeat.
…On Thu, 18 Jul 2024 at 10:56, bjorn3 ***@***.***> wrote:
If the current argument has padding, the compiler can always freeze the
padding such that it doesn't match the actual value in the atomic, and thus
force the compare_exchange to fail.
—
Reply to this email directly, view it on GitHub
<#517 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABGLD2ZSFELSXPH744M7BM3ZM7JRLAVCNFSM6AAAAABLBGWXCSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZWG44DSMZYG4>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Right, if a MaybeUninit is returned, that would indeed work. |
Or |
FTR, the way xlang defines compare_exchange is that any atomic operation will freeze the padding bytes (and only the padding bytes) in any value it stores, then compare_exchange and any compound assignment is UB when the backing memory contains uninit bytes. |
Could this operation instead be implemented only for types that lack padding? The user could always add an explicit "padding field" to fill out any padding (and for example initialise it to zero). Of course in that case you would need something like a compiler implemented marker trait for "contains no padding" so perhaps that is more work. |
That makes it unsound to turn an But indeed, then we can have the invariant that an |
It would, yes. That's the downside here. |
What about something like this? static FOO: AtomicSet<(AtomicUsize, AtomicPtr)> = AtomicSet::new((0, null()));
FOO.compare_exchange(...)
|
This could in the future be extended neatly to support accessing the inner atomics as well, depending on how mixed size atomics in #345 gets decided (but either case is obviously not blocked on the other, just a nice potential future synergy). |
Hello, what's the reason compiler cannot just mask out (on assembly level) padding bytes/bits before atomic store? Applying mask on a word or two introduces a performance penalty? Or has problems with orderings/llvm? |
For union/enum types, the padding bytes are not statically known.
|
In my experience, when working with atomic structs, |
But how can we "replace our intended result with the thing we just read"? crossbeam's However, can we be assured that |
|
Respectfully, |
Ah of course... 🤦 Crossbeam anyway does something else -- the relevant part is For a primitive operation supporting compare-exchange on any |
If the values have to be frozen for compare/exchange to work, then shouldn't the type be just edit: |
If the bytes can carry provenance, an array of |
Of course 🤦 I think it's very confusing and inconsistent that Why is padding treated differently than provenance? If MaybeUninit is supposed to be exactly like T except that it allows uninitialized bytes, then it should not preserve provenance. If on the other hand MaybeUninit is supposed to act like a "bag of bytes" then it should also preserve padding. |
Padding is a property of the type - provenance is a property of a byte. |
It's definitely confusing, and the underlying reason is that computers are terrible. :( When we promised that That's a separate discussion, though: #518 |
I think I'm still not grasping something here. If, for compare-exchange of an
It therefore seems to be the case that Obviously byte-wise comparison may fail due to undef/padding bytes, but that is why one must then (as crossbeam does) call I suppose I'm struggling to understand why it would ever be necessary to |
I'm not sure how can we require struct Foo(u8);
impl Eq for Foo;
impl PartialEq for Foo {
fn eq(&self, other: &Foo) -> bool {
std::thread::sleep_ms(1000);
self.0.eq(&other.0)
}
}
fn foo(foo: &Atomic<Foo>) {
foo.compare_exchange(Foo(3), Foo(4), Relaxed, Relaxed);
} |
The library API |
The problem is that "reattempt with newly discovered However, after re-reading the thread, I personally find the proposed solutions unsatisfying. And in this case, the problem being solved is purely an artifact of compiler optimizations. In general, uninit bytes are not just an artifact of optimizations because of But I don't see any straightforward way to 'fix' this either, not without adding an ugly special case that LLVM and GCC may or may not end up respecting – that is, if they ever get around to adding optimizations for atomics. Right now they basically don't optimize them at all. |
Wouldn't the "min" variation of this be to only support the no-padding case, and leave the door open to relax that in the future? Would it be useful to move forward with such a proposal, or do the motivating use cases need padding? If so, why can't they just add manual "padding fields"? This seems to me to be entirely about ergonomics: yes it is nicer to support padding, but there is no use case that can't be rewritten to avoid needing padding. |
It's difficult if not impossible to support explicit padding with generics. |
@eggyal it's a bit hard to reply since you haven't given concrete code. Let me try:
The issue is that It's a somewhat theoretical concern because of course real uninit bytes are not unstable in this way, but if programmers want to reason about things like "does this concurrent program ever make progress", then we have to design a spec that doesn't have such "implicit" infinite loops, not even theoretically. This can either be fixed by somehow saying that |
Another option would be to atomically write back the frozen value after a failed compare-exchange. But usually a failed compare-exchange is a read-only operation so making it a full RMW could cause problems (such as extra data races).
|
I'm probably missing something straightforward. What about defining This obviously doesn't quite match actual hardware, and in some cases may result in non-ideal code (on a type with padding bits the compiler may have to lower |
That wouldn't be able to guarantee forward progress, right? The CAS loop may cause compare_exchange itself to never return if another thread constantly changes the padding while the CAS loop is running. |
Some implementations may have forward progress guarantee issues with this approach - but it at least allows the memory model itself to be sane. That being said, I was under the assumption that underlying implementations could already punt forward progress guarantees for individual atomics ops, and we tended to mostly gloss over this as implementation details. Consider ARM for a second (the architecture I'm most familiar with personally). ARM without LSE uses a LDREX/STREX (or LDX/STX in A64) (load-linked / store-conditional is the more common term) loop for many basic atomic operations, which does not guarantee forward progress in... well, among many many many other cases, precisely the case you indicate. Thread B doing repeated writes to a memory location (even if it doesn't change the value!) can cause thread A doing pretty much any atomic implemented as a LDX/STX loop to said memory location to stall indefinitely. ARM tried to have more forward progress guarantees for LDX/STX, but has walked them back over time (I suspect due to the bewildering variety of errata on older ARM processors of the form "if thread A does XXX then thread B may not make forward progress" - ARM's response has tended to be mainly 'well don't do that then' and a spec update to further restrict the cases in which LDX/STX are guaranteed to make forward progress). If this isn't an issue, then defining compare_exchange / etc to succeed if equal ignoring padding bits shouldn't be an issue either. If this is an issue... what do other architectures using a load-linked store-conditional (RISCV, notably) guarantee & do? |
Yeah the LL/SC thing was discussed recently somewhere but I can't find the discussion right now -- arguably that makes ARM a non-conforming implementation in terms of liveness (or "forward progress") properties. I don't think it is a good idea to add more such issues to the language -- in particular, I don't think it is a good idea to add operations that are impossible to implement in a liveness-preserving way even on hardware with stronger guarantees than ARM. "It's theoretically broken in certain conditions on some hardware" is not a good argument for "well then let's give up on it entirely". Regarding ignoring padding bytes, this has been discussed above already. |
Under Using Strict Provenance, the standard library documents:
(I've not been able to locate a discussion similar to the following, and assume that this is the best place to reply to that invitation—but please do point me elsewhere if not!).
Some concurrent algorithms (I am particularly thinking of some epoch-based memory reclamation schemes) solve the ABA problem by packing both a pointer and an integer (e.g. an epoch) into a single (double-word width) atomic variable (e.g. a 128-bit atomic, manipulated using
cmpxchg16b
, on x86_64 architectures).To do this with the existing atomic types requires using the appropriate double-word width atomic integer (e.g.
AtomicU128
on 64-bit architectures), and converting the relevant single-word width subpart to/from a pointer.If such pointer may be to different allocated objects over the course of the algorithm, it clearly is not possible to derive provenance in those conversions via strict provenance's
.with_addr()
method (because any "base" pointer from which provenance is derived would also have to be updated atomically with any update to this atomic).I am not sure what the best solution to this might be. Perhaps something like an
AtomicDoublePtr<T, U>
type or similar?Somewhat related to #286 and #480.
The text was updated successfully, but these errors were encountered: