Conversation
This is great! Just a few minor issues.
src/atomic/atomic_cell.rs (outdated)

```rust
if step < 5 {
    // Just try again.
```
Always use `spin_loop_hint` if you are in a spin loop.
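A minimal sketch of what that advice looks like in practice (names are illustrative, not the crate's actual code; `spin_loop_hint` was later deprecated in favor of `std::hint::spin_loop`):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

// Spin until `flag` becomes true, hinting the CPU on every iteration.
fn spin_until_set(flag: &AtomicBool) {
    let mut step = 0u32;
    while !flag.load(Ordering::Acquire) {
        if step < 5 {
            // Just try again, but tell the CPU we are busy-waiting.
            std::hint::spin_loop();
        } else {
            // After a few spins, let the scheduler run someone else.
            thread::yield_now();
        }
        step += 1;
    }
}

fn main() {
    let flag = AtomicBool::new(false);
    thread::scope(|s| {
        s.spawn(|| flag.store(true, Ordering::Release));
        spin_until_set(&flag);
    });
    assert!(flag.load(Ordering::Relaxed));
}
```

The hint lets the CPU deprioritize the spinning hyperthread and save power; the fallback to `yield_now` after a few tries avoids burning a whole time slice when the wait turns out to be long.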
src/atomic/atomic_cell.rs (outdated)

```rust
#[inline]
fn validate_read(&self, stamp: usize) -> bool {
    atomic::fence(Ordering::Acquire);
    self.state.load(Ordering::SeqCst) == stamp
```
The load here can be `Relaxed`: the ordering is already enforced by the fence. In terms of ordering, we only need to ensure that `state.load` is not reordered before the `volatile_read`.
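A sketch of the suggested version (simplified, with a hypothetical `SeqLock` struct standing in for the real one): the `Acquire` fence orders the earlier data read before the stamp re-load, so the load itself can be `Relaxed`.

```rust
use std::sync::atomic::{self, AtomicUsize, Ordering};

struct SeqLock {
    // Even stamp = unlocked; odd stamp = a write is in progress.
    state: AtomicUsize,
}

impl SeqLock {
    #[inline]
    fn validate_read(&self, stamp: usize) -> bool {
        // The fence keeps the `state.load` below from being reordered
        // before the earlier volatile read of the protected data.
        atomic::fence(Ordering::Acquire);
        self.state.load(Ordering::Relaxed) == stamp
    }
}

fn main() {
    let lock = SeqLock { state: AtomicUsize::new(4) };
    assert!(lock.validate_read(4));  // stamp unchanged: read is consistent
    assert!(!lock.validate_read(6)); // stamp changed: caller must retry
}
```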
The `SeqCst` is here just to make the operation `SeqCst`, like the lock-free atomic one.
I'm not exactly sure if that's enough to make the operation equivalent to `SeqCst`.

cc @jeehoonkang
My reasoning is: if this `SeqCst` load returns a version that is not locked and is equal to the version before reading, then we've read a value that is still the same as the value present at the exact moment this `SeqCst` load happened. So this load operation is when the `AtomicCell::load` really "atomically" happens.

But yeah, I wonder what @jeehoonkang thinks, too. :)
Sequence lock (stamped lock) is... strange. I think the best introduction to the seqlock and its strangeness is Hans Boehm's slides. I think we should implement "version 3" from those slides, as (1) it's correct, as far as Hans (and I) can tell, and (2) it's the most efficient scheme among those implementable in C/C++.

According to the slides, (1) this load can be `Relaxed`, and (2) the write-lock should issue a `Release` fence. (2) is necessary even if this load is `SeqCst`: otherwise, bad reorderings may happen on the writer side.
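A hedged sketch of the writer side being described (illustrative, not Boehm's exact code nor the crate's): the write lock bumps the stamp to an odd value and then issues the `Release` fence, so the fence sits between the stamp update and the data writes that follow.

```rust
use std::sync::atomic::{self, AtomicUsize, Ordering};

struct SeqLock {
    state: AtomicUsize, // even = unlocked, odd = write in progress
}

impl SeqLock {
    fn write_lock(&self) -> usize {
        // Mark a write in progress (stamp becomes odd).
        let previous = self.state.swap(1, Ordering::Acquire);
        // The Release fence issued by the write lock, as (2) requires:
        // it orders the stamp update before the upcoming data writes.
        atomic::fence(Ordering::Release);
        previous
    }

    fn write_unlock(&self, previous: usize) {
        // Advance to the next even stamp so optimistic readers retry.
        self.state.store(previous.wrapping_add(2), Ordering::Release);
    }
}

fn main() {
    let lock = SeqLock { state: AtomicUsize::new(0) };
    let prev = lock.write_lock();
    assert_eq!(lock.state.load(Ordering::Relaxed), 1); // locked (odd)
    lock.write_unlock(prev);
    assert_eq!(lock.state.load(Ordering::Relaxed), 2); // unlocked, advanced
}
```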
> The high-level explanation is that two consecutive `load(SeqCst)` are not strictly ordered, while `Mutex<Cell>` guarantees two consecutive reads are strictly ordered.

Just to make sure I understood, can you also answer whether the following statements are true or false?

- Two consecutive `load(SeqCst)` are strictly ordered by the C++ standard.
- Two consecutive `load(SeqCst)` are strictly ordered on x86-64.
- Two consecutive `load(SeqCst)` are not strictly ordered on ARM.
> The release fence should be issued at the end of write_lock(), not inside write_unlock().

So we just need to change the ordering of the `swap` operation inside `write()` from `Acquire` to `AcqRel`?
Oh, maybe what I said was not clear. Sorry. By "consecutive loads" I mean ordered by (extended) coherence, not by intra-thread program order. Suppose two threads A and B are reading the same value from an `AtomicCell<T>`. If it's lock-free, the two reads are not ordered at all in C++, ARMv8, and ARMv7. (In x86, probably the order doesn't matter at the beginning.) But if it's based on a spinlock, then the two loads are strictly ordered, either A's read before B's read or the other way around.

Regarding the seqlock, acquire and release fences should be issued. AFAIK, in "version 3", `W` issues a release fence right after updating the stamp value in `write_lock()`, and `R` issues an acquire fence just before reading the stamp value in the validation phase. If `R` reads any value in the protected data written by `W`, then there's an acquire/release synchronization between those fences, forcing `R` to read the stamp value updated by `W` or later.
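The reader `R` described above can be sketched like this (names and layout are illustrative, not the crate's actual API): read the stamp, volatile-read the protected data, then issue the acquire fence just before re-reading the stamp in the validation phase.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{self, AtomicUsize, Ordering};

struct SeqLockCell {
    stamp: AtomicUsize,    // even = no write in progress
    data: UnsafeCell<u64>, // the protected value
}

unsafe impl Sync for SeqLockCell {}

impl SeqLockCell {
    fn optimistic_load(&self) -> Option<u64> {
        let before = self.stamp.load(Ordering::Acquire);
        if before & 1 != 0 {
            return None; // a write is in progress
        }
        // Possibly racy read; the result is used only if validation passes.
        let value = unsafe { std::ptr::read_volatile(self.data.get()) };
        // Acquire fence just before re-reading the stamp: if we saw any
        // data written by `W`, this synchronizes with W's release fence.
        atomic::fence(Ordering::Acquire);
        (self.stamp.load(Ordering::Relaxed) == before).then_some(value)
    }
}

fn main() {
    let cell = SeqLockCell { stamp: AtomicUsize::new(2), data: UnsafeCell::new(7) };
    assert_eq!(cell.optimistic_load(), Some(7)); // stamp stable: read succeeds
    cell.stamp.store(3, Ordering::Relaxed);
    assert_eq!(cell.optimistic_load(), None);    // odd stamp: write in progress
}
```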
> If it's lock-free, two reads are not ordered at all in C++, ARMv8, and ARMv7.

Doesn't at least the C++ standard guarantee that two `SeqCst` reads are always ordered (sequential consistency)?

> W issues a release fence right after updating the stamp value in write_lock()

Did you mean "right before"? (so that writes happen before the stamp value is updated)

> If R reads any value in the protected data written by W, then there's a acquire/release synchronization between those fences

Why not make the `swap` operation by `W` use `AcqRel` so that this operation establishes acquire/release synchronization with the fence issued by `R`? That might be slightly faster than issuing a fence by `W`. Or is that not going to work?
- `SeqCst` loads and stores have a very vague meaning in C++. As a theorem (not normative), C++ guarantees that if all atomic accesses are `SeqCst` and there's no data race on non-atomic locations in a program, then the program's semantics is sequentially consistent (so-called "DRF1" or "DRF-SC"; here "DRF" stands for data-race freedom). But when even a single access to an irrelevant location is e.g. `Relaxed`, then the program's semantics may no longer be sequentially consistent.

  In short, the semantics of `SeqCst` loads and stores is surprisingly weaker than expected. In particular, it's strictly weaker than spinlock-protected accesses. For the latest discussion on this subject, the RC11 paper is worth reading.

- I meant "right after", not "right before". Also, a release swap will not work. (Yes, seqlock is strange...) That's because the synchronization occurs via the data variables, not the stamp variable: the writer issues a release fence and then writes a value to the data, and if the reader reads the new value from the data and then issues an acquire fence, then after that the reader should read the new stamp value written by the writer (or a later one).
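Putting the two halves of that explanation side by side (a simplified, hypothetical sketch with one writer; not the crate's real code): the writer's release fence sits between the stamp update and the data write, and the reader's acquire fence sits between the data read and the stamp validation, so synchronization runs through the data, not the stamp.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{self, AtomicUsize, Ordering};
use std::thread;

struct SeqLockCell {
    stamp: AtomicUsize,    // even = stable, odd = write in progress
    data: UnsafeCell<u64>,
}

unsafe impl Sync for SeqLockCell {}

impl SeqLockCell {
    fn write(&self, value: u64) {
        let previous = self.stamp.swap(1, Ordering::Acquire); // lock (odd stamp)
        atomic::fence(Ordering::Release); // release fence, THEN the data write
        unsafe { std::ptr::write_volatile(self.data.get(), value) };
        self.stamp.store(previous.wrapping_add(2), Ordering::Release); // unlock
    }

    fn read(&self) -> u64 {
        loop {
            let before = self.stamp.load(Ordering::Acquire);
            if before & 1 != 0 {
                continue; // write in progress, retry
            }
            let value = unsafe { std::ptr::read_volatile(self.data.get()) };
            atomic::fence(Ordering::Acquire); // data read, THEN acquire fence
            if self.stamp.load(Ordering::Relaxed) == before {
                return value; // stamp unchanged: the read was consistent
            }
        }
    }
}

fn main() {
    let cell = SeqLockCell { stamp: AtomicUsize::new(0), data: UnsafeCell::new(0) };
    thread::scope(|s| {
        s.spawn(|| cell.write(42));
    });
    assert_eq!(cell.read(), 42); // after the writer finishes, readers see 42
}
```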
> But when even a single access to an irrelevant location is e.g. Relaxed, then the program's semantics may no longer be sequentially consistent.

Whoa, that's really surprising!

By the way, in the previous pull request you said:

> I think SeqCst obscures the specification: what do we expect from the methods of AtomicCell? I think the word "sequentially consistent" doesn't really help, because practically none understand the term precisely. Even worse, using SeqCst may guarantee some ordering properties which users will rely on, after which we're stuck and cannot optimize it further.

I'd just like to point out that someone might rely on the fact that spinlock-protected `AtomicCell` has sequentially consistent semantics, but then we might switch from the spinlock to e.g. `AtomicU16` (once it gets into stable Rust) and thus relax the semantics behind the scenes. So that's a similar problem.
Anyway... I pushed some changes to this PR:

- The load in the validation step now uses `Relaxed` rather than `SeqCst`.
- Added a `Release` fence in `write`.

Does the code look good now?
Thanks! I think it's good to go.
@Amanieu what do you think?
```rust
// Try doing an optimistic read first.
if let Some(stamp) = lock.optimistic_read() {
    // We need a volatile read here because other threads might concurrently modify the
```
From an extremely pedantic point of view (which I don't agree with that much), a volatile read racing with a write is still UB in C/C++. But everyone is using volatile reads/writes to mark an access as concurrent, including Linux for starters. Maybe commenting on it would be helpful to readers or to ASAN?
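The kind of comment being suggested might look like this (the helper name `racy_read` is made up for illustration):

```rust
use std::ptr;

/// Reads a value that other threads may be modifying concurrently.
///
/// Strictly speaking, a racy non-atomic read is undefined behavior in
/// the C/C++ memory model, but a volatile read is the conventional way
/// to mark an intentionally concurrent access (Linux does the same).
/// The result must only be used after seqlock stamp validation succeeds.
unsafe fn racy_read<T: Copy>(src: *const T) -> T {
    ptr::read_volatile(src)
}

fn main() {
    let x = 7u32;
    // No concurrent writer here, so this particular call is well-defined.
    assert_eq!(unsafe { racy_read(&x) }, 7);
}
```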
I added a more elaborate comment explaining the issue.

LGTM
When using global locks, two concurrent load operations on the same `AtomicCell` will contend on the same lock, which severely hurts performance. This PR changes the load operations to use optimistic reads instead. The idea is basically the same as in the `seqlock` crate.

Benchmarks:

Note the difference in `load_u8` and `concurrent_load_u8` - that's what this PR is about.
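The optimistic-read idea the PR describes can be sketched end to end (in the spirit of the `seqlock` crate, not its actual API; names are illustrative): concurrent loads only read the stamp and the data, so they never contend on a lock.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{self, AtomicUsize, Ordering};
use std::thread;

struct OptimisticCell {
    stamp: AtomicUsize,    // even = stable, odd = write in progress
    data: UnsafeCell<u64>,
}

unsafe impl Sync for OptimisticCell {}

impl OptimisticCell {
    fn load(&self) -> u64 {
        loop {
            let before = self.stamp.load(Ordering::Acquire);
            if before & 1 == 0 {
                // Optimistically read the data without taking any lock.
                let value = unsafe { std::ptr::read_volatile(self.data.get()) };
                atomic::fence(Ordering::Acquire);
                if self.stamp.load(Ordering::Relaxed) == before {
                    return value; // no writer intervened
                }
            }
            std::hint::spin_loop(); // a writer is active: retry
        }
    }
}

fn main() {
    let cell = OptimisticCell { stamp: AtomicUsize::new(0), data: UnsafeCell::new(5) };
    // Many concurrent loads proceed in parallel, never blocking each other.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| assert_eq!(cell.load(), 5));
        }
    });
}
```

This is why `concurrent_load_u8` improves so much: readers on different cores no longer bounce a shared lock's cache line between them.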