This repository has been archived by the owner on Nov 5, 2018. It is now read-only.

Use optimistic reads in AtomicCell #39

Merged · 4 commits · Aug 9, 2018

Conversation

@ghost commented Jul 31, 2018

When using global locks, two concurrent load operations on the same AtomicCell will contend on the same lock, which severely hurts performance. This PR changes the load operations to use optimistic reads instead. The idea is basically the same as in the seqlock crate.

Benchmarks:

 name                    before ns/iter  after ns/iter  diff ns/iter   diff %  speedup
 compare_and_swap_u8     9               8                        -1  -11.11%   x 1.12
 compare_and_swap_usize  6               6                         0    0.00%   x 1.00
 concurrent_load_u8      25,239,685      716,611         -24,523,074  -97.16%  x 35.22
 concurrent_load_usize   355,220         346,668              -8,552   -2.41%   x 1.02
 fetch_add_u8            9               8                        -1  -11.11%   x 1.12
 fetch_add_usize         6               6                         0    0.00%   x 1.00
 load_u8                 8               1                        -7  -87.50%   x 8.00
 load_usize              0               0                         0     NaN%    x NaN
 store_u8                9               8                        -1  -11.11%   x 1.12
 store_usize             6               6                         0    0.00%   x 1.00

Note the difference in load_u8 and concurrent_load_u8 - that's what this PR is about.

@ghost ghost requested review from Amanieu and jeehoonkang July 31, 2018 09:17
@Amanieu (Contributor) left a comment:

This is great! Just a few minor issues.

}

if step < 5 {
// Just try again.
@Amanieu (Contributor):

Always use spin_loop_hint if you are in a spin loop.

#[inline]
fn validate_read(&self, stamp: usize) -> bool {
atomic::fence(Ordering::Acquire);
self.state.load(Ordering::SeqCst) == stamp
@Amanieu (Contributor):

The load here can be Relaxed: the ordering is already enforced by the fence.

In terms of ordering: we only need to ensure that state.load is not reordered before the volatile_read.

@ghost (Author):

The SeqCst is here just to make the operation SeqCst like the atomic one without locks.

@Amanieu (Contributor):

I'm not exactly sure if that's enough to make the operation equivalent to SeqCst.

cc @jeehoonkang

@ghost (Author):

My reasoning is:

If this SeqCst load returns a version that is not locked and is equal to the version before reading, then we've read a value that is still the same as the value present at the exact moment this SeqCst load happened. So this load operation is when the AtomicCell::load really "atomically" happens.

But yeah, I wonder what @jeehoonkang thinks, too. :)

@jeehoonkang (Collaborator) commented Jul 31, 2018:

Sequence lock (stamped lock) is... strange. I think the best introduction to the seqlock and its strangeness is Hans Boehm's slide. I think we should implement the "version 3" in this slide, as (1) it's correct, as far as Hans (and I) think, and (2) it's the most efficient scheme among those implementable in C/C++.

According to the slide, (1) this load can be Relaxed, and (2) the write-lock should issue a Release fence. (2) is necessary even if this load is SeqCst: otherwise, bad reorderings may happen on the writer side.

@ghost (Author):

@jeehoonkang

The high-level explanation is that two consecutive load(SeqCst) are not strictly ordered, while Mutex<Cell> guarantees two consecutive reads are strictly ordered.

Just to make sure I understood, can you also answer whether the following statements are true or false?

  1. Two consecutive load(SeqCst) are strictly ordered by the C++ standard.
  2. Two consecutive load(SeqCst) are strictly ordered on x86-64.
  3. Two consecutive load(SeqCst) are not strictly ordered on ARM.

The release fence should be issued at the end of write_lock(), not inside write_unlock().

So we just need to change the ordering of the swap operation inside write() from Acquire to AcqRel?

@jeehoonkang (Collaborator) commented Aug 2, 2018:

Oh, maybe what I said was not clear. Sorry. By "consecutive loads" I mean ordered by (extended) coherence, not by intra-thread program order. Suppose two threads A and B are reading the same value from an AtomicCell<T>. If it's lock-free, two reads are not ordered at all in C++, ARMv8, and ARMv7. (In x86, probably the order doesn't matter at the beginning.) But if it's based on a spinlock, then two loads are strictly ordered, either A's read before B's read or the other way around.

Regarding seqlock, acquire and release fences should be issued. AFAIK, in "version 3", W issues a release fence right after updating the stamp value in write_lock(), and R issues an acquire fence just before reading the stamp value in the validation phase. If R reads any value in the protected data written by W, then there's an acquire/release synchronization between those fences, forcing R to read the stamp value updated by W or later.

@ghost (Author):

> If it's lock-free, two reads are not ordered at all in C++, ARMv8, and ARMv7.

Doesn't at least the C++ standard guarantee that two SeqCst reads are always ordered (sequential consistency)?

> W issues a release fence right after updating the stamp value in write_lock()

Did you mean "right before"? (so that writes happen before the stamp value is updated)

> If R reads any value in the protected data written by W, then there's an acquire/release synchronization between those fences

Why not make the swap operation by W use AcqRel so that this operation establishes acquire/release synchronization with the fence issued by R? That might be slightly faster than issuing a fence by W. Or is that not going to work?

@jeehoonkang (Collaborator) commented Aug 3, 2018:

  • SeqCst loads and stores have very vague meaning in C++. As a theorem (not normative), C++ guarantees that if all atomic accesses are SeqCst and there's no data race for non-atomic locations in a program, then the program's semantics is sequentially consistent (so-called "DRF1" or "DRF-SC"; here "DRF" stands for data-race freedom). But when even a single access to an irrelevant location is e.g. Relaxed, then the program's semantics may no longer be sequentially consistent.

    In short, the semantics of SeqCst loads and stores is surprisingly weaker than expected. In particular, it's strictly weaker than spinlock-protected accesses. For the latest discussion on this subject, the RC11 paper is worth reading.

  • I meant "right after", not "right before". Also, release swap will not work. (Yes, seqlock is strange...) Because the synchronization occurs via data variables, not stamp variable: writer issues release fence and then writes a value to data, and if reader reads the new value from data and then issues acquire fence, then after that the reader should read the new stamp value written by the writer (or a later one).

@ghost (Author) commented Aug 8, 2018:

@jeehoonkang

> But when even a single access to an irrelevant location is e.g. Relaxed, then the program's semantics may no longer be sequentially consistent.

Whoa, that's really surprising!

By the way, in the previous pull request you said:

> I think SeqCst obscures the specification: what do we expect from the methods of AtomicCell? I think the word "sequentially consistent" doesn't really help, because practically none understand the term precisely. Even worse, using SeqCst may guarantee some ordering properties which users will rely on, after which we're stuck and cannot optimize it further.

I'd just like to point out that someone might rely on the fact that spinlock-protected AtomicCell has sequentially consistent semantics, but then we might switch from spinlock to e.g. AtomicU16 (once it gets into stable Rust) and thus relax the semantics behind the scenes. So that's a similar problem.

Anyway... I pushed some changes to this PR:

  • The load in the validation step now uses Relaxed rather than SeqCst.
  • Added a Release fence in write.

Does the code look good now?

@jeehoonkang (Collaborator) left a comment:

Thanks! I think it's good to go.

@Amanieu what do you think?


// Try doing an optimistic read first.
if let Some(stamp) = lock.optimistic_read() {
// We need a volatile read here because other threads might concurrently modify the
@jeehoonkang (Collaborator):

From an extremely pedantic point of view (which I don't agree with that much), a volatile read is still UB in C/C++. But everyone uses volatile reads/writes to mark an access as concurrent, including Linux for starters. Maybe a comment on this would be helpful to readers (or to sanitizers like ASAN)?

@ghost (Author):

I added a more elaborate comment explaining the issue.

@Amanieu (Contributor) commented Aug 9, 2018:

LGTM

@ghost ghost merged commit 5123ec4 into crossbeam-rs:master Aug 9, 2018
@ghost ghost deleted the optimistic-read branch August 9, 2018 17:23
@ghost ghost mentioned this pull request Sep 1, 2018