Atomics and multi core support #5
Or you can just do |
That's a very good idea. We still can't claim full multi-core support with that implementation, though, because multi-core Cortex-M microcontrollers are a thing: probably a Cortex-M0 + Cortex-Mx combination already exists out there and the |
I've looked again at your implementation, and it looks like it's not correct even for single core CPUs. There is no guaranteed ordering between these two operations:

unsafe { ptr::write(buffer.get_unchecked_mut(rb.tail), item) }
rb.tail = next_tail;

So the compiler has the right to generate this code instead:

rb.tail = next_tail;
unsafe { ptr::write(buffer.get_unchecked_mut(rb.tail), item) }

To prevent that you need to insert a compiler fence between these two operations, so they won't be reordered. Something like this should be sufficient, I think (But I would have also made |
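A minimal sketch of the fence placement being suggested here, not code from this crate; the function name and types are made up for illustration:

use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

// Hypothetical producer-side write: store the element first, then publish the
// new tail index. The compiler fence (Release) prevents the compiler from
// reordering the index update before the element write.
fn write_then_publish(buffer: &mut [u32], tail: &mut usize, next_tail: usize, item: u32) {
    unsafe { ptr::write(buffer.get_unchecked_mut(*tail), item) };
    compiler_fence(Ordering::Release);
    *tail = next_tail;
}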
Hmm, I don't think the compiler can do that kind of reordering since it changes the value passed to |
Oh, and a similar thing with reads:

let item = unsafe { ptr::read(buffer.get_unchecked(*head)) };
*head = (*head + 1) % n;

It might increment the head first and then do the read. Atomic operations guarantee ordering, but volatile (and regular) operations do not.
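A matching sketch for the consumer side, under the same assumptions (names made up for illustration): the element is read before the head index is updated, and the compiler fence keeps it that way.

use core::ptr;
use core::sync::atomic::{compiler_fence, Ordering};

// Hypothetical consumer-side read: read the element first, then advance the
// head index. The compiler fence (Release) prevents the compiler from moving
// the index update ahead of the read.
fn read_then_advance(buffer: &[u32], head: &mut usize, n: usize) -> u32 {
    let item = unsafe { ptr::read(buffer.get_unchecked(*head)) };
    compiler_fence(Ordering::Release);
    *head = (*head + 1) % n;
    item
}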
I think this doesn't matter. It can save the old value in a register, do the assignment, and only then do the write to a location from the register. But you are right, my example code is not valid; it should have looked like:

eax = rb.tail;
rb.tail = next_tail;
unsafe { ptr::write(buffer.get_unchecked_mut(eax), item) }

which is what I meant by my original comment. |
I don't follow. Basically you are saying that the compiler can do this kind of reordering (swap A and B) in this code as well:

fn foo(slice: &[u32], head: &mut usize) {
    let n = slice.len();
    let item = slice[*head]; // A
    *head = (*head + 1) % n; // B
}

(Which is equivalent to the last code you quoted.) But that doesn't occur in practice; if it did a bunch of safe Rust would break. Adding an |
Why not? I can just swap them, like this:

fn foo(slice: &[u32], head: &mut usize) {
    let n = slice.len();
    let tmp = *head;
    *head = (*head + 1) % n; // B
    let item = slice[tmp]; // A
}
|
But I can still get around the panic like this:

fn foo(slice: &[u32], head: &mut usize) {
    let n = slice.len();
    let tmp = *head;
    if tmp >= n {
        panic!("...");
    }
    *head = (*head + 1) % n; // B
    let item = slice[tmp]; // A
}

Now they should be identical. |
OK, I saw the modified |
Yes, exactly. Sorry if I was unclear. I've just been burned a lot by writing similar code in C. |
Just tested the latest code from git using tsan and it complains about a data race:
Here is the code that I used:
I think the reason why it fails is your recent commit 9faea68
If you look at some other popular implementations like boost
And the Linux kernel:
They both do
I think it may be OK to lift the |
Tested without this commit and still get the data race, so there might be something else that I don't see. |
I tested locally and I can repro. I changed both loads to acquire and the problem remains. What I find strange is that the report mentions that some backtrace (?) operation is contending with the consumer / producer:

Previous write of size 4 at 0x56253944a018 by thread T1:
#0 std::sys_common::backtrace::__rust_begin_short_backtrace hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa58e)
#1 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)
#2 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)
#3 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803)
#4 std::panicking::try::do_call hello0-1a838b0df7296149462a69d816ab3a3b.rs:? (hello+0xa803) But maybe the backtrace is just pointing to the wrong place. |
Found why it fails even if I revert the commit, you had the
You release tail in the producer, but then acquire head in the consumer |
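A minimal sketch of the acquire/release pairing being discussed, not this crate's code (the names and fixed capacity are made up): each index is stored with Release by the side that owns it and loaded with Acquire by the other side.

use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const N: usize = 8; // slot count; one slot stays empty to tell full from empty

struct Spsc {
    buffer: UnsafeCell<[u32; N]>,
    head: AtomicUsize, // written only by the consumer
    tail: AtomicUsize, // written only by the producer
}

// Intended to be shared between exactly one producer and one consumer thread.
unsafe impl Sync for Spsc {}

impl Spsc {
    const fn new() -> Self {
        Spsc {
            buffer: UnsafeCell::new([0; N]),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    fn enqueue(&self, item: u32) -> bool {
        let tail = self.tail.load(Ordering::Relaxed); // producer owns tail
        let next = (tail + 1) % N;
        if next == self.head.load(Ordering::Acquire) {
            return false; // full
        }
        unsafe { (*self.buffer.get())[tail] = item };
        self.tail.store(next, Ordering::Release); // publish the element
        true
    }

    fn dequeue(&self) -> Option<u32> {
        let head = self.head.load(Ordering::Relaxed); // consumer owns head
        if head == self.tail.load(Ordering::Acquire) {
            return None; // empty
        }
        let item = unsafe { (*self.buffer.get())[head] };
        self.head.store((head + 1) % N, Ordering::Release); // free the slot
        Some(item)
    }
}

Here the producer's tail.store(Release) pairs with the consumer's tail.load(Acquire), and the consumer's head.store(Release) pairs with the producer's head.load(Acquire); a Release store only orders anything against the matching Acquire load of the same variable.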
Just tested this change against master:

diff --git a/src/ring_buffer/mod.rs b/src/ring_buffer/mod.rs
index 3f53646..abee440 100644
--- a/src/ring_buffer/mod.rs
+++ b/src/ring_buffer/mod.rs
@@ -36,6 +36,10 @@ impl AtomicUsize {
pub fn store_release(&self, val: usize) {
unsafe { intrinsics::atomic_store_rel(self.v.get(), val) }
}
+
+ pub fn load_acquire(&self) -> usize {
+ unsafe { intrinsics::atomic_load_acq(self.v.get()) }
+ }
}
/// An statically allocated ring buffer backed by an array `A`
diff --git a/src/ring_buffer/spsc.rs b/src/ring_buffer/spsc.rs
index 488c07a..082aab7 100644
--- a/src/ring_buffer/spsc.rs
+++ b/src/ring_buffer/spsc.rs
@@ -45,7 +45,7 @@ where
let n = rb.capacity() + 1;
let buffer: &[T] = unsafe { rb.buffer.as_ref() };
- let tail = rb.tail.load_relaxed();
+ let tail = rb.tail.load_acquire();
let head = rb.head.load_relaxed();
if head != tail {
let item = unsafe { ptr::read(buffer.get_unchecked(head)) };
It doesn't report any races when N is smaller than the buffer size, and reports write-after-read when N > buffer size. But write-after-read should be OK since it's a circular buffer after all, so we can allow overwriting data. |
For now I think it would be nice to add to the module-level documentation that this only works on single-core CPUs. |
@fluffysquirrels I think this issue was fixed in v0.2.0. No need to update the documentation. |
@pftbest, OK, nice. In which case this issue can be closed. I think it would still be nice to add a line to the ring_buffer module documentation saying what guarantees are provided (i.e. single producer and consumer, as checked by implementing the Send trait but not the Sync trait). |
This was fixed in v0.2.0. |
The current implementation of ring_buffer::{Consumer,Producer} only works on single core systems because it's missing memory barriers / fences. The proper way to add those barriers is to change the types of the head and tail fields in RingBuffer from usize to AtomicUsize.

We are not doing that right now because that change would make us drop support for ARMv6-M, which is one of the main use cases (single core microcontrollers). If rust-lang/rust#45085 was implemented we could do the change to AtomicUsize while still supporting ARMv6-M.

I think the options are: (a) wait for rust-lang/rust#45085 or (b) provide a Cargo feature that switches the implementation to atomics to enable multi-core support (seems error prone: you could forget to enable the feature -- the other option is to make multi-core support a default feature, but default features are hard, and sometimes impossible, to disable, which hinders ARMv6-M support).
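A rough sketch of what option (b) could look like, purely for illustration; the multi-core feature name and the Index alias are hypothetical and not part of this crate:

// Cargo.toml (hypothetical):
//
// [features]
// multi-core = []

// Hypothetical cfg-gated index type: with the feature enabled the indices
// become atomics (multi-core safe); without it they stay plain usize so
// ARMv6-M targets keep building.
#[cfg(feature = "multi-core")]
type Index = core::sync::atomic::AtomicUsize;

#[cfg(not(feature = "multi-core"))]
type Index = usize;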