use atomics where available #8

japaric · 2017-10-31T20:42:05Z

cc #5
cc @pftbest

japaric · 2017-10-31T20:56:54Z

@homunkulus try

homunkulus · 2017-10-31T20:57:00Z

⌛ Trying commit f9a3dfc with merge 4a527af...

homunkulus · 2017-10-31T20:59:58Z

☀️ Test successful - status-travis
State: approved= try=True

japaric · 2017-10-31T21:01:47Z

@homunkulus try

homunkulus · 2017-10-31T21:01:54Z

⌛ Trying commit 158d19b with merge b519403...

japaric · 2017-10-31T21:06:36Z

SUMMARY: ThreadSanitizer: data race /checkout/src/liballoc/heap.rs:104 in _$LT$alloc..heap..Heap$u20$as$u20$alloc..allocator..Alloc$GT$::dealloc::h42931c7bdbec6994

Hmm, looks like the Travis environment doesn't do demangling for some reason.
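For reference, the escapes in that symbol can be decoded by hand. The following is a minimal sketch, assuming only the `$LT$`, `$GT$`, `$uXX$` and `..` sequences of Rust's legacy mangling need handling; `decode_legacy_escapes` is a hypothetical helper written for illustration, not what the CI script or the `rustc-demangle` crate actually does.

```rust
/// Decode a few escape sequences of Rust's legacy symbol mangling.
/// Hypothetical helper for illustration; it is deliberately incomplete.
fn decode_legacy_escapes(sym: &str) -> String {
    // A leading `_` is inserted when a path segment starts with an escape.
    let mut rest = if sym.starts_with("_$") { &sym[1..] } else { sym };
    let mut out = String::new();
    while !rest.is_empty() {
        if let Some(tail) = rest.strip_prefix("..") {
            // `..` stands for the path separator `::`.
            out.push_str("::");
            rest = tail;
        } else if let Some(tail) = rest.strip_prefix('$') {
            match tail.find('$') {
                Some(end) => {
                    let code = &tail[..end];
                    match code {
                        "LT" => out.push('<'),
                        "GT" => out.push('>'),
                        _ => {
                            // `$uXX$` carries a hexadecimal Unicode scalar value.
                            let decoded = code
                                .strip_prefix('u')
                                .and_then(|hex| u32::from_str_radix(hex, 16).ok())
                                .and_then(char::from_u32);
                            match decoded {
                                Some(c) => out.push(c),
                                // Unknown escape: keep it verbatim.
                                None => {
                                    out.push('$');
                                    out.push_str(code);
                                    out.push('$');
                                }
                            }
                        }
                    }
                    rest = &tail[end + 1..];
                }
                // Lone `$` without a closing `$`: keep it verbatim.
                None => {
                    out.push('$');
                    rest = tail;
                }
            }
        } else {
            let c = rest.chars().next().unwrap();
            out.push(c);
            rest = &rest[c.len_utf8()..];
        }
    }
    out
}
```

Applied to the symbol from the TSan summary, this yields `<alloc::heap::Heap as alloc::allocator::Alloc>::dealloc::h42931c7bdbec6994`.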

homunkulus · 2017-10-31T21:08:00Z

💔 Test failed - status-travis

japaric · 2017-10-31T21:28:42Z

@homunkulus try

homunkulus · 2017-10-31T21:28:49Z

⌛ Trying commit a93f857 with merge 5fcb8c4...

homunkulus · 2017-10-31T21:31:53Z

💔 Test failed - status-travis

japaric · 2017-10-31T21:39:02Z

I tested running the sanitizer on a minimal Ubuntu 16.04 install and demangling works there. Then I remembered that Travis is using 14.04, which I haven't tested.

pftbest · 2017-11-02T13:38:06Z

> All single core systems are supported.

I think this statement might not be exactly true. There are some single core systems that can execute instructions out of order, and that can break this code.

The good news is that Cortex-M is not one of them yet:

> The Cortex-M processors never perform memory accesses out of order compared to instruction flow, however, the architecture does not prohibit this in future implementations. ARMv7-M code written to be portable to ARMv7-AR processors, like Cortex-A9, must already take account of this ordering model.

I believe that platforms like the Cortex-A9 all have atomic support, so this wouldn't matter in practice, but running non-atomic code on them would probably be a mistake, even if they have only one core.

japaric · 2017-11-07T23:47:03Z

Good point, @pftbest. I think we should note that (out-of-order execution) in the documentation. I think it's best to ask the user to check the disassembly if they are using a single core system with "no atomics"; if the code is wrong the fix will involve patching the implementation for that specific architecture / target (i.e. #[cfg]) as I don't think there's a general way to insert instruction / memory barriers via LLVM intrinsics.
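To make the `#[cfg]` idea concrete, here is a rough sketch of a per-target barrier. Everything in it is an assumption for illustration: `barrier` is a hypothetical helper, the `cfg` predicate is arbitrary, and whether `fence` lowers to a real barrier instruction on a given "no atomics" target is exactly what would have to be confirmed in the disassembly.

```rust
use core::sync::atomic::Ordering;

/// Hypothetical per-target barrier; the `cfg` predicates are illustrative only.
/// On the targets selected here a hardware fence is requested.
#[cfg(any(target_arch = "arm", target_arch = "aarch64"))]
#[inline(always)]
pub fn barrier() {
    core::sync::atomic::fence(Ordering::SeqCst);
}

/// Fallback intended for a single-core, in-order target: only the compiler
/// is prevented from reordering memory accesses across this point.
#[cfg(not(any(target_arch = "arm", target_arch = "aarch64")))]
#[inline(always)]
pub fn barrier() {
    core::sync::atomic::compiler_fence(Ordering::SeqCst);
}
```

A real crate would choose the predicate per target rather than per architecture family, after inspecting the generated code.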

japaric · 2017-11-08T21:14:59Z

I ended up re-implementing AtomicUsize using atomic load/store (LLVM) intrinsics. This way I was able to drop the conditional code while retaining support for thumbv6m-none-eabi.
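The shape of such a type might look like the sketch below. It is not the code from this PR: the PR uses LLVM atomic load/store intrinsics, which are nightly-only, whereas this sketch assumes a toolchain where `core::sync::atomic::AtomicUsize` is available and simply wraps it. `load_acquire` and `load_relaxed` are the names used in this thread; `store_release` is an assumed name.

```rust
use core::sync::atomic::{self, Ordering};

/// Hypothetical stand-in for a load/store-only `AtomicUsize`.
/// Assumption: wraps the standard type instead of using intrinsics.
pub struct AtomicUsize {
    v: atomic::AtomicUsize,
}

impl AtomicUsize {
    pub const fn new(v: usize) -> Self {
        AtomicUsize { v: atomic::AtomicUsize::new(v) }
    }

    /// Load without any ordering guarantee beyond atomicity.
    pub fn load_relaxed(&self) -> usize {
        self.v.load(Ordering::Relaxed)
    }

    /// Load that orders later memory accesses after it.
    pub fn load_acquire(&self) -> usize {
        self.v.load(Ordering::Acquire)
    }

    /// Store that orders earlier memory accesses before it.
    pub fn store_release(&self, val: usize) {
        self.v.store(val, Ordering::Release)
    }
}
```

Only loads and stores are exposed, with no compare-and-swap or read-modify-write operations, which is all a single-producer single-consumer queue needs.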

pftbest · 2017-11-08T21:29:02Z

That's a great idea.

japaric · 2017-11-08T22:06:47Z

I replaced load_acquire with load_relaxed. The previous version produced two memory barriers; I think we only need one: the one between the data read / write and the head / tail pointer update.

So, this code:

fn exti0() {
    unsafe {
        RB.split().0.enqueue(0).unwrap();
    }
}

Produced:

080001c0 <EXTI0>:
 80001c0:       f240 0000       movw    r0, #0
 80001c4:       f2c2 0000       movt    r0, #8192       ; 0x2000
 80001c8:       6803            ldr     r3, [r0, #0]
 80001ca:       6842            ldr     r2, [r0, #4]
 80001cc:       f3bf 8f5f       dmb     sy
 80001d0:       1c51            adds    r1, r2, #1
 80001d2:       f001 010f       and.w   r1, r1, #15
 80001d6:       4299            cmp     r1, r3
 80001d8:       bf1f            itttt   ne
 80001da:       eb00 0282       addne.w r2, r0, r2, lsl #2
 80001de:       2300            movne   r3, #0
 80001e0:       6093            strne   r3, [r2, #8]
 80001e2:       f3bf 8f5f       dmbne   sy
 80001e6:       bf1c            itt     ne
 80001e8:       6041            strne   r1, [r0, #4]
 80001ea:       4770            bxne    lr
 80001ec:       b580            push    {r7, lr}
 80001ee:       466f            mov     r7, sp
 80001f0:       f7ff ffe2       bl      80001b8 <core::result::unwrap_failed>

With the change, it now produces:

080001f0 <EXTI0>:
 80001f0:       f240 0000       movw    r0, #0
 80001f4:       f2c2 0000       movt    r0, #8192       ; 0x2000
 80001f8:       6803            ldr     r3, [r0, #0]
 80001fa:       6842            ldr     r2, [r0, #4]
 80001fc:       1c51            adds    r1, r2, #1
 80001fe:       f001 010f       and.w   r1, r1, #15
 8000202:       4299            cmp     r1, r3
 8000204:       bf1f            itttt   ne
 8000206:       eb00 0282       addne.w r2, r0, r2, lsl #2
 800020a:       2300            movne   r3, #0
 800020c:       6093            strne   r3, [r2, #8]
 800020e:       f3bf 8f5f       dmbne   sy
 8000212:       bf1c            itt     ne
 8000214:       6041            strne   r1, [r0, #4]
 8000216:       4770            bxne    lr
 8000218:       b580            push    {r7, lr}
 800021a:       466f            mov     r7, sp
 800021c:       f7ff ffcc       bl      80001b8 <core::result::unwrap_failed>

TSan seems to be OK with the change.
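For readers following along, the barrier that remains corresponds to a release store of the index after the slot write. The sketch below is a hypothetical 16-slot queue of `u32` written for illustration, not the crate's `RingBuffer`. Two assumptions to note: the slots are atomics so the example stays in safe Rust, and the other side's index is loaded with `Acquire`, which is more conservative than the relaxed loads described above.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

const N: usize = 16; // matches the `and.w r1, r1, #15` mask in the listing

/// Hypothetical single-producer single-consumer ring buffer.
pub struct Ring {
    head: AtomicUsize, // next slot to read; written only by the consumer
    tail: AtomicUsize, // next slot to write; written only by the producer
    slots: [AtomicU32; N],
}

impl Ring {
    pub fn new() -> Self {
        Ring {
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
            slots: std::array::from_fn(|_| AtomicU32::new(0)),
        }
    }

    /// Producer side. Returns the item back if the buffer is full.
    pub fn enqueue(&self, item: u32) -> Result<(), u32> {
        let tail = self.tail.load(Ordering::Relaxed); // our own index
        let head = self.head.load(Ordering::Acquire); // conservative choice
        let next = (tail + 1) % N;
        if next == head {
            return Err(item);
        }
        self.slots[tail].store(item, Ordering::Relaxed);
        // The one required ordering: the slot write happens before the
        // index update becomes visible to the consumer.
        self.tail.store(next, Ordering::Release);
        Ok(())
    }

    /// Consumer side. Returns `None` if the buffer is empty.
    pub fn dequeue(&self) -> Option<u32> {
        let head = self.head.load(Ordering::Relaxed); // our own index
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let item = self.slots[head].load(Ordering::Relaxed);
        // The slot read happens before the slot is handed back.
        self.head.store((head + 1) % N, Ordering::Release);
        Some(item)
    }
}
```

One slot is always left empty to distinguish full from empty, so this sketch holds at most 15 items.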

japaric · 2017-11-08T23:09:55Z

@homunkulus try

homunkulus · 2017-11-08T23:10:02Z

⌛ Trying commit af5fdf3 with merge b31015c...

homunkulus · 2017-11-08T23:12:08Z

💔 Test failed - status-travis

japaric · 2017-11-08T23:26:12Z

@homunkulus try

homunkulus · 2017-11-08T23:26:19Z

⌛ Trying commit 7806bc9 with merge 14a5689...

homunkulus · 2017-11-08T23:30:00Z

💔 Test failed - status-travis

japaric · 2017-11-09T00:38:13Z

@homunkulus try

homunkulus · 2017-11-09T00:38:20Z

⌛ Trying commit 5ff961c with merge 89e5733...

homunkulus · 2017-11-09T00:42:10Z

☀️ Test successful - status-travis
State: approved= try=True

japaric · 2017-11-09T01:28:42Z

@homunkulus r+

homunkulus · 2017-11-09T01:28:42Z

📌 Commit 30ea33c has been approved by japaric

homunkulus · 2017-11-09T01:28:49Z

⌛ Testing commit 30ea33c with merge 612bf44...

homunkulus · 2017-11-09T01:39:43Z

☀️ Test successful - status-travis
Approved by: japaric
Pushing 612bf44 to master...

290: optimize the codegen of Vec::clone r=japaric a=japaric

these changes optimize `Vec<u8, 1024>::clone` down to these operations

1. reserve the stack space (1028 bytes on 32-bit ARM) and leave it uninitialized
2. zero the `len` field
3. memcpy `len` bytes of data from the parent

analyzed source code

``` rust
use heapless::Vec;

fn clone(vec: &Vec<u8, 1024>) {
    let mut vec = vec.clone();
    black_box(&mut vec);
}

fn black_box<T>(val: &mut T) {
    unsafe { asm!("// {0}", in(reg) val) }
}
```

machine code with `lto = fat`, `codegen-units = 1` and `opt-level = 'z'` ('z' instead of 3 to avoid loop unrolling and keep the machine code readable)

``` armasm
00020100 <clone>:
   20100:        b5d0       push    {r4, r6, r7, lr}
   20102:        af02       add     r7, sp, #8
   20104:        f5ad 6d81  sub.w   sp, sp, #1032   ; 0x408
   20108:        2300       movs    r3, #0
   2010a:        c802       ldmia   r0!, {r1}
   2010c:        9301       str     r3, [sp, #4]
   2010e:        aa01       add     r2, sp, #4
   20110: /--/-X b141       cbz     r1, 20124 <clone+0x24>
   20112: |  |   4413       add     r3, r2
   20114: |  |   f810 4b01  ldrb.w  r4, [r0], #1
   20118: |  |   3901       subs    r1, #1
   2011a: |  |   711c       strb    r4, [r3, #4]
   2011c: |  |   9b01       ldr     r3, [sp, #4]
   2011e: |  |   3301       adds    r3, #1
   20120: |  |   9301       str     r3, [sp, #4]
   20122: |  \-- e7f5       b.n     20110 <clone+0x10>
   20124: \----> a801       add     r0, sp, #4
   20126:        f50d 6d81  add.w   sp, sp, #1032   ; 0x408
   2012a:        bdd0       pop     {r4, r6, r7, pc}
```

note that it's not optimizing step (3) to an actual `memcpy` because we lack the 'trait specialization' code that libstd uses

---

before `clone` was optimized to

1. reserve and zero (`memclr`) 1028 (!?) bytes of stack space
2. (unnecessarily) runtime check if `len` is equal or less than 1024 (capacity) -- this included a panicking branch
3. memcpy `len` bytes of data from the parent

Co-authored-by: Jorge Aparicio <[email protected]>
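The three steps listed in that description can be sketched with a small fixed-capacity vector. This is a hypothetical `TinyVec` written for illustration, assuming `u8` elements only; it is not the `heapless::Vec` implementation and makes no claim about the machine code it compiles to.

```rust
use core::mem::MaybeUninit;

/// Hypothetical fixed-capacity byte vector; not `heapless::Vec` itself.
pub struct TinyVec<const N: usize> {
    len: usize,
    buf: [MaybeUninit<u8>; N],
}

impl<const N: usize> TinyVec<N> {
    pub fn new() -> Self {
        // Steps 1 and 2: storage stays uninitialized, only `len` is zeroed.
        TinyVec { len: 0, buf: [MaybeUninit::uninit(); N] }
    }

    /// Appends a byte, returning it back if the vector is full.
    pub fn push(&mut self, byte: u8) -> Result<(), u8> {
        if self.len == N {
            return Err(byte);
        }
        self.buf[self.len] = MaybeUninit::new(byte);
        self.len += 1;
        Ok(())
    }

    pub fn as_slice(&self) -> &[u8] {
        // SAFETY: the first `len` elements were initialized by `push` or `clone`.
        unsafe { core::slice::from_raw_parts(self.buf.as_ptr() as *const u8, self.len) }
    }
}

impl<const N: usize> Clone for TinyVec<N> {
    fn clone(&self) -> Self {
        let mut new = Self::new();
        // Step 3: copy only the `len` initialized bytes. Zipping with the
        // source slice means no capacity check or panicking branch is written.
        for (dst, &src) in new.buf.iter_mut().zip(self.as_slice()) {
            *dst = MaybeUninit::new(src);
        }
        new.len = self.len;
        new
    }
}
```

The design point is that `clone` never touches the bytes past `len`, so nothing forces the whole buffer to be zeroed first.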

japaric added 3 commits October 31, 2017 21:26

use atomics where available

55f891e

cc #5

add tsan test

4a6bf95

also test in release

f9a3dfc

japaric pushed a commit that referenced this pull request Oct 31, 2017

Auto merge of #8 - japaric:atomic, r=<try>

4a527af

use atomics where available cc #5 cc @pftbest

actually execute ci/script.sh

158d19b

japaric pushed a commit that referenced this pull request Oct 31, 2017

Auto merge of #8 - japaric:atomic, r=<try>

b519403

use atomics where available cc #5 cc @pftbest

japaric pushed a commit that referenced this pull request Oct 31, 2017

Auto merge of #8 - japaric:atomic, r=<try>

5fcb8c4

use atomics where available cc #5 cc @pftbest

japaric force-pushed the atomic branch from a93f857 to 158d19b October 31, 2017 21:37

japaric added 2 commits November 8, 2017 00:50

add a compiler barrier

978f0ee

create our own AtomicUsize

37c8b5b

which works on thumbv6m-none-eabi and probably other targets with max-atomic-width = 0

japaric added 2 commits November 8, 2017 22:51

test two consecutive operations

9533e27

load_acquire -> load_relaxed

9faea68

japaric pushed a commit that referenced this pull request Nov 8, 2017

Auto merge of #8 - japaric:atomic, r=<try>

b31015c

use atomics where available cc #5 cc @pftbest

japaric pushed a commit that referenced this pull request Nov 8, 2017

Auto merge of #8 - japaric:atomic, r=<try>

14a5689

use atomics where available cc #5 cc @pftbest

japaric added 2 commits November 9, 2017 01:37

work around rust-lang/rust#45802

9398aaf

tsan: deal with the mangled names

5ff961c

japaric force-pushed the atomic branch from 7806bc9 to 5ff961c November 9, 2017 00:38

japaric pushed a commit that referenced this pull request Nov 9, 2017

Auto merge of #8 - japaric:atomic, r=<try>

89e5733

use atomics where available cc #5 cc @pftbest

japaric added 2 commits November 9, 2017 02:09

rewrite the test for less unsafety

731e8ae

relax the lifetime constraint of RingBuffer.split

30ea33c

also - add a "`split` freezes the ring buffer" compile fail test - hide compile-fail doc tests - add scoped threads tests

japaric pushed a commit that referenced this pull request Nov 9, 2017

Auto merge of #8 - japaric:atomic, r=japaric

612bf44

use atomics where available cc #5 cc @pftbest

homunkulus merged commit 30ea33c into master Nov 9, 2017

japaric deleted the atomic branch November 9, 2017 01:42
