
Flaky test pthreads_exit on ARM #2075 (Closed)

zhaoqin opened this issue Nov 25, 2016 · 10 comments

zhaoqin (Contributor, Author) commented Nov 25, 2016

<Starting application /home/zhaoqin/Workspace/DynamoRIO/nightly/run/build_debug-internal-32/suite/tests/bin/pthreads.pthreads_exit (17431)>
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/zhaoqin/Workspace/DynamoRIO/nightly/run/build_debug-internal-32/lib32/debug/libdynamorio.so' 0x710228d8
>
<Initial options = -no_dynamic_options -code_api -msgbox_mask 12 -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<(1+x) Handling our fault in a TRY at 0x71215b3c>
<Stopping application /home/zhaoqin/Workspace/DynamoRIO/nightly/run/build_debug-internal-32/suite/tests/bin/pthreads.pthreads_exit (17431)>
<Application /home/zhaoqin/Workspace/DynamoRIO/nightly/run/build_debug-internal-32/suite/tests/bin/pthreads.pthreads_exit (17431).  Internal Error: DynamoRIO debug check failure: /home/zhaoqin/Workspace/DynamoRIO/nightly/src/core/heap.c:1410 IF_WINDOWS(doing_detach || ) heapmgt->vmheap.num_free_blocks == heapmgt->vmheap.num_blocks - unfreed_blocks || (ever_beyond_vmm && heapmgt->vmheap.num_free_blocks >= heapmgt->vmheap.num_blocks - unfreed_blocks)
(Error occurred @1796 frags)
version 6.2.17127, custom build
-no_dynamic_options -code_api -msgbox_mask 12 -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct 
0x710d8cc1 0x78af04b0>
zhaoqin (Contributor, Author) commented Nov 25, 2016

(gdb) p heapmgt->vmheap.num_free_blocks
$1 = 8180
(gdb) p heapmgt->vmheap.num_blocks     
$2 = 8192
(gdb) p/x heapmgt->vmheap.start_addr
$6 = 0x4e97c000
(gdb) p/x heapmgt->vmheap.blocks    
$7 = {0xfc3fc3ff, 0xffffffff, 0xffffffff, 0x3fffffff, 0xfffffffc, 0xffffffff <repeats 251 times>, 0x0 <repeats 768 times>}
(gdb) p/x heapmgt->vmheap           
$5 = {start_addr = 0x4e97c000, end_addr = 0x5697c000, alloc_start = 0x0, alloc_size = 0x8000000, num_blocks = 0x2000, lock = {lock_requests = 0xffffffff, 
    contended_event = 0xffffffff, name = 0x7128ddd0, rank = 0x4a, owner = 0x0, owning_dcontext = 0x0, prev_owned_lock = 0x0, count_times_acquired = 0xbb, 
    count_times_contended = 0x0, count_times_spin_pause = 0x0, max_contended_requests = 0x0, count_times_spin_only = 0x0, prev_process_lock = 0x7132d654, 
    next_process_lock = 0x7132d17c, callstack = {0x0, 0x0, 0x0, 0x0}, app_lock = 0x0, deleted = 0x0}, num_free_blocks = 0x1ff4, blocks = {0xfc3fc3ff, 0xffffffff, 
    0xffffffff, 0x3fffffff, 0xfffffffc, 0xffffffff <repeats 251 times>, 0x0 <repeats 768 times>}}

It is clear that the block starting @ 0x4eb74000 is not freed.

vmm_heap_reserve_blocks: size=65536 blocks=4 p=0x4eb74000
vmm_heap_reserve: size=65536 p=0x4eb74000
os_heap_commit: 57344 bytes @ 0x4eb75000
heap_mmap: 8192 bytes [/ 8192] @ 0x4eb51000
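
For reference, the leaked range can be derived from the bitmap dump above: alloc_size 0x8000000 / num_blocks 0x2000 gives a 16KB block. A minimal standalone sketch (not DynamoRIO code) using the dumped values, assuming a set bit means "free" and LSB-first bit order within each 32-bit word, which makes the arithmetic line up with the reported address:

#include <stdio.h>
#include <stdint.h>

/* Standalone sketch: decode the vmheap bitmap dumped above.
 * Block size = alloc_size / num_blocks = 0x8000000 / 0x2000 = 0x4000 (16KB).
 * Zero bits mark in-use blocks under the assumptions stated above. */
int main(void)
{
    const unsigned long start_addr = 0x4e97c000UL;
    const unsigned long block_size = 0x4000UL;
    /* First five words from the gdb dump; the remaining in-range words
     * are all 0xffffffff (fully free). */
    const uint32_t blocks[] = { 0xfc3fc3ff, 0xffffffff, 0xffffffff,
                                0x3fffffff, 0xfffffffc };
    for (unsigned w = 0; w < sizeof(blocks) / sizeof(blocks[0]); w++) {
        for (unsigned b = 0; b < 32; b++) {
            if (!(blocks[w] & (1u << b))) {
                unsigned idx = w * 32 + b;
                printf("in-use block %u @ 0x%lx\n", idx,
                       start_addr + idx * block_size);
            }
        }
    }
    return 0;
}

Under those assumptions this prints twelve in-use blocks (matching num_blocks - num_free_blocks = 0x2000 - 0x1ff4 = 12), the last four being indices 126-129: exactly the four-block, 64KB reservation at 0x4eb74000 shown in the log above.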

zhaoqin (Contributor, Author) commented Nov 25, 2016

vmm_heap_reserve_blocks: size=65536 blocks=4 p=0x4e9a4000
vmm_heap_reserve: size=65536 p=0x4e9a4000
os_heap_commit: 57344 bytes @ 0x4e9a5000
...
vmm_heap_reserve_blocks: size=65536 blocks=4 p=0x4eb74000
vmm_heap_reserve: size=65536 p=0x4eb74000
os_heap_commit: 57344 bytes @ 0x4eb75000
...
new_thread_setup: thread 17799, dstack 0x4eb83000 clone record 0x4eb82bf0
(gdb) info threads
  Id   Target Id         Frame 
* 1    process 17788 "pthreads.pthrea" syscall_0_or_1arg () at /home/zhaoqin/Workspace/DynamoRIO/nightly/src/core/arch/arm/arm_shared.asm:87

So it is the memory for thread 17799's dstack that is not freed.
I saw this line, which I did not see in a passing run:

Thread 17799 reached initialization point while dynamo exiting, waiting for app to exit

So basically, if the app is exiting while a newly created thread is still in its initialization phase, the new thread simply waits without freeing its own dstack.
Xref code at: https://github.com/DynamoRIO/dynamorio/blob/master/core/dynamo.c#L2108
It seems that we can only relax the assertion check.
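
A self-contained model of what a relaxed check might look like, assuming the other 8 in-use blocks from the dump are what unfreed_blocks already accounts for (all names here are illustrative, not DynamoRIO source):

#include <assert.h>
#include <stdbool.h>

/* Toy model of relaxing the heap.c:1410 check: additionally tolerate the
 * dstack blocks of threads that reached init while the process was exiting.
 * vmheap_model_t, BLOCKS_PER_DSTACK, and uninit_threads_waiting are all
 * illustrative. */
typedef struct {
    unsigned num_blocks;
    unsigned num_free_blocks;
} vmheap_model_t;

/* One dstack reservation = 64KB = 4 x 16KB blocks, per the log above. */
#define BLOCKS_PER_DSTACK 4

static bool
exit_heap_check(const vmheap_model_t *vmh, unsigned unfreed_blocks,
                unsigned uninit_threads_waiting)
{
    return vmh->num_free_blocks >=
           vmh->num_blocks - unfreed_blocks -
               uninit_threads_waiting * BLOCKS_PER_DSTACK;
}

int
main(void)
{
    /* Numbers from the gdb dump: 8192 blocks, 8180 free; assume the other
     * 8 in-use blocks are covered by unfreed_blocks. */
    vmheap_model_t vmh = { 8192, 8180 };
    assert(!exit_heap_check(&vmh, 8, 0)); /* strict form fails, as seen */
    assert(exit_heap_check(&vmh, 8, 1));  /* relaxed form tolerates the race */
    return 0;
}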

derekbruening (Contributor) commented

So this is a regression? When the thread init code was first written, the dstack was not allocated yet at this exit-check point. We changed it to allocate the dstack at the parent clone point so we could store data on it.
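
For context, here is a rough sketch of that scheme (illustrative types and sizes, not the actual clone_record_t or heap_mmap): the parent allocates the child's dstack before clone and stashes per-thread data near its top, which is why a dstack can already exist, and leak, before the child ever reaches dynamo_thread_init. The clone record @ 0x4eb82bf0 sitting just below dstack 0x4eb83000 in the log above matches this layout.

#include <stdint.h>
#include <stdlib.h>

/* Sketch only: parent-side dstack allocation with the clone record stored
 * at the top of the new stack. Types and sizes are illustrative. */
typedef struct {
    uint64_t app_mc[34]; /* saved app machine context (illustrative) */
    void *dstack_base;   /* lets the child locate and free its own stack */
} clone_record_t;

#define DSTACK_SIZE (56 * 1024) /* matches -stack_size 56K; guard pages omitted */

static void *
parent_make_dstack(clone_record_t **record_out)
{
    char *base = malloc(DSTACK_SIZE); /* stand-in for heap_mmap; no error check */
    char *top = base + DSTACK_SIZE;   /* stacks grow down */
    /* Place the clone record just below the top of the new stack. */
    clone_record_t *rec = (clone_record_t *)(top - sizeof(*rec));
    rec->dstack_base = base;
    *record_out = rec;
    return rec; /* child's initial stack pointer sits at/below the record */
}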

egrimley (Contributor) commented Mar 1, 2017

I can get this test to fail on AArch64, too. With a debug build of 38920e6 on Fedora 24:

while bin64/drrun -code_api -- suite/tests/bin/pthreads.pthreads_exit ; do echo ; done

After a while:

... Internal Error: DynamoRIO debug check failure: .../dynamorio/core/utils.c:968 lock->lock_requests == LOCK_FREE_STATE
(Error occurred @1771 frags)

It's not the same error every time.

egrimley (Contributor) commented Mar 2, 2017

On some AArch64 systems, at least, all of these are flaky:

  Test  #50: code_api|linux.threadexit
  Test  #51: code_api|linux.threadexit2
  Test  #67: code_api|pthreads.pthreads
  Test  #68: code_api|pthreads.pthreads_exit

egrimley changed the title from "Flaky test pthread_exit on ARM" to "Flaky test pthreads_exit on ARM" on Mar 2, 2017
egrimley (Contributor) commented

Here's the stack when the ASSERT in mutex_delete() fails:

#0  mutex_delete (lock=0x557a9e7260) at /.../dynamorio/core/utils.c:969
#1  0x000000557a63c8f8 in dynamo_shared_exit (toexit=0x0)
    at /.../dynamorio/core/dynamo.c:1086
#2  0x000000557a63cca8 in dynamo_process_exit_cleanup ()
    at /.../dynamorio/core/dynamo.c:1274
#3  0x000000557a63cea0 in dynamo_process_exit () at /.../dynamorio/core/dynamo.c:1330
#4  0x000000557a8dbc20 in cat_done_saving_dstack ()
    at /.../dynamorio/core/arch/aarch64/aarch64.asm:322

(gdb) p *lock
$8 = {lock_requests = 0, contended_event = -1, 
  name = 0x557a92beb8 "thread_initexit_lock(mutex)@/.../dynamorio/core/dynamo.c:260", 
  rank = 4, owner = 28134, owning_dcontext = 0x0, prev_owned_lock = 0x557a9e7818, 
  count_times_acquired = 166464950, count_times_contended = 55585188, 
  count_times_spin_pause = 0, max_contended_requests = 2, count_times_spin_only = 21, 
  prev_process_lock = 0x557a9e7890, next_process_lock = 0x557a9e7890, callstack = {0x0, 0x0, 
    0x0, 0x0}, app_lock = false, deleted = true}

derekbruening (Contributor) commented

So thread 28134 still holds the thread_initexit_lock -- what is the callstack of that thread?

egrimley (Contributor) commented

I can't seem to reproduce this now: the failure is the same, but by the time I've managed to connect a debugger, owner is zero. However, I can still give you a backtrace for the other thread when that happens:

#0  dynamorio_syscall () at /.../dynamorio/core/arch/aarch64/aarch64_shared.asm:67
#1  0x0000005581c0de5c in get_sys_thread_id () at /.../dynamorio/core/unix/os.c:2605
#2  0x0000005581c0de9c in get_thread_id () at /.../dynamorio/core/unix/os.c:2620
#3  0x00000055819fe790 in deadlock_avoidance_lock (lock=0x5581d15260, acquired=true, ownable=true)
    at /.../dynamorio/core/utils.c:577
#4  0x00000055819ff3dc in mutex_lock (lock=0x5581d15260)
    at /.../dynamorio/core/utils.c:888
#5  0x000000558196c450 in dynamo_thread_init (dstack_in=0x51b03000 "", mc=0x51b025c0, 
    client_thread=false) at /.../dynamorio/core/dynamo.c:2161
#6  0x0000005581c09828 in new_thread_setup (mc=0x51b025c0)
    at /.../dynamorio/core/arch/x86_code.c:305
#7  0x0000007fa3bb3c3c in ?? ()
#8  0x0000007fa3d03c58 in ?? ()

derekbruening (Contributor) commented

This should have been fixed by #2600's counter, at least when a synchall is done at exit, which includes debug build: thus this assert should never fire anymore, and we can remove the flag added in 45dd931.
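
A rough sketch of that counter idea (illustrative names; not the actual #2600 implementation):

#include <sched.h>
#include <stdatomic.h>

/* Sketch of the #2600-style counter: the parent increments before clone,
 * the child decrements once fully initialized, and the exit-time synchall
 * drains the count before tearing down heap and lock state, so no thread
 * is left mid-initialization holding a dstack. */
static atomic_int uninit_thread_count;

static void parent_pre_clone(void) { atomic_fetch_add(&uninit_thread_count, 1); }
static void child_init_done(void)  { atomic_fetch_sub(&uninit_thread_count, 1); }

static void
exit_synchall_wait(void)
{
    /* A real implementation would also handle threads that exit before
     * initializing; this simply yields until the count drains. */
    while (atomic_load(&uninit_thread_count) > 0)
        sched_yield();
}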

egrimley (Contributor) commented Nov 2, 2017

Both on AArch64 and on x86_64, without DEBUG, the pthreads_exit test sometimes runs for about a second and sometimes for about a minute. The driver script reports the test as having passed, but is it really working as intended?

derekbruening added a commit that referenced this issue Nov 2, 2017
started_detach for the few checks that need pre-synch querying.

Reminder to change to #2601 and to add an xref to #2075: we've now removed
the dynamo_thread_init_during_process_exit flag that was added by 45dd931,
as #2600's counter fixed that race.
derekbruening added a commit that referenced this issue Nov 6, 2017
To handle a thread exiting on attach, adds a timeout to wait_event() and
its corresponding implementations: os_wait_event() on Windows and
ksynch_wait() on UNIX.  Uses the timeout to check whether a thread exited,
and if so, to move on.

Augments the test from #2600 to test this race.

Abandons the api.detach_spawn test on Windows: i#2611 covers fixing the
tricky problems on Windows.  Leaves in place some fixes toward #2611:
+ Fixes a race where we put interception_code hooks in place before marking
  them +x
+ Increases MAX_THREADS_WAITING_FOR_DR_INIT

Fixes clang 32-bit missing __moddi3 by adding it to third_party/libgcc and linking
that into x86 and arm.

To enable adding race checks, moves doing_detach inside the synchall and adds
started_detach for the few checks that need pre-synch querying.

Removes the dynamo_thread_init_during_process_exit flag that was added in
45dd931 for #2075, as the UNIX uninit_thread_count solution from #2600 solves
that problem on its own.

Fixes #2601
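
A minimal sketch of such a timed wait on POSIX (event_t here is illustrative; the real wait_event()/os_wait_event()/ksynch_wait() signatures differ):

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Sketch only, not DynamoRIO code: an event wait with a timeout so the
 * waiter can notice that the awaited thread has exited. Initialize with
 * { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false }. */
typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool signaled;
} event_t;

/* Returns true if the event was signaled, false on timeout. */
static bool
event_wait_timeout(event_t *e, int timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&e->mu);
    while (!e->signaled) {
        if (pthread_cond_timedwait(&e->cv, &e->mu, &deadline) != 0)
            break; /* ETIMEDOUT: give the caller a chance to re-check */
    }
    bool ok = e->signaled;
    pthread_mutex_unlock(&e->mu);
    return ok;
}

On a false return the caller can check whether the awaited thread still exists and, if it has exited, move on instead of blocking forever.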
fhahn pushed a commit that referenced this issue Dec 4, 2017 (same commit message as above)