Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSERT missing memory region w/ start/stop API for changes after dr_app_setup #2037

Closed
derekbruening opened this issue Oct 19, 2016 · 3 comments

Comments

@derekbruening
Copy link
Contributor

# clients/bin64/tool.drcacheoff.burst_threads
pre-DR init
pre-DR start
pre-DR detach
<all_memory_areas is missing region 0x00007fb7c24a7000-0x00007fb7c2ca7000!>
<Application /work/dr/git/build_x64_dbg_tests/clients/bin64/tool.drcacheoff.burst_threads (23195).  Internal Error: DynamoRIO debug check failure: /work/dr/git/src/core/unix/memcache.c:446 false

The query is for the signal frame:

<all_memory_areas is missing region 0x00007ff1a2c62000-0x00007ff1a3462000!>

#1  0x00000000004fcd7c in internal_error (file=0x79e320 "/work/dr/git/src/core/unix/memcache.c", line=446, expr=0x79e79b "false")
    at /work/dr/git/src/core/utils.c:174
#2  0x00000000006eaca0 in memcache_query_memory (pc=0x7ff1a3460000 "", out_info=0x46d668d0) at /work/dr/git/src/core/unix/memcache.c:446
#3  0x00000000006ccf0b in query_memory_ex (pc=0x7ff1a3460000 "", out_info=0x46d668d0) at /work/dr/git/src/core/unix/os.c:8753
#4  0x00000000006ccf9a in get_memory_info (pc=0x7ff1a3460000 "", base_pc=0x0, size=0x0, prot=0x46d66924)
    at /work/dr/git/src/core/unix/os.c:8773
#5  0x00000000006d4f52 in copy_frame_to_stack (dcontext=0x46cd1c40, sig=12, frame=0x46d66ab8, sp=0x7ff1a3460878 "\375\005j", 
    from_pending=false) at /work/dr/git/src/core/unix/signal.c:2682
#6  0x00000000006e18b8 in sig_detach (dcontext=0x46cd1c40, frame=0x46d66ab8, detached=0x46d410f8)
    at /work/dr/git/src/core/unix/signal.c:6217
#7  0x00000000006e1ef9 in handle_suspend_signal (dcontext=0x46cd1c40, ucxt=0x46d66ac0, frame=0x46d66ab8)
    at /work/dr/git/src/core/unix/signal.c:6353

46d59000-46d5a000 r--p 00000000 00:00 0 
46d5a000-46d67000 rw-p 00000000 00:00 0                                  [stack:23277]
<...>
71412000-71436000 rw-p 00000000 00:00 0 
7ff1a1c5f000-7ff1a1c60000 ---p 00000000 00:00 0 
7ff1a1c60000-7ff1a2460000 rw-p 00000000 00:00 0                          [stack:23279]
7ff1a2460000-7ff1a2461000 ---p 00000000 00:00 0 
7ff1a2461000-7ff1a2c61000 rw-p 00000000 00:00 0 
7ff1a2c61000-7ff1a2c62000 ---p 00000000 00:00 0 
7ff1a2c62000-7ff1a3462000 rw-p 00000000 00:00 0 
7ff1a3462000-7ff1a3619000 r-xp 00000000 09:00 1313251                    /usr/lib64/libc-2.21.so

This is thread 23277. Why is it using 7ff1a2c62000-7ff1a3462000 as its
stack if the maps shows a different region as the stack? Sigaltstack? My
test doesn't set up an alt stack though.

(gdb) p *(thread_sig_info_t *)dcontext->signal_field
$4 = {
app_sigstack = {
ss_sp = 0x0,
ss_flags = 2,
ss_size = 0
},
sigstack = {
ss_sp = 0x46d59000,
ss_flags = 0,
ss_size = 57344
},
}

So the one labeled is in fact DR's alt stack and the missing region must be
the main app stack. How did our scan miss it? dr_app_setup() is prior to
thread creation is the reason I suppose and the start/stop API code never added any handling of later changes.

Should we do a full maps file scan on every dr_app_start?
Or we can try to lazily query maps file whenever our cache misses.
But the cache might have false positives, not just negatives, which will
mess us up.

Not sure the code supports repeated scans, might take a little work.
Need to throw out cache first, too.

@derekbruening
Copy link
Contributor Author

The solution of only supporting dr_app_setup_and_start() and dr_app_stop_and_cleanup() rather than having them split can still run into issues like this during attach if not-yet-attached threads are concurrently changing the address space.

@derekbruening
Copy link
Contributor Author

derekbruening commented Aug 9, 2017

Another assert like this is seen when running a test for #2601:

$ DYNAMORIO_OPTIONS="-msgbox_mask 12" LD_LIBRARY_PATH=lib64/debug suite/tests/bin/api.detach_spawn
<Starting application suite/tests/bin/api.detach_spawn (16269)>
<Initial options = -msgbox_mask 12 >
..............................................<thread exited while attaching>
<Application suite/tests/bin/api.detach_spawn (16269).  Internal Error: DynamoRIO debug check failure: core/unix/memcache.c:332 !exists || vmvector_overlap(all_memory_areas, start, end) || are_dynamo_vm_areas_stale()
(Error occurred @84 frags)

@derekbruening
Copy link
Contributor Author

Looks like we don't update allmem in memcache_query_memory() after a miss
and lookup in the maps file! So we have a stale allmem from setup being split from start,
and we repeatedly have to perform maps lookups: that explains the heavy contention on
the allmem lock from holding it across maps lookups.

derekbruening added a commit that referenced this issue Mar 6, 2018
Adds a complete maps file walk to update the memcache on re-taking-over the
process for dr_api_start.  The memcache is cleared beforehand to avoid both
false positives and negatives in later queries.  This helps to solve issues
with a gap between dr_app_setup() and dr_app_start().

Does not update the executable areas or module list: they are more
difficult to re-walk, and existing lazy updates to those will suffice for
now, with a low risk of false positives.

Adds updating of the memcache on a query miss.  Previously we would just
continue to miss and walk the maps file every time.

Tested manually by disabling the i#2114 change so that a signal does a
query, adding signals to the burst_threads test, and calling dr_app_setup()
before creating the test's threads, causing queries to miss when delivering
signals.  It is difficult to create a regression test for this as the
consequences are performance degradations rather than correctness, and
these degradations only really show up at scale with hundreds of threads
whose missing stacks are queried at once with no caching.

Fixes #2037
derekbruening added a commit that referenced this issue Mar 7, 2018
Adds a complete maps file walk to update the memcache on re-taking-over the
process for dr_api_start.  The memcache is cleared beforehand to avoid both
false positives and negatives in later queries.  This helps to solve issues
with a gap between dr_app_setup() and dr_app_start().

Does not update the executable areas or module list: they are more
difficult to re-walk, and existing lazy updates to those will suffice for
now, with a low risk of false positives.

Adds updating of the memcache on a query miss.  Previously we would just
continue to miss and walk the maps file every time.

Tested manually by disabling the i#2114 change so that a signal does a
query, adding signals to the burst_threads test, and calling dr_app_setup()
before creating the test's threads, causing queries to miss when delivering
signals.  It is difficult to create a regression test for this as the
consequences are performance degradations rather than correctness, and
these degradations only really show up at scale with hundreds of threads
whose missing stacks are queried at once with no caching.

Fixes #2037
derekbruening added a commit that referenced this issue Mar 19, 2018
Refactors find_executable_vm_areas() to share its map entry skipping with
the re-takeover re-walk from b06a702, but not its module list or
executable area updates.  This entry skipping for vmheap turns out to make
a big performance difference when attaching.

Removes individual updates to memcache for entries inside vmheap which were
already bulk-added for the find_executable_vm_areas() walk.

Issue: #2037
derekbruening added a commit that referenced this issue Mar 20, 2018
Refactors find_executable_vm_areas() to share its map entry skipping with
the re-takeover re-walk from b06a702, but not its module list or
executable area updates.  This entry skipping for vmheap turns out to make
a big performance difference when attaching.

Removes individual updates to memcache for entries inside vmheap which were
already bulk-added for the find_executable_vm_areas() walk.

Issue: #2037
derekbruening added a commit that referenced this issue Mar 24, 2018
Fixes Mac build breakage from 8471d5c.

Issue: #2037
derekbruening added a commit that referenced this issue Mar 24, 2018
Fixes Mac build breakage from 8471d5c.

Issue: #2037
derekbruening added a commit that referenced this issue Apr 16, 2018
Fixes a regression from #2037 8471d5c where the module re-walk hits a
deadlock lock order violation if logging is enabled.

Fixes #2934
derekbruening added a commit that referenced this issue Apr 16, 2018
Fixes a regression from #2037 8471d5c where the module re-walk hits a
deadlock lock order violation if logging is enabled.

Fixes #2934
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant