coarse-grain code writing to other code regions is very slow #658

Closed
derekbruening opened this issue Nov 28, 2014 · 4 comments

@derekbruening

From [email protected] on January 27, 2012 12:15:54

I see this on chrome due to v8, in particular on shutdown. using -coarse_units has a 4x or worse slowdown on startup+shutdown. using -no_cache_consistency eliminates the perf difference vs fine-grain.

one downside of coarse-grain is that the translation from a cache pc to a fragment is slow: it needs to do a tree lookup and a coarse htable walk (I do use a hashtable for lookup of start pc which is needed for other things). we could put in a tree to replace the walk.
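
To illustrate why replacing the walk with an ordered structure would help, here is a minimal sketch contrasting a linear walk with a binary search over fragment spans sorted by cache address. It uses a sorted array as a stand-in for a tree, and every name in it (`frag_span_t`, `lookup_walk`, `lookup_sorted`) is hypothetical rather than taken from DynamoRIO's sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one coarse-grain fragment's span in the code cache.
 * Illustrative only; these are not DynamoRIO's data structures. */
typedef struct {
    uintptr_t cache_start; /* first cache pc of the fragment */
    uintptr_t cache_end;   /* one past the last cache pc */
    uintptr_t app_start;   /* application pc the fragment was built from */
} frag_span_t;

/* Linear walk: O(n) per query, the slow shape of lookup. */
const frag_span_t *
lookup_walk(const frag_span_t *spans, size_t n, uintptr_t cache_pc)
{
    for (size_t i = 0; i < n; i++) {
        if (cache_pc >= spans[i].cache_start && cache_pc < spans[i].cache_end)
            return &spans[i];
    }
    return NULL;
}

/* Ordered lookup: binary search, O(log n) per query.
 * Assumes spans are sorted by cache_start and do not overlap. */
const frag_span_t *
lookup_sorted(const frag_span_t *spans, size_t n, uintptr_t cache_pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cache_pc < spans[mid].cache_start)
            hi = mid;
        else if (cache_pc >= spans[mid].cache_end)
            lo = mid + 1;
        else
            return &spans[mid];
    }
    return NULL;
}
```

Both functions return the same span for any pc; only the cost per query differs.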

there are currently 3 pclookups for each codemod so certainly we should get rid of 2 of them:

  1. recreating app pc for context_to_mcontext
    00 1f27ea08 6eb18f98 dynamorio!fcache_fragment_pclookup [e:\src\dr\git\src\core\fcache.c @ 1119]
    01 1f27ea3c 6eb47dc5 dynamorio!fragment_pclookup_with_linkstubs+0x18 [e:\src\dr\git\src\core\fragment.c @ 2990]
    02 1f27ed14 6eb4830f dynamorio!recreate_app_state_internal+0x225 [e:\src\dr\git\src\core\x86\arch.c @ 3048]
    03 1f27ee74 6eb81b65 dynamorio!recreate_app_pc+0x3f [e:\src\dr\git\src\core\x86\arch.c @ 3215]
    04 1f27efe4 6eaf1362 dynamorio!found_modified_code+0x45 [e:\src\dr\git\src\core\win32\callback.c @ 3510]
    05 000000bb 00000000 dynamorio!call_modcode_alt_stack+0x35

  2. recreating full state
    00 1f27eb40 6eb18f98 dynamorio!fcache_fragment_pclookup [e:\src\dr\git\src\core\fcache.c @ 1119]
    01 1f27eb74 6eb47dc5 dynamorio!fragment_pclookup_with_linkstubs+0x18 [e:\src\dr\git\src\core\fragment.c @ 2990]
    02 1f27ee4c 6eb48348 dynamorio!recreate_app_state_internal+0x225 [e:\src\dr\git\src\core\x86\arch.c @ 3048]
    03 1f27ee60 6eb81b86 dynamorio!recreate_app_state+0x18 [e:\src\dr\git\src\core\x86\arch.c @ 3278]
    04 1f27efe4 6eaf1362 dynamorio!found_modified_code+0x66 [e:\src\dr\git\src\core\win32\callback.c @ 3511]
    05 000000bb 00000000 dynamorio!call_modcode_alt_stack+0x35

  3. handle_modified_code
    00 1f27edf8 6eb3bb89 dynamorio!fcache_fragment_pclookup [e:\src\dr\git\src\core\fcache.c @ 1119]
    01 1f27ee74 6eb81ced dynamorio!handle_modified_code+0x59 [e:\src\dr\git\src\core\vmareas.c @ 9885]
    02 1f27efe4 6eaf1362 dynamorio!found_modified_code+0x1cd [e:\src\dr\git\src\core\win32\callback.c @ 3660]
    03 000000bb 00000000 dynamorio!call_modcode_alt_stack+0x35
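
A rough sketch of collapsing the three lookups above into one: resolve the fragment once at the top of the handler and pass the result to each step. All names here (`frag_info_t`, `slow_pclookup`, the `step_*` functions) are hypothetical stand-ins and not DynamoRIO's actual interfaces; the address mapping is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of translating a cache pc. */
typedef struct {
    uintptr_t app_pc;
} frag_info_t;

int lookup_count = 0; /* counts calls to the expensive lookup */

/* Stand-in for the expensive cache-pc-to-fragment lookup. */
frag_info_t
slow_pclookup(uintptr_t cache_pc)
{
    frag_info_t f;
    lookup_count++;
    f.app_pc = cache_pc + 0x400000; /* made-up cache-to-app mapping */
    return f;
}

/* Resolve the fragment only if the caller did not already do so. */
static frag_info_t
resolve(uintptr_t cache_pc, const frag_info_t *known)
{
    return known != NULL ? *known : slow_pclookup(cache_pc);
}

/* Three consumers, mirroring the three stacks; each accepts an
 * optional pre-resolved fragment. */
uintptr_t
step_recreate_pc(uintptr_t cache_pc, const frag_info_t *known)
{
    return resolve(cache_pc, known).app_pc;
}

uintptr_t
step_recreate_state(uintptr_t cache_pc, const frag_info_t *known)
{
    return resolve(cache_pc, known).app_pc;
}

uintptr_t
step_handle_write(uintptr_t cache_pc, const frag_info_t *known)
{
    return resolve(cache_pc, known).app_pc;
}

/* Before: every step performs its own lookup (3 per code write). */
uintptr_t
handle_write_naive(uintptr_t cache_pc)
{
    step_recreate_pc(cache_pc, NULL);
    step_recreate_state(cache_pc, NULL);
    return step_handle_write(cache_pc, NULL);
}

/* After: one lookup, shared by all three steps. */
uintptr_t
handle_write_shared(uintptr_t cache_pc)
{
    frag_info_t f = slow_pclookup(cache_pc);
    step_recreate_pc(cache_pc, &f);
    step_recreate_state(cache_pc, &f);
    return step_handle_write(cache_pc, &f);
}
```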

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=658

@derekbruening

From [email protected] on January 27, 2012 10:58:24

on installed-chrome on close.html it's a huge slowdown:

% for ((i=0; i<3; i++)); do /usr/bin/time ~/dr/git/exports/bin32/drrun.exe -quiet -no_probe_api -no_code_api -stderr_mask 12 -msgbox_mask 8 -os_aslr 1 -coarse_freeze_min_size 0 -- c:/Program\ Files\ \(x86\)/Google/Chrome/Application/chrome.exe file:///e:/src/dr/benchmarks/desktop/iexplore/close.html ; done
0.00user 0.00system 0:06.49elapsed 0%CPU (0avgtext+0avgdata 236544maxresident)k
0.00user 0.01system 0:06.40elapsed 0%CPU (0avgtext+0avgdata 236544maxresident)k
0.00user 0.01system 0:06.42elapsed 0%CPU (0avgtext+0avgdata 236544maxresident)k
-disable_traces:
0.00user 0.01system 0:04.88elapsed 0%CPU (0avgtext+0avgdata 237056maxresident)k
0.00user 0.00system 0:04.74elapsed 0%CPU (0avgtext+0avgdata 237056maxresident)k
0.00user 0.01system 0:04.78elapsed 0%CPU (0avgtext+0avgdata 237056maxresident)k
-coarse_units -disable_traces:
0.00user 0.00system 1:38.90elapsed 0%CPU (0avgtext+0avgdata 237056maxresident)k
0.03user 0.06system 1:38.63elapsed 0%CPU (0avgtext+0avgdata 237312maxresident)k
0.00user 0.00system 1:38.96elapsed 0%CPU (0avgtext+0avgdata 237568maxresident)k

shrinking the 3 pclookups to 1 brings it from 1:38 down to 0:39

next step: cache last few lookups, since usually there are only a handful of addresses doing the writing

@derekbruening

From [email protected] on January 27, 2012 13:59:05

I tried caching the last N as well as using a hashtable to store all prior lookups. both perform equally well so long as the last N is at least 12 for chrome V8. given that it needs to be relatively high I'm going w/ the hashtable to have worst-case be extra memory rather than extra time.
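
For readers unfamiliar with the approach, a minimal sketch of memoizing prior lookups in a hashtable keyed by cache pc follows. It is not the code from the actual fix: the names (`memo_t`, `memo_translate`, `slow_translate`) are hypothetical, the table is fixed-size with linear probing for brevity, and a real version would presumably grow the table and drop entries whenever the corresponding fragments are flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMO_CAPACITY 64 /* power of two; a real table would grow instead */

typedef struct {
    uintptr_t cache_pc; /* key; 0 marks an empty slot */
    uintptr_t app_pc;   /* memoized translation */
} memo_entry_t;

typedef struct {
    memo_entry_t slots[MEMO_CAPACITY];
    size_t used;              /* number of occupied slots */
    unsigned long slow_calls; /* how often the slow path ran */
} memo_t;

/* Stand-in for the expensive lookup; the mapping is made up. */
uintptr_t
slow_translate(uintptr_t cache_pc)
{
    return cache_pc + 0x400000;
}

/* Return the translation for cache_pc, consulting the memo table first.
 * The table must be zero-initialized before first use. */
uintptr_t
memo_translate(memo_t *m, uintptr_t cache_pc)
{
    size_t i = (size_t)(cache_pc >> 2) & (MEMO_CAPACITY - 1);
    assert(cache_pc != 0); /* 0 is reserved as the empty-slot marker */
    for (size_t probes = 0; probes < MEMO_CAPACITY; probes++) {
        memo_entry_t *e = &m->slots[i];
        if (e->cache_pc == cache_pc)
            return e->app_pc; /* hit: no slow lookup needed */
        if (e->cache_pc == 0) {
            /* Miss: an empty slot ends the probe chain; compute and store. */
            m->slow_calls++;
            e->cache_pc = cache_pc;
            e->app_pc = slow_translate(cache_pc);
            m->used++;
            return e->app_pc;
        }
        i = (i + 1) & (MEMO_CAPACITY - 1);
    }
    /* Table full: fall back to the slow path without caching. */
    m->slow_calls++;
    return slow_translate(cache_pc);
}
```

With only a handful of distinct writer addresses, every address pays for the slow path once and hits the table afterwards, whereas a last-N cache that is too small keeps evicting entries it will need again.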

now we have:
0.00user 0.01system 0:06.39elapsed 0%CPU (0avgtext+0avgdata 237056maxresident)k
0.00user 0.01system 0:06.22elapsed 0%CPU (0avgtext+0avgdata 236800maxresident)k
0.00user 0.01system 0:06.05elapsed 0%CPU (0avgtext+0avgdata 236800maxresident)k
0.00user 0.00system 0:06.04elapsed 0%CPU (0avgtext+0avgdata 237056maxresident)k
0.00user 0.01system 0:05.91elapsed 0%CPU (0avgtext+0avgdata 236544maxresident)k

though w/o consistency we have:

-no_cache_consistency -coarse_units -disable_traces
0.00user 0.00system 0:04.70elapsed 0%CPU (0avgtext+0avgdata 236288maxresident)k
0.01user 0.00system 0:04.72elapsed 0%CPU (0avgtext+0avgdata 236544maxresident)k
0.00user 0.00system 0:04.69elapsed 0%CPU (0avgtext+0avgdata 236544maxresident)k

so there's still a slight hit, but much, much better than before

@derekbruening

From [email protected] on January 27, 2012 13:59:29

note that for V8, issue #242 should get rid of the remaining overhead

@derekbruening

From [email protected] on January 31, 2012 21:33:51

This issue was closed by revision r1233.

Status: Fixed
