OOM on DWARF's DIE traversal. #576
Sorry, there isn't. pyelftools caches aggressively. We have some vague plans for implementing a less memory-hungry mode, but nothing written down.
Thank you for the clear response, much appreciated 👍
We are revisiting this. What is the source of your binary, please, and what exactly are you doing with it?
I'm working on a proprietary library that I unfortunately cannot share, but I can tell you that we compile it with GCC 14. The code I execute looks as follows:

```python
for i, cu in enumerate(elf.get_dwarf_info().iter_CUs()):
    die = cu.get_top_DIE()
    recursive_dump(die)  # recursive_dump is my own helper (not shown)
```
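`recursive_dump` is the reporter's own helper and its body isn't shown. As a hypothetical stand-in, here is a minimal sketch of an iterative depth-first walk that relies only on `DIE.iter_children()` and `DIE.tag`. The explicit stack avoids Python's recursion limit on deeply nested DIE trees; it does not change how much pyelftools itself caches.

```python
from typing import Any, Iterator, Tuple


def walk_dies(top_die: Any) -> Iterator[Tuple[int, Any]]:
    """Yield (depth, die) pairs in depth-first pre-order.

    Hypothetical helper: works with any object exposing iter_children(),
    such as the DIE returned by cu.get_top_DIE() in pyelftools.
    """
    stack = [(0, top_die)]
    while stack:
        depth, die = stack.pop()
        yield depth, die
        # Push children in reverse so they are popped in their original order
        children = list(die.iter_children())
        for child in reversed(children):
            stack.append((depth + 1, child))


# Usage with pyelftools (needs an ELF that has DWARF info):
# for cu in elf.get_dwarf_info().iter_CUs():
#     for depth, die in walk_dies(cu.get_top_DIE()):
#         print("  " * depth + str(die.tag))
```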
Meanwhile I upgraded my PC to 64 GB of RAM, and I can tell you that the peak usage of my Python script is 59 GB. To provide some statistics, I added the following code just after the loop above:

```python
import gc
import sys
from collections import defaultdict
from operator import itemgetter

from elftools.dwarf.die import DIE, AttributeValue

# Shallow sizes of one DIE and one of its attribute values (cu is left over from the loop)
sample_die = cu.get_top_DIE()
print(f"sys.getsizeof(DIE) = {sys.getsizeof(sample_die)}", file=sys.stderr)
# DIE.attributes maps attribute names to AttributeValue objects, so take a value, not a key
sample_attr = next(iter(sample_die.attributes.values()))
print(f"sys.getsizeof(AttributeValue) = {sys.getsizeof(sample_attr)}", file=sys.stderr)

objects = gc.get_objects()
print(f"total objects alive before last GC: {len(objects)}", file=sys.stderr)
print(f"total number of alive DIE objects: {sum(isinstance(x, DIE) for x in objects)}", file=sys.stderr)
print(f"total number of alive AttributeValue objects: {sum(isinstance(x, AttributeValue) for x in objects)}", file=sys.stderr)

# Count live objects per type, then sort and keep the ten most common types
a = defaultdict(int)
for x in objects:
    a[type(x)] += 1
print(sorted(((v, k) for k, v in a.items()), key=itemgetter(0), reverse=True)[:10], file=sys.stderr)
```

And here is the output:
So it seems that 99% of all allocations are due to … I'm not a memory-profiling expert, so I might have confused something; in that case let me know and I will provide more useful data.
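Two caveats on type-based statistics like these: `gc.get_objects()` only returns objects tracked by the cyclic garbage collector (containers such as lists, dicts and class instances), so strings, ints and bytes never appear in it, and `sys.getsizeof()` reports only an object's own shallow size. The standard-library `tracemalloc` module instead attributes Python-level allocations to the source line that made them and records the peak. A minimal sketch, where `profile` is a hypothetical helper name and `workload` is any zero-argument callable:

```python
import tracemalloc


def profile(workload, top=10):
    """Run workload() with tracemalloc enabled and report the biggest allocation sites."""
    tracemalloc.start()
    try:
        workload()
        snapshot = tracemalloc.take_snapshot()  # memory still alive after workload()
        current, peak = tracemalloc.get_traced_memory()  # bytes, measured since start()
    finally:
        tracemalloc.stop()
    stats = snapshot.statistics("lineno")  # grouped by file:line, largest first
    for stat in stats[:top]:
        print(stat)
    print(f"current={current / 2**20:.1f} MiB, peak={peak / 2**20:.1f} MiB")
    return stats, peak
```

Usage would look like `profile(run_traversal)`, where `run_traversal` is a hypothetical function wrapping the loop quoted earlier. tracemalloc keeps a traceback for every allocation, so expect a noticeable slowdown and extra memory on a workload this size; profiling only the first few CUs may already show which lines dominate.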
For completeness, I slightly modified my statistics code to show the total bytes allocated instead of the number of objects (again grouped by object type):

```python
a = defaultdict(int)
for x in gc.get_objects():
    a[type(x)] += sys.getsizeof(x)  # shallow size only: referenced objects are not included
```

Here is the result:
My conclusions (to be double-checked):
OBTW, did you find a workaround for your case? There is no official low-memory mode in the public API, but it's possible to reduce memory consumption if you mess with pyelftools' internals.
Nope. Since I upgraded my PC with more RAM and no longer hit the OOM, I stopped investigating the issue (I only spent some time providing you with more data).
On my machine (32 GB RAM + 8 GB swap) I get an OOM somewhere in the middle of `iter_CUs()` and `die.iter_children()` on an ELF with a 2 GB DWARF section.

I haven't investigated the issue thoroughly, but it looks like `pyelftools` keeps references to all the DIEs ever fetched from disk? Inserting `gc.collect()` after handling each CU does not help.

Is there an easy workaround for that?
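Why `gc.collect()` cannot help, in general terms: the collector only frees objects that are unreachable, and anything held by a long-lived cache stays reachable for as long as the cache does. The sketch below is a generic illustration, not pyelftools' actual code; `Node`, `StrongCache`, `WeakCache` and `nodes_alive_after_parsing` are hypothetical names. Swapping the plain dict for `weakref.WeakValueDictionary` lets cached entries disappear once nothing else refers to them.

```python
import gc
import weakref


class Node:
    """Stand-in for a parsed record such as a DIE (hypothetical)."""

    def __init__(self, offset):
        self.offset = offset


class StrongCache:
    """Keeps every parsed Node alive for the lifetime of the cache."""

    def __init__(self):
        self._by_offset = {}

    def get(self, offset):
        node = self._by_offset.get(offset)
        if node is None:
            node = self._by_offset[offset] = Node(offset)
        return node


class WeakCache(StrongCache):
    """Same lookup, but an entry vanishes once callers drop their references."""

    def __init__(self):
        self._by_offset = weakref.WeakValueDictionary()


def nodes_alive_after_parsing(cache_cls, n=1000):
    cache = cache_cls()
    for offset in range(n):
        cache.get(offset)  # result dropped at once, like a DIE we are done with
    gc.collect()  # a full collection cannot free what the cache still references
    return sum(type(o) is Node for o in gc.get_objects())


print("strong:", nodes_alive_after_parsing(StrongCache))  # strong: 1000
print("weak:", nodes_alive_after_parsing(WeakCache))  # weak: 0
```

The trade-off of a weak cache is that a record dropped by the caller has to be parsed again if it is requested later.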