-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idea: by 'producer' / by 'line of allocation' classifier? #11
Comments
Would be cool! If it is worth the effort and it works. I am slightly inclined to
|
File "/usr/lib/python3.6/sre_compile.py", line 416 Seems it doesn't use pointer equality but compares and hashes on the value in some way. Even if I do:
I get the same sre_compile.py traceback. |
Same thing here. If I reverse the order of:
I get:
That is so weird. Also, the printed traceback from tracemalloc should be a most-recent-call-last traceback. If I increase the trace size I get:
|
You reversed the printout, right?
|
Ah, Looks like Py 3.7 reversed it |
Looks like this can have to do with this. Presumably the list is not really allocated but a list in a free list is reused, and the traced malloc site is presumably where the list was first allocated. |
Ah I see. I'll see if I can build Python 3.8 and test that. |
Yep, that definitely is the fix:
Thanks for the pointer. I guess I'd better add a warning that this 'classifier' will be inaccurate for certain objects for Python < 3.8 :( |
The also-very-annoying thing is that you need to get the object's PyGC_Head if it's a GC type, and I'm guessing the most reliable way to get the size at runtime is via |
|
I think the extra overhead in 4a07064 is gonna make it really slow for Python 3.5. |
Well, it works!
Now I gotta figure out if the other methods should be defined, and write the warnings, and the docs and tests. By the way, it is seeing objects produced in Glue.py. I don't that should happen, right? |
Another not-yet-implemented feature that could probably be useful is to classify by the filename only. |
Oh, I messed up. I was testing that Python 3.5 fix in Python 3.8 by commenting out the fast path, and it produces the trace in the wrong order. Recompiled guppy and now it looks saner:
|
Cool! Interesting examples! |
That shouldn't happen but I have seen something like that before. I can see it after a number of _.more on the heap().byprod partition. I also found a number of objects that had been allocated at Classifiers.py:35 I got really many rows when I first used .byprod. How could it be 6339 producer sites? But it may be because of the priming with apport in View.py... Yes, removing that I get just 2651 producer sites on the heap. The objects that come from internal Heapy modules like Classifiers.py, Glue.py and also View.py deserves some more investigation, they shouldn't be included in heap(). |
Just occured to me, at least some or perhaps all of the suspicous producer sites may be because of the reusal tricks that Python <3.8 uses when it doesn't really allocate things again that we talked about later. It should be fixed in 3.8, right? I am still using 3.6.8 so that's may be the reason I see those producers. |
I checked their traces:
Yeah, that is likely. I'm doing my testing under self-compiled 3.8.0 from Git. Under 3.7.5:
I still don't get as much as 6339 sites though. Might be related to that apport thing on your side. |
Yes, I realized. |
I have managed to install Python3.9 but I can reproduce an error also with python2 and guppy-pe.
I see that in guppy3 the Py_Instance_Check was removed from hv_is_obj_hidden. I managed to fix it in guppy-pe and python2 by using _PyObject_GetDictPtr. This was done on the recursive variant, I have still to introduce it in the simulated recursive variant. And I didn't manage to fix it in guppy3 with the simulated recursion. I'll see tomorrow...
After the fix in hv.c the result was hp.Nothing |
In Python 3 everything is an instance of a type, there are no old style classes anymore. It is supposed to hide the instance's dict if it contains the hiding tag, but in my altered code it no longer hides the owner of the dict. I guess I should change that. |
Hmm. That might not be the case. |
In guppy3, we were missing When enabling caching again, I got rid of some of the spurious objects in heap() after some more changes in hv.c Testing your test in the mentioned commit in guppy-pe, I get just RootState even as I have caching enabled in Glue.py. But in guppy3 I get the strange root like in the example now that caching is enabled. I see that this is because of the change to root in View.py for priming. And all spurios objects are not away when paging down h.byprod with a number of _.more |
The caching in Glue.py in Share.getattr only occured if the name was not in the How about enabling caching in Glue.py again (perhaps only in Share.getattr) and adding |
Yeah, that should be exactly the cause.
I don't think I understand this. Which test is this? |
Ah, ok. I didn't realize this part of the logic. Thanks |
Formal documentation is difficult :( Anyways, I think this is done. Anything to be fixed before merge? |
I get unknown producer on iso() objects as well as the entire heap() |
tracemalloc enabled? I haven't tested this branch on 3.9 support yet but multi-interpreter completely hangs. |
497ef41 - My WIP patch on Py3.9 |
I had not enabled tracemalloc. Should we have a warning for that? |
6f35d5f#diff-8c727fb63492de2479412159060b42d6R371 This should have it. |
|
I realise I got a warning the first time actually, but didn't see it and there was only one warning. Maybe one could consider to have a warning each time we use byprod, not just the first, but I don't know |
I'd rather make an error than make a warning that emits every single time. Warnings get annoying fast :/ |
May consider an error then! |
Consider give an hint of how to enable tracemalloc. I didn't find a clue with python -h but had to look into how we did it the last time. |
I put a link to the docs:
|
Maybe even "Do you want to enable it? (y/n)" |
It would be too late to enable it by then. If I run |
I know, that's unfortunate but then you have the option to create new objects... |
Anyway, I forgot to say, Good Work! |
But then there would be more gotchas... I'd have to explain why, "Do you want to enable it? (y/n)" "y", one need to rerun everything to good information. Not to mention this partition being called from a repr for a MorePrinter so I'd somehow add interactive prompt to it... and then what if stdin isn't even interactive? Someone calling guppy from a long-running server process... argh I don't want to think about user interface. |
"Tracing is not enabled. Type hp.tm to enable" where hp is determined magically to be the hpy() object and tm is an attribute with a side effect to enable tracemalloc. Or if you prefer hp.tm() |
Hmm. Good idea. |
But maybe you want to explain that it doesn't apply to already allocated objects. |
Yeah, I still think the best idea would be to start tracemalloc as early as possible with How about let's just keep it an error? If people went as far as finding the producer classifier from the obscure docs / code of guppy, I'm sure they won't mind reading some highly-readable official Python docs. We could document on all those APIs and how-tos, and especially how in the world guppy even works and how to write extend guppy, with some more-user-friendly less-mathematical documentations, but I'm a pretty bad doc writer. All I'm good at is emitting code 😂 |
Yeah, that seems to be a good option. Another idea I was contemplating was to have an argument to hpy() that enabled tracemalloc. So we didn't have to read the Python docs. Or even a special constructor. But I don't know if it's useful enough and we would have to document it ourself... and I agree, I prefer coding before writing docs myself too, although I have to write docs at work. |
BTW x.byprod is missing from the formal documentation of IdentitySet |
Fixed |
Yeah, I just think that the fact that it would only work for objects that gets allocated after enabling really limits its usefulness. |
Alright merging. If anything should be improved please tell :) |
Quote the thesis:
But we no longer need special instrumentation now that Python 3 supports it natively:
We could probably somehow use tracemalloc instead of hook into python memory allocator to avoid reinventing the wheel. However, this has a non-negligible memory overhead so we probably don't want to enable it as default and show a message if no object would be classified this way.
Similarly, it would be nice if we could get the trace of a single object with some convenient attribute access on an identityset. Though, admittedly, reference graphs are far more useful.
How should this classifier be called? byprod? bytrace? byalloc?
The text was updated successfully, but these errors were encountered: