Idea: by 'producer' / by 'line of allocation' classifier? #11

Closed
zhuyifei1999 opened this issue Oct 3, 2019 · 102 comments

Comments

@zhuyifei1999
Owner

Quoting the thesis:

A producer profile classifies cells by the program components that created them.

Of these, the producer profile would require special instrumentation of the Python interpreter to record the call stack when allocating objects, and is therefore outside the current scope of Heapy.

But we no longer need special instrumentation now that Python 3 supports it natively:

$ python -X tracemalloc=10
Python 3.7.4 (default, Aug 26 2019, 23:27:06) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tracemalloc
>>> for f in tracemalloc.get_object_traceback(tracemalloc.get_object_traceback).format(): print(f)
... 
  File "<stdin>", line 1
  File "<frozen importlib._bootstrap>", line 983
  File "<frozen importlib._bootstrap>", line 967
  File "<frozen importlib._bootstrap>", line 677
  File "<frozen importlib._bootstrap_external>", line 728
  File "<frozen importlib._bootstrap>", line 219
  File "/usr/lib/python3.7/tracemalloc.py", line 235
    def get_object_traceback(obj):

We could probably use tracemalloc instead of hooking into the Python memory allocator, to avoid reinventing the wheel. However, tracemalloc has a non-negligible memory overhead, so we probably don't want to enable it by default; instead we could show a message if no object can be classified this way.
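A minimal sketch of how a classifier could look up an object's allocation site through tracemalloc, assuming Python 3.7 or later (where a Traceback lists frames oldest first). The helper name `allocation_site` and the `Probe` class are hypothetical and not part of guppy or tracemalloc; only `is_tracing`, `start`, `stop` and `get_object_traceback` are real tracemalloc calls.

```python
import tracemalloc


def allocation_site(obj):
    """Return (filename, lineno) of the most recent frame that allocated obj.

    Hypothetical helper. Returns None when tracemalloc is not tracing or
    has no trace for the object.
    """
    if not tracemalloc.is_tracing():
        return None
    tb = tracemalloc.get_object_traceback(obj)
    if tb is None:
        return None
    # Assumes Python 3.7+: frames are ordered oldest to most recent,
    # so the allocating frame is the last one.
    frame = tb[-1]
    return frame.filename, frame.lineno


class Probe:
    """Throwaway class used only for this demonstration."""


tracemalloc.start(10)   # keep up to 10 frames per allocation
probe = Probe()
site = allocation_site(probe)
tracemalloc.stop()
print(site)
```

Returning None instead of raising keeps the "tracing is off" case cheap to detect, which matters if tracing stays disabled by default.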

Similarly, it would be nice if we could get the trace of a single object with some convenient attribute access on an identityset. Though, admittedly, reference graphs are far more useful.

What should this classifier be called? byprod? bytrace? byalloc?

@svenil

svenil commented Oct 3, 2019

Would be cool, if it is worth the effort and it works. I am slightly inclined to byprod.
But I don't see how it works. I tried to get the traceback of a simple list allocated at the top level but got strange results: there is no mention of the tracetest.py module in the traceback. Can you explain what I am doing wrong?

sverker@sverker-HP-Pavilion-g6-Notebook-PC:~/git/guppy-pe/bugs$ cat tracetest.py
import tracemalloc
x=[]
for f in tracemalloc.get_object_traceback(x).format(): print(f)
sverker@sverker-HP-Pavilion-g6-Notebook-PC:~/git/guppy-pe/bugs$ python3 -X tracemalloc=10
Python 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tracetest
  File "/usr/lib/python3.6/sre_compile.py", line 416
    prefix = []
  File "/usr/lib/python3.6/sre_compile.py", line 498
    prefix, prefix_skip, got_all = _get_literal_prefix(pattern)
  File "/usr/lib/python3.6/sre_compile.py", line 548
    _compile_info(code, p, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 566
    code = _code(p, flags)
  File "/usr/lib/python3.6/re.py", line 301
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/re.py", line 233
    return _compile(pattern, flags)
  File "/usr/lib/python3.6/tokenize.py", line 37
    cookie_re = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)', re.ASCII)
  File "<frozen importlib._bootstrap>", line 219
  File "<frozen importlib._bootstrap_external>", line 678
  File "<frozen importlib._bootstrap>", line 665
>>> 

@svenil

svenil commented Oct 3, 2019

File "/usr/lib/python3.6/sre_compile.py", line 416
prefix = []

Seems it doesn't use pointer equality but compares and hashes on the value in some way.
Arguably a bug? How does it work for you?

Even if I do:

x=['abc1234']

I get the same sre_compile.py traceback.
Something seems to be really wrong, or I just don't get it ;-)

@zhuyifei1999
Owner Author

Same thing here. If I reverse the order of:

import tracemalloc
x=[]

I get:

Traceback (most recent call last):
  File "tracetest.py", line 3, in <module>
    for f in tracemalloc.get_object_traceback(x).format(): print(f)
AttributeError: 'NoneType' object has no attribute 'format'

That is so weird.
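For reference, `get_object_traceback` returns None whenever tracemalloc holds no trace for the object. The sketch below shows the simplest way to hit that, by allocating before tracing starts. It is only an illustration: in the runs above tracing was already on from interpreter startup via `-X`, so the None there more likely comes from the object-reuse behaviour discussed later in this thread. The names `Probe` and `describe` are hypothetical.

```python
import tracemalloc


class Probe:
    """Throwaway class used only for this demonstration."""


def describe(obj):
    """Format obj's allocation traceback, or report that it is untracked."""
    tb = tracemalloc.get_object_traceback(obj)
    if tb is None:
        return "untracked"
    return "\n".join(tb.format())


tracemalloc.stop()        # make sure tracing is off (no-op if it already is)
early = Probe()           # allocated before tracing starts
tracemalloc.start(1)
late = Probe()            # allocated while tracing

early_report = describe(early)
late_report = describe(late)
tracemalloc.stop()

print(early_report)
print(late_report)
```

Checking for None before calling `.format()` avoids the AttributeError shown above.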

Also, the printed traceback from tracemalloc should be a most-recent-call-last traceback. If I increase the trace size I get:

  File "tracetest.py", line 1
    import tracemalloc
  File "<frozen importlib._bootstrap>", line 983
  File "<frozen importlib._bootstrap>", line 967
  File "<frozen importlib._bootstrap>", line 677
  File "<frozen importlib._bootstrap_external>", line 728
  File "<frozen importlib._bootstrap>", line 219
  File "/usr/lib/python3.7/tracemalloc.py", line 4
    import linecache
  File "<frozen importlib._bootstrap>", line 983
  File "<frozen importlib._bootstrap>", line 967
  File "<frozen importlib._bootstrap>", line 677
  File "<frozen importlib._bootstrap_external>", line 728
  File "<frozen importlib._bootstrap>", line 219
  File "/usr/lib/python3.7/linecache.py", line 11
    import tokenize
  File "<frozen importlib._bootstrap>", line 983
  File "<frozen importlib._bootstrap>", line 967
  File "<frozen importlib._bootstrap>", line 677
  File "<frozen importlib._bootstrap_external>", line 728
  File "<frozen importlib._bootstrap>", line 219
  File "/usr/lib/python3.7/tokenize.py", line 37
    cookie_re = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)', re.ASCII)
  File "/usr/lib/python3.7/re.py", line 234
    return _compile(pattern, flags)
  File "/usr/lib/python3.7/re.py", line 286
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.7/sre_compile.py", line 764
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.7/sre_parse.py", line 930
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.7/sre_parse.py", line 426
    not nested and not items))
  File "/usr/lib/python3.7/sre_parse.py", line 646
    item = subpattern[-1:]
  File "/usr/lib/python3.7/sre_parse.py", line 166
    return SubPattern(self.pattern, self.data[index])

@svenil

svenil commented Oct 3, 2019

You reversed the printout, right?
I would think a depth of 1 would be enough to pinpoint the actual allocation site. Trying that, I get the sre_compile.py site even with a non-empty list allocation.

sverker@sverker-HP-Pavilion-g6-Notebook-PC:~/git/guppy-pe/bugs$ cat tracetest.py
import tracemalloc
x=['abc1234']
for f in tracemalloc.get_object_traceback(x).format(): print(f)
sverker@sverker-HP-Pavilion-g6-Notebook-PC:~/git/guppy-pe/bugs$ python3 -X tracemalloc=1
Python 3.6.8 (default, Jan 14 2019, 11:02:34) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tracetest
  File "/usr/lib/python3.6/sre_compile.py", line 416
    prefix = []
>>> 

@zhuyifei1999
Owner Author

You reversed the printout, right?

Ah, looks like Python 3.7 reversed it.
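A small sketch of the ordering difference, assuming Python 3.7 or later, where `Traceback.format()` lists frames oldest first and accepts `most_recent_first=True` to get the older most-recent-first order. The functions `make_payload` and `outer` are hypothetical and exist only to produce a multi-frame traceback.

```python
import tracemalloc


def make_payload():
    return bytearray(64)      # the allocation we want to locate


def outer():
    return make_payload()


tracemalloc.start(10)
payload = outer()
tb = tracemalloc.get_object_traceback(payload)
tracemalloc.stop()

# Default on 3.7+: oldest frame first, allocating frame last.
oldest_first = tb.format()
# Same frames, allocating frame first (the order Python 3.6 printed).
newest_first = tb.format(most_recent_first=True)

print("\n".join(newest_first))
```

Code that needs the allocating frame regardless of formatting can index the Traceback directly (`tb[-1]` on 3.7+).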

@svenil

svenil commented Oct 3, 2019

This may be related to the issue linked below. Presumably the list is not freshly allocated; a list from a free list is reused, and the traced allocation site is where that list was first allocated.

https://bugs.python.org/issue35053

@zhuyifei1999
Owner Author

Ah I see. I'll see if I can build Python 3.8 and test that.

@zhuyifei1999
Owner Author

Yep, that definitely is the fix:

zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/cpython $ ./python --version
Python 3.8.0rc1
zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/cpython $ cat ~/guppy3/tracetest.py
import tracemalloc
x=[]
for f in tracemalloc.get_object_traceback(x).format(): print(f)
zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/cpython $ ./python -X tracemalloc=40 ~/guppy3/tracetest.py 
  File "/home/zhuyifei1999/guppy3/tracetest.py", line 2
    x=[]
zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/cpython $ cat > ~/guppy3/tracetest.py << 'EOF'
> x=[]
> import tracemalloc
> for f in tracemalloc.get_object_traceback(x).format(): print(f)
> EOF
zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/cpython $ cat ~/guppy3/tracetest.py
x=[]
import tracemalloc
for f in tracemalloc.get_object_traceback(x).format(): print(f)
zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/cpython $ ./python -X tracemalloc=40 ~/guppy3/tracetest.py 
  File "/home/zhuyifei1999/guppy3/tracetest.py", line 1
    x=[]

Thanks for the pointer.

I guess I'd better add a warning that this 'classifier' will be inaccurate for certain objects for Python < 3.8 :(
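A rough sketch of what such checks could look like, assuming a plain `warnings.warn` is acceptable. The function name `check_producer_profile_available` and its parameters are hypothetical; this is not the code that ended up in guppy, and the message wording is only modelled on the warnings quoted later in this thread.

```python
import sys
import tracemalloc
import warnings


def check_producer_profile_available(version_info=sys.version_info, tracing=None):
    """Warn about known limitations; return True if producers can be recorded.

    Hypothetical helper. The parameters exist only so the checks can be
    exercised without changing interpreter state.
    """
    if tracing is None:
        tracing = tracemalloc.is_tracing()
    if tuple(version_info[:2]) < (3, 8):
        # Before 3.8, reused free-list objects keep their first allocation site.
        warnings.warn(
            "Python 3.7 and below tracemalloc may not record accurate "
            "producer trace. See https://bugs.python.org/issue35053")
    if not tracing:
        warnings.warn(
            "Tracemalloc is not tracing. No producer profile available")
        return False
    return True


print(check_producer_profile_available())
```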

@zhuyifei1999
Owner Author

zhuyifei1999 commented Nov 12, 2019

Another very annoying thing is that you need to get the object's PyGC_Head if it's a GC type, and sizeof(PyGC_Head) is not to be trusted even across Python minor releases, as seen in issue #1.

I'm guessing the most reliable way to get the size at runtime is via _testcapi.SIZEOF_PYGC_HEAD, which has existed since before the start of CPython's git history, and hopefully it is installed everywhere...
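A hedged sketch of a runtime lookup with a fallback, in case the test modules are not installed. The helper name `gc_head_size` is hypothetical, and looking on `_testinternalcapi` as well is an assumption about newer CPython releases. The fallback relies on the documented behaviour that `sys.getsizeof()` adds the garbage collector overhead on top of `__sizeof__()` for GC-managed objects; this is an alternative for illustration, not what guppy does.

```python
import sys


def gc_head_size():
    """Best-effort size in bytes of the GC header that precedes a GC object."""
    # Assumption: one of these test modules may expose the constant.
    for module_name in ("_testcapi", "_testinternalcapi"):
        try:
            module = __import__(module_name)
        except ImportError:
            continue
        size = getattr(module, "SIZEOF_PYGC_HEAD", None)
        if size is not None:
            return size
    # Fallback: for a plain list, the difference between sys.getsizeof()
    # and __sizeof__() is the GC overhead added by the interpreter.
    probe = []
    return sys.getsizeof(probe) - probe.__sizeof__()


print(gc_head_size())
```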

zhuyifei1999 added a commit that referenced this issue Nov 12, 2019
@zhuyifei1999
Owner Author

(venv.py38) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -X tracemalloc=10 -c 'x = []; import guppy.heapy.heapyc; print(guppy.heapy.heapyc.HeapView(guppy.heapy.heapyc.RootState, ()).cli_prod({}).classify(x))'
('<string>', 1)

@zhuyifei1999
Owner Author

I think the extra overhead in 4a07064 is gonna make it really slow for Python 3.5.

zhuyifei1999 added a commit that referenced this issue Nov 13, 2019
@zhuyifei1999
Owner Author

zhuyifei1999 commented Nov 13, 2019

Well, it works!

(venv.py38) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -X tracemalloc=10 -ic 'hp = __import__("guppy").hpy()'
>>> hp.heap()
Partition of a set of 35171 objects. Total size = 4070238 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  10088  29   899768  22    899768  22 str
     1   6847  19   477048  12   1376816  34 tuple
     2   2419   7   427567  11   1804383  44 types.CodeType
     3    450   1   351664   9   2156047  53 type
     4   4839  14   343483   8   2499530  61 bytes
     5   2225   6   302600   7   2802130  69 function
     6    450   1   244968   6   3047098  75 dict of type
     7     95   0   172504   4   3219602  79 dict of module
     8    508   1   149800   4   3369402  83 dict (no owner)
     9   1101   3    79272   2   3448674  85 types.WrapperDescriptorType
<156 more rows. Type e.g. '_.more' to view.>
>>> _.byprod
Partition of a set of 35171 objects. Total size = 4070750 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  11196  32  1185274  29   1185274  29 <frozen importlib._bootstrap>:671
     1   8740  25  1000458  25   2185732  54 None
     2   8604  24   953865  23   3139597  77 /home/zhuyifei1999/guppy3/guppy/etc/Glue.py:50
     3   2593   7   339106   8   3478703  85 <frozen importlib._bootstrap_external>:783
     4   1190   3   166996   4   3645699  90 /home/zhuyifei1999/guppy3/guppy/etc/Glue.py:209
     5     88   0    90739   2   3736438  92 <frozen importlib._bootstrap>:975
     6    744   2    87957   2   3824395  94 <frozen importlib._bootstrap>:991
     7    290   1    38386   1   3862781  95 <frozen importlib._bootstrap>:219
     8    157   0    23059   1   3885840  95 /home/zhuyifei1999/cpython/Lib/site.py:580
     9     98   0    13928   0   3899768  96 /home/zhuyifei1999/cpython/Lib/site.py:410
<166 more rows. Type e.g. '_.more' to view.>
>>> h = _
>>> h[0].kind
hp.Prod(('<frozen importlib._bootstrap>', 671))
>>> h & h[0].kind
Partition of a set of 11196 objects. Total size = 1185522 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   3694  33   356046  30    356046  30 str
     1   2692  24   194968  16    551014  46 tuple
     2    993   9   176015  15    727029  61 types.CodeType
     3   1928  17   158179  13    885208  75 bytes
     4     59   1    62432   5    947640  80 type
     5    102   1    51440   4    999080  84 dict of type
     6    327   3    44472   4   1043552  88 function
     7     54   0    23304   2   1066856  90 dict (no owner)
     8    264   2    19008   2   1085864  92 types.WrapperDescriptorType
     9     14   0    16000   1   1101864  93 dict of module
<27 more rows. Type e.g. '_.more' to view.>
>>> _.byprod
Partition of a set of 11196 objects. Total size = 1185522 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  11196 100  1185522 100   1185522 100 <frozen importlib._bootstrap>:671
>>> h & hp.Prod(('<frozen importlib._bootstrap>', 671))
Partition of a set of 11196 objects. Total size = 1185522 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   3694  33   356046  30    356046  30 str
     1   2692  24   194968  16    551014  46 tuple
     2    993   9   176015  15    727029  61 types.CodeType
     3   1928  17   158179  13    885208  75 bytes
     4     59   1    62432   5    947640  80 type
     5    102   1    51440   4    999080  84 dict of type
     6    327   3    44472   4   1043552  88 function
     7     54   0    23304   2   1066856  90 dict (no owner)
     8    264   2    19008   2   1085864  92 types.WrapperDescriptorType
     9     14   0    16000   1   1101864  93 dict of module
<27 more rows. Type e.g. '_.more' to view.>
>>> h & hp.Prod(None)
Partition of a set of 8740 objects. Total size = 1000458 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   2568  29   219742  22    219742  22 str
     1    243   3   130104  13    349846  35 type
     2    182   2    99504  10    449350  45 dict of type
     3   1238  14    85752   9    535102  53 tuple
     4    780   9    56160   6    591262  59 types.WrapperDescriptorType
     5    779   9    55232   6    646494  65 bytes
     6    312   4    54952   5    701446  70 types.CodeType
     7     62   1    52168   5    753614  75 dict (no owner)
     8     20   0    43664   4    797278  80 dict of module
     9    309   4    42024   4    839302  84 function
<48 more rows. Type e.g. '_.more' to view.>
>>> 

Now I gotta figure out if the other methods should be defined, and write the warnings, and the docs and tests.

By the way, it is seeing objects produced in Glue.py. I don't think that should happen, right?

@zhuyifei1999
Owner Author

Another not-yet-implemented feature that could probably be useful is to classify by the filename only.
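To make the idea concrete, here is a pure-Python sketch that groups objects either by allocating line or by filename only. It uses tracemalloc directly and is not how heapy classifiers are implemented; `partition_by_producer` and `Probe` are hypothetical names, sizes come from `sys.getsizeof` rather than heapy's own size measure, and Python 3.7+ frame ordering is assumed.

```python
import sys
import tracemalloc
from collections import defaultdict


def partition_by_producer(objects, by_file=False):
    """Map each producer key to [count, total size] for the given objects.

    The key is (filename, lineno), or just filename when by_file is true.
    Objects without a trace are grouped under None.
    """
    totals = defaultdict(lambda: [0, 0])
    for obj in objects:
        tb = tracemalloc.get_object_traceback(obj)
        if tb is None:
            key = None
        else:
            frame = tb[-1]   # allocating frame on Python 3.7+
            key = frame.filename if by_file else (frame.filename, frame.lineno)
        entry = totals[key]
        entry[0] += 1
        entry[1] += sys.getsizeof(obj)
    return dict(totals)


class Probe:
    """Throwaway class used only for this demonstration."""


tracemalloc.start(1)
first = [Probe() for _ in range(3)]    # three objects from this line
second = [Probe() for _ in range(2)]   # two objects from this line
by_line = partition_by_producer(first + second)
by_file = partition_by_producer(first + second, by_file=True)
tracemalloc.stop()

print(by_line)
print(by_file)
```

Grouping by filename collapses the many per-line rows into one row per module, which may be easier to scan when a single file dominates.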

@zhuyifei1999
Owner Author

zhuyifei1999 commented Nov 13, 2019

Oh, I messed up. I was testing that Python 3.5 fix in Python 3.8 by commenting out the fast path, and it produces the trace in the wrong order.

Recompiled guppy and now it looks saner:

(venv.py38) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -X tracemalloc=10 -ic 'hp = __import__("guppy").hpy()'
>>> hp.heap()
Partition of a set of 35164 objects. Total size = 4069658 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  10086  29   899642  22    899642  22 str
     1   6846  19   476992  12   1376634  34 tuple
     2   2418   7   427391  11   1804025  44 types.CodeType
     3    450   1   351664   9   2155689  53 type
     4   4837  14   343397   8   2499086  61 bytes
     5   2224   6   302464   7   2801550  69 function
     6    450   1   244968   6   3046518  75 dict of type
     7     95   0   172504   4   3219022  79 dict of module
     8    508   1   149800   4   3368822  83 dict (no owner)
     9   1101   3    79272   2   3448094  85 types.WrapperDescriptorType
<156 more rows. Type e.g. '_.more' to view.>
>>> _.byprod
Partition of a set of 35164 objects. Total size = 4070099 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  17038  48  1539746  38   1539746  38 <frozen importlib._bootstrap_external>:580
     1   8739  25  1000271  25   2540017  62 None
     2   1338   4   172631   4   2712648  67 <frozen importlib._bootstrap>:219
     3    116   0   114840   3   2827488  69 <frozen importlib._bootstrap>:36
     4    266   1    76640   2   2904128  71 /home/zhuyifei1999/cpython/Lib/abc.py:85
     5     77   0    19800   0   2923928  72
                                             /home/zhuyifei1999/cpython/Lib/collections/__init__.py:
                                             456
     6     10   0    17008   0   2940936  72 <frozen importlib._bootstrap_external>:1491
     7    225   1    15849   0   2956785  73 <frozen importlib._bootstrap_external>:1483
     8     26   0    15824   0   2972609  73 /home/zhuyifei1999/guppy3/guppy/etc/Glue.py:109
     9     26   0    14808   0   2987417  73 /home/zhuyifei1999/guppy3/guppy/etc/Glue.py:185
<2479 more rows. Type e.g. '_.more' to view.>
>>> h = _
>>> h[0].kind
hp.Prod(('<frozen importlib._bootstrap_external>', 580))
>>> h & h[0].kind
Partition of a set of 17038 objects. Total size = 1540230 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   6310  37   549312  36    549312  36 str
     1   2093  12   370692  24    920004  60 types.CodeType
     2   4525  27   330008  21   1250012  81 tuple
     3   4045  24   287502  19   1537514 100 bytes
     4     62   0     1748   0   1539262 100 int
     5      2   0      944   0   1540206 100 frozenset
     6      1   0       24   0   1540230 100 float
>>> _.byprod
Partition of a set of 17038 objects. Total size = 1540230 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  17038 100  1540230 100   1540230 100 <frozen importlib._bootstrap_external>:580
>>> h & hp.Prod(('<frozen importlib._bootstrap_external>', 580))
Partition of a set of 17038 objects. Total size = 1540230 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   6310  37   549312  36    549312  36 str
     1   2093  12   370692  24    920004  60 types.CodeType
     2   4525  27   330008  21   1250012  81 tuple
     3   4045  24   287502  19   1537514 100 bytes
     4     62   0     1748   0   1539262 100 int
     5      2   0      944   0   1540206 100 frozenset
     6      1   0       24   0   1540230 100 float
>>> h & hp.Prod(None)
Partition of a set of 8739 objects. Total size = 1000271 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0   2567  29   219683  22    219683  22 str
     1    243   3   130104  13    349787  35 type
     2    182   2    99504  10    449291  45 dict of type
     3   1238  14    85752   9    535043  53 tuple
     4    780   9    56160   6    591203  59 types.WrapperDescriptorType
     5    779   9    55232   6    646435  65 bytes
     6    312   4    54952   5    701387  70 types.CodeType
     7     62   1    52040   5    753427  75 dict (no owner)
     8     20   0    43664   4    797091  80 dict of module
     9    309   4    42024   4    839115  84 function
<48 more rows. Type e.g. '_.more' to view.>
>>> 

@svenil

svenil commented Nov 14, 2019

Cool! Interesting examples!

@svenil

svenil commented Nov 14, 2019

By the way, it is seeing objects produced in Glue.py. I don't think that should happen, right?

That shouldn't happen but I have seen something like that before. I can see it after a number of _.more on the heap().byprod partition. I also found a number of objects that had been allocated at Classifiers.py:35

I got a lot of rows when I first used .byprod. How could there be 6339 producer sites? It may be because of the priming with apport in View.py... Yes, removing that I get just 2651 producer sites on the heap.

The objects that come from internal Heapy modules like Classifiers.py, Glue.py and View.py deserve some more investigation; they shouldn't be included in heap().

@svenil

svenil commented Nov 14, 2019

It just occurred to me that at least some, or perhaps all, of the suspicious producer sites may be due to the reuse tricks we talked about earlier, where Python < 3.8 doesn't really allocate things again. It should be fixed in 3.8, right? I am still using 3.6.8, so that may be the reason I see those producers.

@zhuyifei1999
Owner Author

That shouldn't happen but I have seen something like that before. I can see it after a number of _.more on the heap().byprod partition. I also found a number of objects that had been allocated at Classifiers.py:35

I checked their traces:

$ python -X tracemalloc=10 -ic 'hp = __import__("guppy").hpy()'
>>> h = hp.heap().byprod
>>> h
Partition of a set of 35167 objects. Total size = 4070756 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  17038  48  1540193  38   1540193  38 <frozen importlib._bootstrap_external>:580
     1   8741  25  1000408  25   2540601  62 None
     2   1338   4   172631   4   2713232  67 <frozen importlib._bootstrap>:219
     3    116   0   114840   3   2828072  69 <frozen importlib._bootstrap>:36
     4    266   1    76640   2   2904712  71 /home/zhuyifei1999/cpython/Lib/abc.py:85
     5     77   0    19800   0   2924512  72
                                             /home/zhuyifei1999/cpython/Lib/collections/__init__.py:
                                             456
     6     10   0    17008   0   2941520  72 <frozen importlib._bootstrap_external>:1491
     7    225   1    15849   0   2957369  73 <frozen importlib._bootstrap_external>:1483
     8     26   0    15824   0   2973193  73 /home/zhuyifei1999/guppy3/guppy/etc/Glue.py:109
     9     26   0    14808   0   2988001  73 /home/zhuyifei1999/guppy3/guppy/etc/Glue.py:185
<2479 more rows. Type e.g. '_.more' to view.>
>>> h[8].byclodo
Partition of a set of 26 objects. Total size = 15824 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0     26 100    15824 100     15824 100 dict of guppy.etc.Glue.Share
>>> for i in __import__('tracemalloc').get_object_traceback(_.byid[0].theone).format(): print(i)
... 
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 852
    return self.c_str(a)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1605
    ob = self.mod._parent.OutputHandling.output_buffer()
  File "/home/zhuyifei1999/guppy3/guppy/heapy/OutputHandling.py", line 295
    return OutputBuffer(self)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/OutputHandling.py", line 53
    self.strio = mod._root.io.StringIO()
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 50
    return self._share.getattr(self, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 209
    d = self.getattr2(inter, dct, owner, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 225
    x = self.getattr_package(inter, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 271
    x = self.makeModule(x, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 331
    return Share(module, self, module.__name__, Clamp)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 109
    self.module = module
>>> h[9].byclodo
Partition of a set of 26 objects. Total size = 14808 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0     26 100    14808 100     14808 100 dict (no owner)
>>> for i in __import__('tracemalloc').get_object_traceback(_.byid[0].theone).format(): print(i)
... 
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 227
    x = self.getattr3(inter, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 321
    x = f()
  File "/home/zhuyifei1999/guppy3/guppy/heapy/View.py", line 171
    hv = self.new_hv(_hiding_tag_=self._hiding_tag_,
  File "/home/zhuyifei1999/guppy3/guppy/heapy/View.py", line 409
    hv.register__hiding_tag__type(self._parent.UniSet.Kind)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 50
    return self._share.getattr(self, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 209
    d = self.getattr2(inter, dct, owner, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 225
    x = self.getattr_package(inter, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 271
    x = self.makeModule(x, name)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 331
    return Share(module, self, module.__name__, Clamp)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Glue.py", line 185
    self.data = {}

It just occurred to me that at least some, or perhaps all, of the suspicious producer sites may be due to the reuse tricks we talked about earlier, where Python < 3.8 doesn't really allocate things again. It should be fixed in 3.8, right? I am still using 3.6.8, so that may be the reason I see those producers.

Yeah, that is likely. I'm doing my testing under self-compiled 3.8.0 from Git. Under 3.7.5:

(venv) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -X tracemalloc=10 -ic 'hp = __import__("guppy").hpy()'
>>> h = hp.heap().byprod
>>> h
Partition of a set of 37443 objects. Total size = 4401800 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  13674  37  1242345  28   1242345  28 <frozen importlib._bootstrap_external>:525
     1   8460  23  1005737  23   2248082  51 None
     2   7150  19   753563  17   3001645  68 <frozen importlib._bootstrap>:219
     3    256   1    77392   2   3079037  70 /usr/lib/python-
                                             exec/python3.7/../../../lib/python3.7/abc.py:126
     4    107   0    64552   1   3143589  71 <frozen importlib._bootstrap_external>:606
     5    505   1    32952   1   3176541  72 <frozen importlib._bootstrap_external>:1408
     6     11   0    27128   1   3203669  73 <frozen importlib._bootstrap_external>:1416
     7     37   0    22784   1   3226453  73 <frozen importlib._bootstrap_external>:916
     8    165   0    19024   0   3245477  74 /usr/lib/python-
                                             exec/python3.7/../../../lib/python3.7/abc.py:127
     9     79   0    16066   0   3261543  74 /usr/lib/python3.7/collections/__init__.py:397
<2503 more rows. Type e.g. '_.more' to view.>

I still don't get as many as 6339 sites though. Might be related to that apport thing on your side.

@svenil

svenil commented Nov 15, 2019

I still don't get as many as 6339 sites though. Might be related to that apport thing on your side.

Yes, I realized.
We don't get the same sites from Glue with Python < 3.8. Maybe they are masked because objects are not attributed to their real producer sites. I'll see if I can try with 3.8 later.

@svenil

svenil commented Nov 19, 2019

I have managed to install Python 3.9, but I can also reproduce an error with Python 2 and guppy-pe.
It seems to have to do with two things:

  1. When calculating heap(), we traverse all objects and only afterwards clean the hidden objects out of the result with hv_cleanup_mutset(). So the objects in Glue.py that have no hiding tag are not cleaned up.
  2. Even when checking during traversal, via hv_is_obj_hidden, I was only checking the contents of the dict for hiding_tag if Py_Instance_Check(obj) was true. Objects that inherited from object were not checked.

I see that in guppy3 the Py_Instance_Check was removed from hv_is_obj_hidden. I managed to fix it in guppy-pe and Python 2 by using _PyObject_GetDictPtr. This was done on the recursive variant; I still have to introduce it in the simulated recursive variant. And I didn't manage to fix it in guppy3 with the simulated recursion. I'll see tomorrow...
Before the fix I could reproduce it in guppy-pe (and also in guppy3) as follows. (An even better example might be to test with Glue.Interface, which inherits from object.)

>>> from guppy import hpy
>>> hp=hpy()
>>> from guppy.etc import Glue
>>> h=hp.heap()
>>> g=h&hp.Clodo(dictof=Glue.Share)
>>> g
Partition of a set of 23 objects. Total size = 11960 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0     23 100    11960 100     11960 100 dict of guppy.etc.Glue.Share

After the fix in hv.c the result was hp.Nothing
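The hiding check described above can be illustrated with a rough Python rendering. This is only a sketch of the idea, not the C code in hv.c: the function name `is_hidden`, the class `Shared`, and the use of an identity comparison against the tag are assumptions made for illustration.

```python
def is_hidden(obj, hiding_tag):
    """Return True if obj's instance dict carries the given hiding tag.

    Hypothetical sketch: looks at the instance dict of any object that has
    one, instead of only old-style instances.
    """
    instance_dict = getattr(obj, "__dict__", None)
    if not isinstance(instance_dict, dict):
        return False          # no instance dict (e.g. int) or a class proxy
    # Assumed identity comparison against the view's tag object.
    return instance_dict.get("_hiding_tag_") is hiding_tag


class Shared:
    """Stand-in for an internal helper object that should be hidden."""


tag = object()
internal = Shared()
internal._hiding_tag_ = tag   # marks the object as internal to the profiler
user_object = Shared()

print(is_hidden(internal, tag), is_hidden(user_object, tag))
```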

@zhuyifei1999
Owner Author

I see that in guppy3 the Py_Instance_Check was removed from hv_is_obj_hidden.

In Python 3 everything is an instance of a type, there are no old style classes anymore. It is supposed to hide the instance's dict if it contains the hiding tag, but in my altered code it no longer hides the owner of the dict. I guess I should change that.

@zhuyifei1999
Owner Author

It is supposed to hide the instance's dict if it contains the hiding tag

Hmm. That might not be the case.

@svenil

svenil commented Nov 20, 2019

In guppy3, we were missing _hiding_tag_ in the dict of Interface because of this commit:
3c6c392

When enabling caching again, I got rid of some of the spurious objects in heap() after some more changes in hv.c

Testing your test from the mentioned commit in guppy-pe, I get just RootState even though I have caching enabled in Glue.py. But in guppy3 I get the strange root like in the example now that caching is enabled. I see that this is because of the change to root in View.py for priming.

And not all spurious objects are gone when paging down h.byprod with a number of _.more
So something is still going wrong.

@svenil

svenil commented Nov 20, 2019

The caching in Glue.py in Share.getattr only occurred if the name was not in the _chgable_ tuple of the glueclamp. If the value is not changed, it shouldn't be a problem that the cache occurs in multiple Interface dicts. So when I add 'root' to the _chgable_ tuple in View.py, the root will not be cached and h.fam.Path.View.root is correctly RootState.

How about enabling caching in Glue.py again (perhaps only in Share.getattr) and adding 'root' to _chgable_ in View.py?
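A simplified sketch of the caching rule being proposed, assuming that attribute values are produced on demand and stored in the instance dict unless the name is listed as changeable. `CachingInterface` and its `factories` argument are hypothetical stand-ins; the real logic lives in Glue.py and is more involved.

```python
class CachingInterface:
    """Computes attributes lazily and caches all but the changeable ones."""

    _chgable_ = ("root",)     # names whose value may change, so never cached

    def __init__(self, factories):
        self._factories = factories   # name -> zero-argument callable

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. on a cache miss.
        try:
            factory = self.__dict__["_factories"][name]
        except KeyError:
            raise AttributeError(name) from None
        value = factory()
        if name not in self._chgable_:
            self.__dict__[name] = value   # later lookups bypass __getattr__
        return value


calls = []


def make_factory(name):
    def factory():
        calls.append(name)
        return object()
    return factory


iface = CachingInterface({"root": make_factory("root"),
                          "view": make_factory("view")})
iface.view, iface.view    # computed once, then served from the cache
iface.root, iface.root    # recomputed on every access
print(calls)
```

Keeping 'root' out of the cache means a later change to the root is always observed, at the cost of recomputing it on each access.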

@zhuyifei1999
Owner Author

I see that this is because of the change to root in View.py for priming.

Yeah, that should be exactly the cause.

And not all spurious objects are gone when paging down h.byprod with a number of _.more
So something is still going wrong.

I don't think I understand this. Which test is this?

@zhuyifei1999
Owner Author

zhuyifei1999 commented Nov 20, 2019

The caching in Glue.py only occurs if the name was not in the _chgable_ tuple of the glueclamp. If the value is not changed, it shouldn't be a problem that the cache occurs in multiple Interface dicts. So when I add 'root' to the _chgable_ tuple in View.py, the root will not be cached and h.fam.Path.View.root is rootstate correctly.

How about enabling caching in Glue.py again and adding 'root' to _chgable_ in View.py?

Ah, ok. I didn't realize this part of the logic. Thanks

zhuyifei1999 added a commit that referenced this issue Nov 20, 2019
zhuyifei1999 added a commit that referenced this issue Nov 20, 2019
@zhuyifei1999
Owner Author

Formal documentation is difficult :(

Anyways, I think this is done. Anything to be fixed before merge?

@svenil

svenil commented May 19, 2020

I get unknown producer on iso() objects as well as on the entire heap().
Maybe something is wrong in my installation, but I am using Python 3.9 that I compiled myself.

@zhuyifei1999
Owner Author

tracemalloc enabled?

I haven't tested this branch on 3.9 support yet but multi-interpreter completely hangs.

@zhuyifei1999
Owner Author

497ef41 - My WIP patch on Py3.9

@svenil

svenil commented May 19, 2020

I had not enabled tracemalloc. Should we have a warning for that?

@zhuyifei1999
Owner Author

6f35d5f#diff-8c727fb63492de2479412159060b42d6R371

This should have it.

@zhuyifei1999
Owner Author

(venv) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -ic 'from guppy import hpy; hp=hpy(); h=hp.heap()'
>>> h.byprod
/home/zhuyifei1999/guppy3/guppy/heapy/Use.py:369: UserWarning: Python 3.7 and below tracemalloc may not record accurate producer trace. See https://bugs.python.org/issue35053
  "Python 3.7 and below tracemalloc may not record accurate "
/home/zhuyifei1999/guppy3/guppy/heapy/Use.py:373: UserWarning: Tracemalloc is not tracing. No producer profile available
  "Tracemalloc is not tracing. No producer profile available")
Partition of a set of 36573 objects. Total size = 4247366 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  36573 100  4247366 100   4247366 100 unknown
>>> 
(venv) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -X tracemalloc -ic 'from guppy import hpy; hp=hpy(); h=hp.heap()'
>>> h.byprod
/home/zhuyifei1999/guppy3/guppy/heapy/Use.py:369: UserWarning: Python 3.7 and below tracemalloc may not record accurate producer trace. See https://bugs.python.org/issue35053
  "Python 3.7 and below tracemalloc may not record accurate "
Partition of a set of 36577 objects. Total size = 4247795 bytes.
 Index  Count   %     Size   % Cumulative  % Producer (line of allocation)
     0  19238  53  1691619  40   1691619  40 <frozen importlib._bootstrap_external>:525
     1   8446  23  1005233  24   2696852  63 unknown
     2   1251   3   178368   4   2875220  68 <frozen importlib._bootstrap>:219
     3    130   0    85496   2   2960716  70 <frozen importlib._bootstrap_external>:606
     4    260   1    79408   2   3040124  72 /usr/lib/python-
                                             exec/python3.7/../../../lib/python3.7/abc.py:126
     5    513   1    33462   1   3073586  72 <frozen importlib._bootstrap_external>:1408
     6     11   0    27128   1   3100714  73 <frozen importlib._bootstrap_external>:1416
     7     33   0    25696   1   3126410  74 <frozen importlib._bootstrap_external>:916
     8     98   0    23402   1   3149812  74 /usr/lib/python3.7/collections/__init__.py:397
     9    176   0    19816   0   3169628  75 /usr/lib/python-
                                             exec/python3.7/../../../lib/python3.7/abc.py:127
<2462 more rows. Type e.g. '_.more' to view.>
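A rough sketch of what "by line of allocation" means, using only the standard tracemalloc module. This is not how guppy's classifier is implemented; `producer_of` is a hypothetical helper. Objects with a recorded trace are grouped under `filename:lineno` of their most recent frame, and everything else falls under "unknown", like row 1 in the table above.

```python
import tracemalloc
from collections import Counter


def producer_of(obj):
    """Return 'filename:lineno' of the allocation site, or 'unknown'."""
    traceback = tracemalloc.get_object_traceback(obj)
    if traceback is None:
        return "unknown"
    frame = traceback[-1]  # On Python 3.7+ the last frame is the most recent one.
    return f"{frame.filename}:{frame.lineno}"


tracemalloc.stop()                   # start from a known non-tracing state
untraced = [bytearray(16)]           # allocated while tracing is off
tracemalloc.start()                  # default: keep one frame per allocation
traced = [bytearray(16) for _ in range(3)]  # all allocated on this one line

# Classify before stopping, because stop() discards the recorded traces.
by_producer = Counter(producer_of(obj) for obj in untraced + traced)
tracemalloc.stop()

for producer, count in by_producer.most_common():
    print(count, producer)
```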

@svenil

svenil commented May 19, 2020

I realise I actually got a warning the first time but didn't see it, and there was only one warning. Maybe one could consider emitting a warning each time we use byprod, not just the first, but I don't know.

@zhuyifei1999
Owner Author

I'd rather make an error than make a warning that emits every single time. Warnings get annoying fast :/

@svenil

svenil commented May 19, 2020

May consider an error then!

@svenil

svenil commented May 19, 2020

Consider giving a hint on how to enable tracemalloc. I didn't find a clue with python -h and had to look up how we did it the last time.

@zhuyifei1999
Owner Author

I put a link to the docs:

(venv) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ python -ic 'from guppy import hpy; hp=hpy(); h=hp.heap()'
>>> h.byprod
/home/zhuyifei1999/guppy3/guppy/heapy/Use.py:122: UserWarning: Python 3.7 and below tracemalloc may not record accurate producer trace. See https://bugs.python.org/issue35053
  "Python 3.7 and below tracemalloc may not record accurate "
Traceback (most recent call last):
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1703, in get_partition
    p = a._partition
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 74, in __getattr__
    return self.fam.mod.View.enter(lambda: self.fam.c_getattr(self, other))
  File "/home/zhuyifei1999/guppy3/guppy/heapy/View.py", line 256, in enter
    retval = func()
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 74, in <lambda>
    return self.fam.mod.View.enter(lambda: self.fam.c_getattr(self, other))
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 799, in c_getattr
    return self.c_getattr2(a, b)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 802, in c_getattr2
    raise AttributeError(b)
AttributeError: _partition

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 172, in __repr__
    return self.fam.c_repr(self)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 862, in c_repr
    return self.c_str(a)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1615, in c_str
    return str(self.get_more(a).at(-1))
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1693, in get_more
    return self.mod.OutputHandling.basic_more_printer(a, a.partition)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Descriptor.py", line 32, in __get__
    return super().__get__(instance, owner)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 526, in <lambda>
    partition = property_exp(lambda self: self.fam.get_partition(self), doc="""\
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1706, in get_partition
    p = a.fam.Part.partition(a, a.er)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Part.py", line 783, in partition
    return SetPartition(self, set, er)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Part.py", line 692, in __init__
    for (kind, part) in classifier.partition(set.nodes)]
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 105, in partition
    for k, v in self.partition_cli(iterable):
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 114, in partition_cli
    self.cli.epartition)
  File "/home/zhuyifei1999/guppy3/guppy/etc/Descriptor.py", line 12, in __get__
    return self.fget(instance)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 40, in _get_cli
    return self.get_cli()
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 1188, in get_cli
    self.mod.Use._check_tracemalloc()
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Use.py", line 126, in _check_tracemalloc
    "Tracemalloc is not tracing. No producer profile available. "
RuntimeError: Tracemalloc is not tracing. No producer profile available. See https://docs.python.org/3/library/tracemalloc.html
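The behaviour in the traceback above can be approximated with a small standalone check. This is a reconstruction from the messages shown, not the actual body of `_check_tracemalloc` in Use.py; the function name `check_tracemalloc` and the constant names are assumptions for illustration.

```python
import sys
import tracemalloc
import warnings

BUG_URL = "https://bugs.python.org/issue35053"
DOCS_URL = "https://docs.python.org/3/library/tracemalloc.html"


def check_tracemalloc():
    """Warn on old interpreters, and refuse to continue when nothing is traced."""
    if sys.version_info < (3, 8):
        warnings.warn(
            "Python 3.7 and below tracemalloc may not record accurate "
            "producer trace. See " + BUG_URL)
    if not tracemalloc.is_tracing():
        raise RuntimeError(
            "Tracemalloc is not tracing. No producer profile available. "
            "See " + DOCS_URL)


tracemalloc.stop()  # make sure tracing is off for the demonstration
try:
    check_tracemalloc()
except RuntimeError as exc:
    outcome = str(exc)
else:
    outcome = "tracing"
print(outcome)
```

Raising here means the message cannot scroll past unnoticed the way a once-only warning can.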

@svenil

svenil commented May 19, 2020

Maybe even "Do you want to enable it? (y/n)"

@zhuyifei1999
Owner Author

It would be too late to enable it by then. If I run tracemalloc.start() then only the objects created after that would have their trace available.
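A small illustration of that limitation, using only the standard library (the variable names are arbitrary): an object allocated before `tracemalloc.start()` has no traceback, while one allocated afterwards does.

```python
import tracemalloc

tracemalloc.stop()           # make sure we begin in a non-tracing state

early = bytearray(1000)      # allocated before tracing starts
tracemalloc.start(10)        # keep up to 10 frames per allocation
late = bytearray(1000)       # allocated after tracing starts

# Look the traces up while tracing is still on; stop() discards them.
early_tb = tracemalloc.get_object_traceback(early)
late_tb = tracemalloc.get_object_traceback(late)
tracemalloc.stop()

print("early:", early_tb)    # no trace was recorded for this object
for line in late_tb.format():
    print("late:", line)
```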

@svenil

svenil commented May 19, 2020

I know, that's unfortunate but then you have the option to create new objects...

@svenil

svenil commented May 19, 2020

Anyway, I forgot to say, Good Work!

@zhuyifei1999
Owner Author

zhuyifei1999 commented May 19, 2020

I know, that's unfortunate but then you have the option to create new objects...

But then there would be more gotchas... I'd have to explain why, after answering "y" to "Do you want to enable it? (y/n)", one needs to rerun everything to get good information.

Not to mention that this partition is called from a repr for a MorePrinter, so I'd somehow have to add an interactive prompt to it... and then what if stdin isn't even interactive? Someone calling guppy from a long-running server process... argh I don't want to think about user interface.

@svenil

svenil commented May 19, 2020

"Tracing is not enabled. Type hp.tm to enable"

where hp is determined magically to be the hpy() object and tm is an attribute with a side effect to enable tracemalloc. Or if you prefer hp.tm()

@zhuyifei1999
Owner Author

Hmm. Good idea.

@svenil

svenil commented May 19, 2020

But maybe you want to explain that it doesn't apply to already allocated objects.

@zhuyifei1999
Owner Author

Yeah, I still think the best idea would be to start tracemalloc as early as possible with -X tracemalloc or PYTHONTRACEMALLOC=1. I don't know how to make that clear, and explain the reasoning, in one error message.
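To show that both switches turn tracing on before any user code runs, here is a sketch that launches child interpreters and asks each one whether tracemalloc is tracing. The helper `run_probe` is a hypothetical name; it assumes `sys.executable` points at a normal CPython 3.7+ interpreter.

```python
import os
import subprocess
import sys

# One-liner executed in each child interpreter.
PROBE = "import tracemalloc; print(tracemalloc.is_tracing())"


def run_probe(extra_args=(), extra_env=None):
    """Run PROBE in a fresh interpreter and return what it printed."""
    env = dict(os.environ)
    env.pop("PYTHONTRACEMALLOC", None)  # do not inherit the parent's setting
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


plain = run_probe()
with_flag = run_probe(extra_args=["-X", "tracemalloc"])
with_env = run_probe(extra_env={"PYTHONTRACEMALLOC": "1"})

print("no option:            ", plain)
print("-X tracemalloc:       ", with_flag)
print("PYTHONTRACEMALLOC=1:  ", with_env)
```

Because tracing is already active when the first import runs, even objects created during interpreter startup and module imports get a producer.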

How about let's just keep it an error? If people went as far as finding the producer classifier from the obscure docs / code of guppy, I'm sure they won't mind reading some highly-readable official Python docs.

We could document all those APIs and how-tos, and especially how in the world guppy even works and how to extend guppy, with some more user-friendly, less mathematical documentation, but I'm a pretty bad doc writer. All I'm good at is emitting code 😂

@svenil

svenil commented May 19, 2020

How about let's just keep it an error?

Yeah, that seems to be a good option.

Another idea I was contemplating was to have an argument to hpy() that enables tracemalloc, so we wouldn't have to read the Python docs. Or even a special constructor. But I don't know if it's useful enough and we would have to document it ourselves... and I agree, I prefer coding to writing docs too, although I have to write docs at work.

@svenil

svenil commented May 19, 2020

BTW x.byprod is missing from the formal documentation of IdentitySet

zhuyifei1999 added a commit that referenced this issue May 19, 2020
@zhuyifei1999
Owner Author

BTW x.byprod is missing from the formal documentation of IdentitySet

Fixed

@zhuyifei1999
Owner Author

But I don't know if it's useful enough and we would have to document it ourselves...

Yeah, I just think that the fact that it would only work for objects that get allocated after enabling really limits its usefulness.

@zhuyifei1999
Owner Author

Alright merging. If anything should be improved please tell :)
