Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dubious assignment to be a check in hv_cli_and.c #14

Closed
svenil opened this issue Nov 9, 2019 · 12 comments
Closed

Dubious assignment to be a check in hv_cli_and.c #14

svenil opened this issue Nov 9, 2019 · 12 comments

Comments

@svenil
Copy link

svenil commented Nov 9, 2019

This assignment that will be a check is dubious. It is the same for the two cases. Suggest to fix and also have a test case that catches it, if possible. Does the code in nodetuple_richcompare work as intended? I don't understand it after these years.

hv_cli_and.c, lines 235-236

       vi = (Py_ssize_t)vt->ob_item[i];
       **wi = (Py_ssize_t)vt->ob_item[i];**
@svenil svenil changed the title Dubious check in hv_cli_and.c Dubious assignment to be a check in hv_cli_and.c Nov 9, 2019
@zhuyifei1999
Copy link
Owner

Is it supposed to be wi = (long)wt->ob_item[i];? Hmm... I'll check. (Got really busy with other stuffs recently)

@zhuyifei1999
Copy link
Owner

How to get a reference to this class:

$ python
Python 3.7.5 (default, Oct 23 2019, 18:04:59) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import guppy.heapy.heapyc
>>> hv = guppy.heapy.heapyc.HeapView(guppy.heapy.heapyc.RootState, ())
>>> d = {}
>>> cli = hv.cli_and((), d)
>>> cli.classify(())
()
>>> type(next(iter(d.keys())))
<class 'guppy.heapy.heapyc.NodeTuple'>
>>> 

Woah, that is more complicated than I had expected.

@zhuyifei1999
Copy link
Owner

Yeah, it's definitely wrong.

$ python
Python 3.7.5 (default, Oct 23 2019, 18:04:59) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import guppy.heapy.heapyc
>>> hv = guppy.heapy.heapyc.HeapView(guppy.heapy.heapyc.RootState, ())
>>> f = lambda l: hv.cli_and(tuple(hv.cli_user_defined(hv.cli_none(), None, lambda obj: i, None) for i in l), {}).classify(None)
>>> a = f((1,)); b = f((2,)); a, b, type(a), type(b), a == b
((1,), (2,), <class 'guppy.heapy.heapyc.NodeTuple'>, <class 'guppy.heapy.heapyc.NodeTuple'>, True)
>>> a = f((1,)); b = f((2,)); a, b, type(a), type(b), a == b, hash(a) == hash(b)
((1,), (2,), <class 'guppy.heapy.heapyc.NodeTuple'>, <class 'guppy.heapy.heapyc.NodeTuple'>, True, False)

@zhuyifei1999
Copy link
Owner

This guppy.heapy.heapyc.NodeTuple seems really hard to access from normal Python space. The result from the classifier seems always abstracted away by guppy.heapy.UniSet.Kind. Any idea how to test this?

zhuyifei1999 added a commit that referenced this issue Nov 9, 2019
@svenil
Copy link
Author

svenil commented Nov 11, 2019

I could test it by accessing the low level classifier. It is available at hpy().View.hv.cli_and

sverker@sverker-HP-Pavilion-g6-Notebook-PC:~/git/guppy-pe/bugs$ cat test_hv_cli_and.py
from guppy import hpy
h=hpy()
a=h.View.hv.cli_and((h.View.hv.cli_id(),), {})
assert a.classify(1)!=a.classify(2)
# Before fix
python
Python 2.7.15+ (default, Oct  7 2019, 17:39:04) 
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test_hv_cli_and
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test_hv_cli_and.py", line 4, in <module>
    assert a.classify(1)!=a.classify(2)
AssertionError
# After fix
python
Python 2.7.15+ (default, Oct  7 2019, 17:39:04) 
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test_hv_cli_and
>>> 

@svenil
Copy link
Author

svenil commented Nov 11, 2019

I can see you accessed the low level classifier too in a previous comment where you used the user defined classifier too, I didn't understand it or read it well enough right away.

Mine was an alternative that was arguably easier to do.
Was your question how we can test it without accessing the low level classifier?

@zhuyifei1999
Copy link
Owner

zhuyifei1999 commented Nov 11, 2019

Was your question how we can test it without accessing the low level classifier?

Yeah. Like, does the abstractions above invoke nodetuple_richcompare?

I mean, if not, we could just test the low-level, or, we could test both high-level and low-level.

@svenil
Copy link
Author

svenil commented Nov 12, 2019

I find that it is called but I can not produce an error. It is used when partitioning a large enough set with the and classifier. But gdb gives strange results. The richcompare is called with the same results each time, presumably from the dict lookup when there is hash collisions, but why is it getting the same arguments each time? Is it something wrong with gdb interaction?

cat test3.py
from guppy import hpy
h=hpy()
al=[[] for x in range(100000)]
a0=al[:10000]

print h.idset(al).by(h.Id & h.Size) & h.idset(a0[:100]).by(h.Id & h.Size).kind

gdb python
GNU gdb (Ubuntu 8.1-0ubuntu3.1) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...(no debugging symbols found)...done.
(gdb) br nodetuple_richcompare
Function "nodetuple_richcompare" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (nodetuple_richcompare) pending.
(gdb) run
Starting program: /usr/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Python 2.7.15+ (default, Oct  7 2019, 17:39:04) 
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test3
warning: Error reading shared library list entry at 0xffffb420
warning: Error reading shared library list entry at 0xffffbe60

Breakpoint 1, nodetuple_richcompare (v=0xb748c36c, w=0xb748c38c, op=2)
    at src/heapy/hv_cli_and.c:202
202	{
(gdb) c
Continuing.

Breakpoint 1, nodetuple_richcompare (v=0xb748c36c, w=0xb748c38c, op=2)
    at src/heapy/hv_cli_and.c:202
202	{
(gdb) c
Continuing.

Breakpoint 1, nodetuple_richcompare (v=0xb748c36c, w=0xb748c38c, op=2)
    at src/heapy/hv_cli_and.c:202
202	{
(gdb) c
Continuing.

Breakpoint 1, nodetuple_richcompare (v=0xb748c36c, w=0xb748c38c, op=2)
    at src/heapy/hv_cli_and.c:202
202	{
(gdb) dis 1
(gdb) c
Continuing.
Partition of a set of 100 objects. Total size = 3200 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0    100 100     3200 100      3200 100 list
>>> 

@zhuyifei1999
Copy link
Owner

Woah

(venv) zhuyifei1999@zhuyifei1999-ThinkPad-T480 ~/guppy3 $ gdb -ex 'break nodetuple_richcompare' -ex r --args python test3.py
GNU gdb (Gentoo 8.3.1 vanilla) 8.3.1
[...]

Breakpoint 1, nodetuple_richcompare (v=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), w=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), op=2) at /home/zhuyifei1999/guppy3/src/heapy/hv_cli_and.c:205
205	    Py_ssize_t vi=0, wi=0;
(gdb) bt
#0  nodetuple_richcompare (v=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), w=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), op=2) at /home/zhuyifei1999/guppy3/src/heapy/hv_cli_and.c:205
#1  0x00007ffff7c1c9f2 in do_richcompare (op=2, w=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), v=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>)) at /usr/src/debug/dev-lang/python-3.7.5-r1/Python-3.7.5/Objects/object.c:695
#2  PyObject_RichCompare (v=v@entry=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), w=w@entry=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), op=op@entry=2) at /usr/src/debug/dev-lang/python-3.7.5-r1/Python-3.7.5/Objects/object.c:743
#3  0x00007ffff7a8610c in PyObject_RichCompareBool (op=2, w=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>), v=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>)) at /usr/src/debug/dev-lang/python-3.7.5-r1/Python-3.7.5/Objects/object.c:751
#4  lookdict (mp=0x7ffff5ddf910, key=<optimized out>, hash=<optimized out>, value_addr=0x7fffffffba90) at /usr/src/debug/dev-lang/python-3.7.5-r1/Python-3.7.5/Objects/dictobject.c:757
#5  0x00007ffff7c30bc1 in PyDict_GetItem (op={(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>): (...)}, key=<optimized out>) at /usr/src/debug/dev-lang/python-3.7.5-r1/Python-3.7.5/Objects/dictobject.c:1327
#6  0x00007ffff69faab7 in hv_cli_and_fast_memoized_kind (self=0x7ffff5dde7d0, kind=(<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>)) at /home/zhuyifei1999/guppy3/src/heapy/hv_cli_and.c:42
#7  0x00007ffff69fae83 in hv_cli_and_classify (self=0x7ffff5dde7d0, obj=[]) at /home/zhuyifei1999/guppy3/src/heapy/hv_cli_and.c:117
#8  0x00007ffff6a01320 in cli_epartition_iter (obj=[], ta=0x7fffffffbce0) at /home/zhuyifei1999/guppy3/src/heapy/classifier.c:146
#9  0x00007ffff6a296ba in NyNodeSet_iterate (ns=0x555555580100, visit=0x7ffff6a012ea <cli_epartition_iter>, arg=0x7fffffffbce0) at /home/zhuyifei1999/guppy3/src/sets/nodeset.c:508
#10 0x00007ffff69f8b09 in NyNodeSet_iterate (ns=0x555555580100, visit=0x7ffff6a012ea <cli_epartition_iter>, arg=0x7fffffffbce0) at /home/zhuyifei1999/guppy3/src/heapy/impsets.c:44
#11 0x00007ffff69f8c3b in iterable_iterate (v=<guppy.sets.setsc.ImmNodeSet at remote 0x555555580100>, visit=0x7ffff6a012ea <cli_epartition_iter>, arg=0x7fffffffbce0) at /home/zhuyifei1999/guppy3/src/heapy/heapyc.c:93
#12 0x00007ffff6a01429 in cli_epartition (self=0x7ffff5dc6b90, iterable=<guppy.sets.setsc.ImmNodeSet at remote 0x555555580100>) at /home/zhuyifei1999/guppy3/src/heapy/classifier.c:166
[...]
(gdb) py-bt
Traceback (most recent call first):
  <built-in method epartition of guppy.heapy.heapyc.ObjectClassifier object at remote 0x7ffff5dc6b90>
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 35, in call_with_referrers
    return f(x)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 114, in partition_cli
    self.cli.epartition)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Classifiers.py", line 105, in partition
    for k, v in self.partition_cli(iterable):
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Part.py", line 709, in __init__
    for (kind, part) in classifier.partition(set.nodes)]
  File "/home/zhuyifei1999/guppy3/guppy/heapy/Part.py", line 800, in partition
    return SetPartition(self, set, er)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1692, in get_partition
    p = a.fam.Part.partition(a, a.er)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 1603, in c_str
    a.fam.get_partition(a).ppob(ob)
  File "/home/zhuyifei1999/guppy3/guppy/heapy/UniSet.py", line 344, in __str__
    return self.fam.c_str(self)
  <built-in method print of module object at remote 0x7ffff6c6fd10>
  File "test3.py", line 6, in <module>
    print(h.idset(al).by(h.Id & h.Size) & h.idset(a0[:100]).by(h.Id & h.Size).kind)

Ok, a broken richcompare that is actively used but doesn't break the entire system... is interesting. I'm looking into it.

@zhuyifei1999
Copy link
Owner

Ah, I see. The hash "collision" is because the tuple values are the same, and our hash function has been good enough to avoid a collision when the tuple values are not the same.

The bug was a false positive during a richcompare when the tuple are not the same, but in this case, richcompare would not be called if the hash values are different. There wouldn't be a way to trigger the bug unless we can generate a hash collision for different tuple values.

Why is it getting the same arguments each time... probably because we aren't leaking memory and the tuple is recreated and destroyed again and again in hv_cli_and_classify.

Btw, the object that is being classified is [] and the classify result of the and classifier is (<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>). What sort of classifier is that? Isn't it supposed to be ([], 72) for an id & size classifier?

@zhuyifei1999
Copy link
Owner

Btw, the object that is being classified is [] and the classify result of the and classifier is (<type at remote 0x7ffff7d5e960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dce290>). What sort of classifier is that? Isn't it supposed to be ([], 72) for an id & size classifier?

Answer to self: Clodo, this is classifying during print so the set's printing classifer is still clodo.

(gdb) up
#1  0x00007ffff7c189f2 in do_richcompare (op=2, w=(<type at remote 0x7ffff7d5a960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dc5290>), v=(<type at remote 0x7ffff7d5a960>, <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff5dc5290>)) at /usr/src/debug/dev-lang/python-3.7.5-r1/Python-3.7.5/Objects/object.c:695
695	        res = (*f)(v, w, op);
[...]
(gdb) 
#7  0x00007ffff69f1e83 in hv_cli_and_classify (self=0x7ffff5dd5640, obj=[]) at /home/zhuyifei1999/guppy3/src/heapy/hv_cli_and.c:117
117	    result = hv_cli_and_fast_memoized_kind(self, kind);
(gdb) p ((NyObjectClassifierObject *)((PyTupleObject *)classifiers)->ob_item[0])->def->classify
$1 = (PyObject *(*)(PyObject *, PyObject *)) 0x7ffff69f513b <hv_cli_type_classify>
(gdb) p ((NyObjectClassifierObject *)((PyTupleObject *)classifiers)->ob_item[1])->def->classify
$2 = (PyObject *(*)(PyObject *, PyObject *)) 0x7ffff69f2a28 <hv_cli_dictof_classify>

@zhuyifei1999
Copy link
Owner

class AndClassifier(Classifier):
[...]
    def get_kind(self, k):
        ks = []
        for ki, ci in zip(k, self.args):
            ks.append(ci.get_kind(ki))
        return self.mod.UniSet.fam_And._cons(ks)
class Classifier:
[...]
    def partition(self, iterable):
        items = []
        for k, v in self.partition_cli(iterable):
            k = self.get_kind(k)
            v = self.mod.Use.idset(v, er=self.er)
            items.append((k, v))
        return items

    def partition_cli(self, a):
        ep = self.call_with_referrers(
            a,
            self.cli.epartition)
        return [(k, ep[k]) for k in ep.get_domain()]

Yeah, I see the NodeTuple is iterated through and used to create a list, so I don't think at the high level there is any use of NodeTuple, after the classification. During classification the NodeTuple is only used as keys to the dict, which would be somewhat hard to make a hash collision to hit this bug.

I guess a test at the low level is the way to go then. (Will do later)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants