Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] How does theone return the Python object? #31

Closed
michael-hahn opened this issue Jun 23, 2021 · 8 comments
Closed

[Question] How does theone return the Python object? #31

michael-hahn opened this issue Jun 23, 2021 · 8 comments

Comments

@michael-hahn
Copy link

I am trying to understand the inner workings of guppy. I am curious how the following method in IdentitySetSingleton actually works to return the actual Python object:

    def _get_theone(self):
        return self._node

Looking through the C extension code in hv.c, I see in hv_heap() function that it calls hv_heap_rec() which in turn calls NyNodeSet_setobj(). In NyNodeSet_setobj(), Py_INCREF() is called to increase the reference count of each obj. But in the code, I don't see where obj is stored in a node?

Thank you for your explanation.

@michael-hahn michael-hahn changed the title How does theone return the Python object? [Question] How does theone return the Python object? Jun 23, 2021
@zhuyifei1999
Copy link
Owner

Hmm... so the things such as NyNodeSet is just, think of it like a python set, except instead of equality being governed by the object's defined __eq__, it's equality based on the objects identity (a.k.a. memory address). Almost everything else from the API standpoint should be similar to that of a python set. You can see the test cases: https://github.com/zhuyifei1999/guppy3/blob/master/guppy/sets/test.py#L1327

Because nodeset contains objects, it has to keep the objects alive. This is where Py_INCREF you mentioned comes in.

Things like IdentitySet is more of a Python wrapper around the C NyNodeSet that provides the lots of methods and printers and stuffs to better interact with and visualize the sets.

Since you're looking at hv_heap, I assume you want to see all the way from how nodesets gets created, so let's assume somehow the heap contains only one object, so you did hpy().heap().theone:

Use.heap():

guppy3/guppy/heapy/Use.py

Lines 175 to 195 in a1bd557

def heap(self):
"""heap() -> IdentitySet[1]
Traverse the heap from a root to find all reachable and visible
objects. The objects that belong to a heapy instance are normally not
included. Return an IdentitySet with the objects found, which is
presented as a table partitioned according to a default equivalence
relation (Clodo [3]).
See also: setref[2]
References
[0] heapy_Use.html#heapykinds.Use.heap
[1] heapy_UniSet.html#heapykinds.IdentitySet
[2] heapy_Use.html#heapykinds.Use.setref
[3] heapy_Use.html#heapykinds.Use.Clodo"""
h = self.View.heap()
h |= self.gcobjs
h -= self.relheap
return h

View.heap():

guppy3/guppy/heapy/View.py

Lines 340 to 367 in a1bd557

def heap(self):
"""V.heap() -> idset
Return the set of objects in the visible heap.
"""
global heap_one_time_initialized
# This is to make sure that the first time called
# the heap will contain things that may likely be loaded later
# because of common operations.
if not heap_one_time_initialized:
old_root = self.root
objs = [[], 'a', 1, 1.23, {'a': 'b'}, self]
self.root = objs
try:
repr(self.idset(objs))
repr(self.iso(objs[0]).shpaths)
repr(self.iso(objs[0]).rp)
finally:
self.root = old_root
del objs
del old_root
heap_one_time_initialized = True
self.gc.collect() # Sealing a leak at particular usage ; Notes Apr 13 2005
# Exclude current frame by encapsulting in enter(). Note Apr 20 2005
return self.enter(lambda:
self.idset(self.hv.heap()))

View.enter is for limiting the visibility of guppy-internal frames during reference traversal, so the important part is self.idset(self.hv.heap())

hv.heap() is your mentioned hv_heap:

guppy3/src/heapy/hv.c

Lines 889 to 922 in a1bd557

static PyObject *
hv_heap(NyHeapViewObject *self, PyObject *args, PyObject *kwds)
{
HeapTravArg ta;
ta.hv = self;
ta.visited = hv_mutnodeset_new(self);
ta.to_visit = PyList_New(0);
if (!(ta.visited && ta.to_visit))
goto err;
if (hv_heap_rec(ta.hv->root, &ta) == -1)
goto err;
while (PyList_Size(ta.to_visit)) {
PyObject *obj = hv_PyList_Pop(ta.to_visit);
if (!obj)
goto err;
if (hv_std_traverse(ta.hv, obj, (visitproc)hv_heap_rec, &ta) == -1) {
Py_DECREF(obj);
goto err;
}
Py_DECREF(obj);
}
if (hv_cleanup_mutset(ta.hv, ta.visited) == -1)
goto err;
if (PyObject_Length(self->static_types) == 0) {
if (hv_update_static_types(self, (PyObject *)ta.visited) == -1)
goto err;
}
Py_XDECREF(ta.to_visit);
return (PyObject *)ta.visited;
err:
Py_XDECREF(ta.visited);
Py_XDECREF(ta.to_visit);
return 0;
}

The important thing here is that it creates and returns a mutnodeset (a mutable nodeset) that contains everything it saw during the traversal.

Then View.idset() is an import / alias:

'_parent.Use:idset',

Use.idset():

'_parent.UniSet:idset',

UniSet.idset():

guppy3/guppy/heapy/UniSet.py

Lines 2214 to 2215 in a1bd557

def idset(self, iterable, er=None):
return self.fam_IdentitySet._cons(self.immnodeset(iterable), er=er)

Here immnodeset first creates a copy of the mutnodeset, but now it's immutable, then it does IdentitySetFamily._cons():

guppy3/guppy/heapy/UniSet.py

Lines 1526 to 1538 in a1bd557

def _cons(self, arg, er=None):
# arg is a sequence of nodes
arg = self.immnodeset(arg)
if not arg:
return self.mod.Nothing
elif len(arg) == 1:
r = IdentitySetSingleton(self, tuple(arg)[0])
else:
r = IdentitySetMulti(self, arg)
if er is not None:
r._er = er
return r

If the size of the set is 1, then it gets the single element of the set via tuple(arg)[0], and calls the constructor of IdentitySetSingleton with it

And IdentitySetSingleton's constructor sets self._node with the given element:

class IdentitySetSingleton(IdentitySet):
__slots__ = '_node',
_help_url_ = 'heapy_UniSet.html#heapykinds.IdentitySetSingleton'
def __init__(self, fam, node):
super().__init__(fam)
self._node = node

So tl;dr: in hpy().heap().theone, heap() calls hv_heap() to create a mutnodeset, and then IdentitySetFamily._cons() sees that it's a set of size 1 and calls IdentitySetSingleton with the single element it contains, and the constructor sets the _node attribute.

Does this answer how it works?

@michael-hahn
Copy link
Author

Hi @zhuyifei1999,

Thank you for your thorough explanation. This is definitely very helpful! I was able to follow the exact control flow when debugging with PyCharm.

Because nodeset contains objects, it has to keep the objects alive. This is where Py_INCREF you mentioned comes in.

This makes sense! To confirm: when adding an object by calling NyNodeSet_setobj():

guppy3/src/sets/nodeset.c

Lines 599 to 619 in a1bd557

int
NyNodeSet_setobj(NyNodeSetObject *v, PyObject *obj)
{
if (NyMutNodeSet_Check(v)) {
NyBit bitno = nodeset_obj_to_bitno(obj);
int r = NyMutBitSet_setbit((NyMutBitSetObject *)v->u.bitset, bitno);
if (r == -1)
return -1;
if (!r) {
Py_SIZE(v)++;
if (v->flags & NS_HOLDOBJECTS) {
Py_INCREF(obj);
}
}
return r;
} else {
PyErr_Format(PyExc_ValueError,
"mutable nodeset required");
return -1;
}
}

The core functionality that adds the object is actually this:
int r = NyMutBitSet_setbit((NyMutBitSetObject *)v->u.bitset, bitno);

where it adds the object's adjusted address to the bitset (or nodes, depending on whether the set is immutable or not) within the NyNodeSetObject struct:

guppy3/src/sets/nodeset.c

Lines 325 to 329 in a1bd557

static NyBit
nodeset_obj_to_bitno(PyObject *obj)
{
return (NyBit) obj / ALIGN;
}

Am I correct?

When retrieving the object from theone, I am assuming this method:

def _get_theone(self):
return self._node

somehow calls some C code to get the object from the address? I am sorry, this is where I get a bit fuzzy. I presume the following C function is triggered by some function(s) when self._node in the code above is called?

guppy3/src/sets/nodeset.c

Lines 331 to 335 in a1bd557

static PyObject *
nodeset_bitno_to_obj(NyBit bitno)
{
return (PyObject *)(bitno * ALIGN);
}

Thank you!

@zhuyifei1999
Copy link
Owner

where it adds the object's adjusted address to the bitset (or nodes, depending on whether the set is immutable or not) within the NyNodeSetObject struct:

Immutable one cannot be added. NyMutNodeSet_Check(v) fails and you get ValueError: mutable nodeset required.

When retrieving the object from theone, I am assuming this method:
somehow calls some C code to get the object from the address? I am sorry, this is where I get a bit fuzzy. I presume the following C function is triggered by some function(s) when self._node in the code above is called?

No, custom C code is completely uninvolved here.

This corresponds to

class IdentitySetSingleton(IdentitySet):
__slots__ = '_node',
_help_url_ = 'heapy_UniSet.html#heapykinds.IdentitySetSingleton'
def __init__(self, fam, node):
super().__init__(fam)
self._node = node

In __init__:

self._node = node

In _get_theone:

return self._node 

How __init__ was called with the element / node was explained above. The argument to __init__ is the object returned by theone. Nothing fancy involved.

@michael-hahn
Copy link
Author

michael-hahn commented Jun 24, 2021

OK. I think I see what is going on here. I believe, correct me if I am wrong, that the "trick" here is:

guppy3/guppy/heapy/UniSet.py

Lines 2214 to 2215 in a1bd557

def idset(self, iterable, er=None):
return self.fam_IdentitySet._cons(self.immnodeset(iterable), er=er)

It is when self.immnodeset() is called where we store actual objects into the immutable node set.
I see that when a copy is made:
NyNodeSetObject *
NyImmNodeSet_SubtypeNewCopy(PyTypeObject *type, NyNodeSetObject *v)
{
NSISetArg sa;
sa.i = 0;
sa.ns = NyImmNodeSet_SubtypeNew(type, Py_SIZE(v), v->_hiding_tag_);
if (!sa.ns)
return 0;
NyNodeSet_iterate(v, (visitproc)as_immutable_visit, &sa);
return sa.ns;
}
NyNodeSetObject *
NyImmNodeSet_NewCopy(NyNodeSetObject *v)
{
return NyImmNodeSet_SubtypeNewCopy(&NyImmNodeSet_Type, v);
}

NyNodeSet_iterate() would call mutnodeset_iterate_visit() to dereference the address from bitno, which is stored in mutable node set's bitset, and get the object. The object is then stored in nodes:
typedef struct {
PyObject_VAR_HEAD
int flags;
PyObject *_hiding_tag_;
union {
PyObject *bitset; /* If mutable type, a mutable bitset with addresses (divided). */
PyObject *nodes[1]; /* If immutable type, the start of node array, in address order. */
} u;
} NyNodeSetObject;

I guess my last question is, what is stored in nodes[0] since the NyNodeSetObject holds the pointer to nodes[1] instead of simply nodes?

Thanks!

@zhuyifei1999
Copy link
Owner

I guess my last question is, what is stored in nodes[0] since the NyNodeSetObject holds the pointer to nodes[1] instead of simply nodes?

This you can answer with GDB:

$ find build guppy -name '*.so' -delete; pip install --global-option build --global-option --debug -e .
Obtaining file:///home/zhuyifei1999/guppy3
Installing collected packages: guppy3
  Attempting uninstall: guppy3
    Found existing installation: guppy3 3.1.0
    Uninstalling guppy3-3.1.0:
      Successfully uninstalled guppy3-3.1.0
  Running setup.py develop for guppy3
Successfully installed guppy3-3.1.0
$ gdb python
GNU gdb (Gentoo 10.2 vanilla) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
[...]
(gdb) r
Starting program: /home/zhuyifei1999/guppy3/venv/bin/python 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Python 3.9.5 (default, Jun  2 2021, 13:52:24) 
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import guppy.sets.setsc
>>> def a():
...  b = object()
...  print(b)
...  c = guppy.sets.setsc.ImmNodeSet([b])
...  while True: pass
... 
>>> a()
<object object at 0x7ffff6bec3a0>
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7c2d1cc in _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>, throwflag=<optimized out>) at /usr/src/debug/dev-lang/python-3.9.5_p2/Python-3.9.5/Python/ceval.c:3256
3256	            DISPATCH();
(gdb) py-locals
b = <object at remote 0x7ffff6bec3a0>
c = <guppy.sets.setsc.ImmNodeSet at remote 0x7ffff6ad8ae0>
(gdb) p *(NyNodeSetObject *)0x7ffff6ad8ae0
$1 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 0x1,
      ob_type = 0x7ffff6abd5c0 <NyImmNodeSet_Type>
    },
    ob_size = 0x1
  },
  flags = 0x1,
  _hiding_tag_ = 0x0,
  u = {
    bitset = <object at remote 0x7ffff6bec3a0>,
    nodes = {<object at remote 0x7ffff6bec3a0>}
  }
}
(gdb) 

The nodes[1] doesn't really mean it's an array of size 1, it's a C array of indefinite size. The array contains whatever objects the nodeset have. This is similar to how CPython implements tuples:
https://github.com/python/cpython/blob/769d7d0c66c5b86e2dd29b9ce67ac2daaab1bb38/Include/cpython/tupleobject.h#L5-L11

typedef struct {
    PyObject_VAR_HEAD
    /* ob_item contains space for 'ob_size' elements.
       Items must normally not be NULL, except during construction when
       the tuple is not yet visible outside the function that builds it. */
    PyObject *ob_item[1];
} PyTupleObject;

@michael-hahn
Copy link
Author

I am not sure if you actually answered my question. I am aware that nodes[1] is just a pointer that points to the second element of nodes, not an array of size 1. But why does it skip the first element? Why does bitset not do the same (i.e., bitset[1])?

@zhuyifei1999
Copy link
Owner

zhuyifei1999 commented Jun 24, 2021

I am aware that nodes[1] is just a pointer that points to the second element of nodes, not an array of size 1

Don't understand your question. I'm referring to the declaration PyObject *nodes[1]; being an array of indefinite size. The literal declaration of this is "nodes is an array of 1 element, of a pointer to PyObject". What I'm saying is, rather than that, it is "nodes is an array of indefinite size, of pointers to PyObject".

In other words, the assumption "NyNodeSetObject holds the pointer to nodes[1]" is incorrect.

This might help you: https://cdecl.org/?q=int+*nodes%5B1%5D%3B

Edit: I resized I was skipping too fast in my thoughts in my answer above. Sorry,

@michael-hahn
Copy link
Author

michael-hahn commented Jun 24, 2021

My bad! It all makes sense! I need to brush up on my C syntax! Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants