Speed up frame handling in Python-to-Python calls. #111
Comments
Even after combining them into a single object, I think these fields still need an INCREF and a DECREF. Is this work really saved?
I don't think that's true. For example, changing the refcount of a list doesn't modify the refcounts of each of its contained items (that only happens when the list is created or destroyed). Same idea here.
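The list analogy can be checked directly from Python. This is a small sketch, assuming CPython (where `sys.getrefcount` is available and objects are freed as soon as their last reference goes away); the variable names are only illustrative.

```python
import sys

item = object()
container = [item]  # creating the list takes one reference to item

before = sys.getrefcount(item)

alias = container  # a new reference to the list, not to its items
after_alias = sys.getrefcount(item)

del alias
del container  # last reference to the list is gone, so it releases item
after_destroy = sys.getrefcount(item)
```

Only the counts relative to each other matter here: `after_alias` equals `before`, and `after_destroy` is one lower, because the item's refcount changes only when the list itself is destroyed.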
An alternative approach, which doesn't require a new object, is as follows:
New frame layout
Placing the code object first allows us to find the stack base without needing an extra register.
Changes to the bytecode
The advantage of this layout becomes clear if we do two extra things:
Call sequence
This gives us efficient calls. For calls, the top of the stack looks like:
By setting the
Return and yield sequence
To return or yield we need to access the linkage section, for which we need to access the stack base. We can avoid any copying and reduce stack consumption when calling by inserting
The additional complexity of #111 (comment) seems to be causing a slowdown, not a speedup.
We seem to have run out of obvious improvements to frame layout. The only enhancement I can think of is to move the "slow" locals into a "fast" local, and access it via
A possibly cheaper alternative would be to track whether
Overall, I think we should just call this "done", and work on other stuff.
For Python-to-Python calls we avoid consuming the C stack by making the call within the _PyEval_EvalFrameDefault function.
However, the handling of frames is not as efficient as it could be.
Tightening this up would have a few benefits:
__init__, __setitem__, etc.)
In order to speed up frame handling we need to reduce the amount of work done when pushing the frame and when clearing it.
The frame consists of three parts: the specials, the local variables, and the stack.
The stack is empty on both entry and exit, so has no cost apart from setting the stacktop on entry. This is about as efficient as it can be.
The use of local variables could be tracked in the compiler to create a bitmap describing which locals need to be cleared on exit. However, without a lot of additional work in the compiler, the bitmap will not be precise, so we would gain little from it.
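To make the bitmap idea concrete, here is a minimal sketch of clearing locals from such a bitmap. It is a toy model in Python rather than interpreter code: `clear_locals`, `slots`, and `may_be_set` are hypothetical names, and a set bit is assumed to mean "this local may hold a value".

```python
def clear_locals(slots, may_be_set):
    """Clear only the slots whose bit is set; return how many were visited."""
    visited = 0
    index = 0
    while may_be_set:
        if may_be_set & 1:
            slots[index] = None  # stands in for clearing the local on exit
            visited += 1
        may_be_set >>= 1
        index += 1
    return visited


# Four locals, of which the bitmap says slots 0 and 2 may be set.
slots = ["a", None, "c", None]
visited = clear_locals(slots, 0b0101)
```

If the compiler cannot prove a local is unused, it has to set the bit anyway, so an imprecise bitmap ends up visiting nearly every slot and saves little.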
That leaves the specials. Most of the cost is in initializing and clearing the four fields:
Not only do these need to be copied from the function on entry, they each need an INCREF on entry and (more expensively) a DECREF on exit. Combining them into a single object would save this work on call and return.
Would become
and initializing the "specials" part of the frame would become considerably cheaper, and use less space.
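The saving can be sketched with a toy model that counts reference-count operations for one call and return. This is not CPython code: `Ref`, `RefOps`, `call_separate`, and `call_combined` are hypothetical helpers, and the four field names are assumptions, since they are not listed here. The combined object stands in for the `PyFrameScopes` named in this issue.

```python
class Ref:
    """Toy stand-in for a reference-counted object (not a real CPython type)."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1


class RefOps:
    """Counts the INCREF/DECREF operations performed."""

    def __init__(self):
        self.increfs = 0
        self.decrefs = 0

    def incref(self, obj):
        obj.refcnt += 1
        self.increfs += 1
        return obj

    def decref(self, obj):
        obj.refcnt -= 1
        self.decrefs += 1


# Assumed names for the four fields; treat them as placeholders.
FIELDS = ("globals", "builtins", "locals", "code")


def call_separate(func_fields, ops):
    """Push, then clear, a frame holding each field as its own reference."""
    frame = {name: ops.incref(func_fields[name]) for name in FIELDS}  # entry
    for name in FIELDS:  # exit
        ops.decref(frame[name])


def call_combined(scopes, ops):
    """Push, then clear, a frame holding one reference to a combined object."""
    frame = {"scopes": ops.incref(scopes)}  # entry: one copy, one INCREF
    ops.decref(frame["scopes"])  # exit: one DECREF


fields = {name: Ref(name) for name in FIELDS}
separate_ops = RefOps()
call_separate(fields, separate_ops)

scopes = Ref("scopes")  # owns the four fields for its whole lifetime
combined_ops = RefOps()
call_combined(scopes, combined_ops)
```

In this model the separate layout performs four INCREFs and four DECREFs per call, while the combined layout performs one of each; the fields inside the combined object are only touched when that object is created or destroyed.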
There are some downsides to creating this object, however:
PyFunctionObject to the internal headers to make that explicit.
LOAD_GLOBAL due to the extra indirection. Hopefully the cost of the extra memory load in LOAD_GLOBAL will be outweighed by saving many indirections and branches in each call.
Although f_locals is always NULL for functions, it is non-NULL and cannot be shared when executing module or class level code. Each call to module or class level code would need a new PyFrameScopes to be created.
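The last two downsides can be illustrated with another toy model. Apart from `PyFrameScopes`, which this issue names, everything here is hypothetical: the `FrameScopes` class, its field names, and the helper functions are assumptions made for the sketch.

```python
class FrameScopes:
    """Toy stand-in for the proposed PyFrameScopes; field names are assumed."""

    def __init__(self, globals_, builtins, locals_=None):
        self.globals = globals_
        self.builtins = builtins
        self.locals = locals_  # None models the NULL f_locals of a function


class Function:
    """Toy function object that builds its scopes once, at creation time."""

    def __init__(self, globals_, builtins):
        self.scopes = FrameScopes(globals_, builtins)


def scopes_for_function_call(func):
    # Every call of the same function can reuse the same object.
    return func.scopes


def scopes_for_module_or_class_code(globals_, builtins, locals_):
    # f_locals differs per execution, so a fresh object is needed each time.
    return FrameScopes(globals_, builtins, locals_)


def load_global(scopes, name):
    # The extra hop (frame -> scopes -> globals) is the added indirection.
    if name in scopes.globals:
        return scopes.globals[name]
    return scopes.builtins[name]


module_globals = {"x": 1}
builtins_ns = {"len": len}
func = Function(module_globals, builtins_ns)

shared_a = scopes_for_function_call(func)
shared_b = scopes_for_function_call(func)
fresh_a = scopes_for_module_or_class_code(module_globals, builtins_ns, {})
fresh_b = scopes_for_module_or_class_code(module_globals, builtins_ns, {})
```

Function calls share one scopes object, so the per-call cost stays low; module and class level code pays for an allocation on each execution, which should be acceptable if such code runs far less often than functions are called.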