bpo-46564: Optimize `super().meth()` calls via adaptive superinstructions #30992

Fidget-Spinner · 2022-01-28T16:47:41Z

They should now have almost no overhead over a corresponding self.meth() call.

Summary of changes:

typeobject.c -- refactoring to reuse code during specialization; also uses InterpreterFrame instead of PyFrameObject for lazy frame benefits. Some changes here are partially taken from "bpo-43563: Introduce dedicated opcodes for super calls" (#24936). All credit to @vladima (I've tried to properly include them in the news item too).
specialize.c -- specializes for the 0-argument and 2-argument forms of super().
ceval.c -- performs both a CALL and a LOAD_METHOD without intermediates (and both are specialized forms too).
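For orientation, the two forms of super() being specialized and the plain self.meth() baseline can be compared from Python. This is only an illustrative sketch: the class names and the `opnames` helper are made up for the example, and the opcode names it prints depend on the CPython version (they are the generic instructions, not the specialized ones this PR adds).

```python
import dis


class A:
    def f(self):
        return "A.f"


class B(A):
    def zero_arg(self):
        return super().f()          # 0-argument form

    def two_arg(self):
        return super(B, self).f()   # 2-argument form

    def plain(self):
        return self.f()             # baseline the PR compares against


def opnames(func):
    """Return (opname, argval) pairs for a function's bytecode."""
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(func)]


b = B()
# All three spellings resolve to the same method here.
results = (b.zero_arg(), b.two_arg(), b.plain())

if __name__ == "__main__":
    for method in (B.zero_arg, B.two_arg, B.plain):
        print(method.__name__, [name for name, _ in opnames(method)])
```

Both super() forms first load the global name `super`, which is the extra work the specialization tries to make cheap relative to loading `self` from a fast local.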

TODO:
benchmarks!

https://bugs.python.org/issue46564

Co-Authored-By: Vladimir Matveev <[email protected]>

Fidget-Spinner · 2022-01-28T18:14:23Z

Marking as draft as I need to make this work with the new CALL convention.

markshannon · 2022-01-29T11:40:20Z

Python/ceval.c

+
+            DEOPT_IF(_PyType_CAST(super_callable) != &PySuper_Type, CALL);
+            /* super() - zero argument form */
+            if (_PySuper_GetTypeArgs(frame, frame->f_code, &su_type, &su_obj) < 0) {


Can't we do this at specialization time? The number of locals, the index of "self", and whether it is a cell are all known then. Likewise the nature of __class__ is also known.
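A rough way to see why this information is static: everything zero-argument super() needs can be read off the code object without running the function. The sketch below uses only documented code-object attributes; the `super_layout` helper and the example classes are hypothetical, and it does not mirror the C logic in `_PySuper_GetTypeArgs`.

```python
class A:
    def f(self):
        return "A.f"


class B(A):
    def g(self):
        return super().f()

    def h(self):
        # 'self' is captured by a nested function, so it is stored in a cell.
        def inner():
            return self
        return super().f(), inner()


def super_layout(func):
    """Collect the facts about a function that zero-arg super() relies on."""
    code = func.__code__
    first_arg = code.co_varnames[0]  # arguments always come first
    return {
        "first_arg": first_arg,
        "first_arg_is_cell": first_arg in code.co_cellvars,
        "has_class_cell": "__class__" in code.co_freevars,
        "nlocals": code.co_nlocals,
    }


if __name__ == "__main__":
    print(super_layout(B.g))
    print(super_layout(B.h))
```

None of these values change between calls of the same code object, which is the point of the review comment: they could be computed once when the call site is specialized.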

markshannon · 2022-01-29T11:40:26Z

Python/ceval.c

+            }
+            assert(su_obj != NULL);
+            DEOPT_IF(lm_adaptive->version != Py_TYPE(su_obj)->tp_version_tag, CALL);
+            DEOPT_IF(cache0->version != su_type->tp_version_tag, CALL);


When can this fail?
Isn't the next item in the MRO determined solely by type(self) and __class__, both of which are known at this point?

Fidget-Spinner · 2022-01-29

I wanted assurance that __class__ didn't change. Then again, I'm not sure if it can?
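The MRO question above can be illustrated in pure Python. This is a simplified sketch, not the interpreter's implementation: the `next_in_mro` helper is hypothetical, it ignores descriptors and binding, and it only shows that the lookup result depends on the pair (type(self), __class__).

```python
def next_in_mro(instance_type, cls, name):
    """Find `name` in the classes that follow `cls` in instance_type's MRO."""
    mro = instance_type.__mro__
    start = mro.index(cls) + 1
    for klass in mro[start:]:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


class A:
    def f(self):
        return "A.f"


class B(A):
    def f(self):
        return "B.f"


class C(A):
    def f(self):
        return "C.f"


class D(B, C):
    pass


d = D()
# From B's point of view on a D instance, the next 'f' comes from C, not A.
found = next_in_mro(type(d), B, "f")
```

For a plain B instance the same call site resolves to A.f instead, so a cached result has to be guarded on type(self) as well as on the class whose body contains the call.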

markshannon · 2022-01-29T11:45:15Z

Maybe we should merge #31002 first, as that PR is simpler.
It would also allow us to compare the performance of just the specialization, without the removal of frame object allocation.

Fidget-Spinner · 2022-01-29T11:49:49Z

> Maybe we should merge #31002 first, as that PR is simpler. It would also allow us to compare the performance of just the specialization, without the removal of frame object allocation.

👍

Fidget-Spinner · 2022-02-01T13:49:43Z

Mark, I'm going to run benchmarks on deltablue first since it uses 2-argument form super. I'll address your optimization ideas for the 0-arg form once those results come back. Fingers crossed.

arhadthedev

A couple of indentation-related inconsistencies:

arhadthedev · 2022-02-01T14:27:21Z

Objects/typeobject.c

-static int
-super_init_without_args(InterpreterFrame *cframe, PyCodeObject *co,
+int
+_PySuper_GetTypeArgs(InterpreterFrame *cframe, PyCodeObject *co,
                        PyTypeObject **type_p, PyObject **obj_p)


Suggested change

-                        PyTypeObject **type_p, PyObject **obj_p)
+                     PyTypeObject **type_p, PyObject **obj_p)

The line used to be aligned with the opening parenthesis of the parameter list.

arhadthedev · 2022-02-01T14:30:32Z

Include/internal/pycore_code.h

+     PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,
+    PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);


Suggested change

PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,

PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);

PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,

PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);

as in a removed line, or even:

Suggested change

PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,

PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);

PyObject *kwnames, SpecializedCacheEntry *cache, PyObject *builtins,

PyObject **stack_pointer, InterpreterFrame *frame, PyObject *names);

as in _Py_Specialize_BinaryOp right below.

Fidget-Spinner · 2022-02-01T16:09:16Z

> Mark, I'm going to run benchmarks on deltablue first since it uses 2-argument form super. I'll address your optimization ideas for the 0-arg form once those results come back. Fingers crossed.

Well, that was depressing: deltablue only shows a 1.03x speedup. Looking closer at the code, super isn't called in any tight loops, so that might be why. Maybe I need to pull out microbenchmarks now.

Fidget-Spinner · 2022-02-01T16:32:16Z

Microbenchmarks show that super().meth() calls have sped up by more than 2.2x. This is faster than the other attempt because there are also speedups from LOAD_METHOD_CACHED (extremely unscientific; I'm short on time to set up pyperf right now):

import timeit

setup = """
class A:
    def f(self): pass
class B(A):
    def g(self): super().f()
    def h(self): self.f()

b = B()
"""

# super() call
print(timeit.timeit("b.g()", setup=setup, number=20_000_000))
# reference
print(timeit.timeit("b.h()", setup=setup, number=20_000_000))

Results:

# Main
5.796037399995839
2.4094066999969073

# This branch
2.4578273000006448
2.3718886000060593

So super().meth() is now only a few percent slower than the corresponding self.meth() call (about 4% in the run above), whereas on main it was roughly 2.4x as slow. If I manage to incorporate your suggestions correctly, this will effectively just be a competition between LOAD_GLOBAL_BUILTIN (super) and LOAD_FAST (self).
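A slightly less noisy variant of the measurement above, as a sketch only: it repeats each timing and keeps the minimum, then reports the super()/self ratio directly. The `best_time` and `overhead_ratio` helpers and their default iteration counts are made up for this example; pyperf would still be the better tool for real numbers.

```python
import timeit

SETUP = """
class A:
    def f(self): pass
class B(A):
    def g(self): super().f()
    def h(self): self.f()

b = B()
"""


def best_time(stmt, number=200_000, repeat=5):
    """Run `stmt` in `repeat` batches and return the fastest batch (seconds)."""
    return min(timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat))


def overhead_ratio(number=200_000, repeat=5):
    """Time of the super().f() path divided by the time of the self.f() path."""
    return best_time("b.g()", number, repeat) / best_time("b.h()", number, repeat)


if __name__ == "__main__":
    print(f"super().f() / self.f() = {overhead_ratio():.2f}x")
```

Taking the minimum over several batches reduces the influence of scheduler noise, which matters when the two paths differ by only a few percent.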

Fidget-Spinner added 2 commits January 29, 2022 00:45

Near zero-cost super().meth() calls

3b12d1b

remove py_stats

5b7a98d

Fidget-Spinner requested a review from markshannon as a code owner January 28, 2022 16:47

the-knights-who-say-ni added the CLA signed label Jan 28, 2022

bedevere-bot added the awaiting core review label Jan 28, 2022

Fidget-Spinner mentioned this pull request Jan 28, 2022

Optimizing super().meth() via adaptive superinstructions faster-cpython/ideas#242

Closed

Fidget-Spinner changed the title ~~bpo-46564: Optimize super().meth() calls~~ bpo-46564: Optimize super().meth() calls via adaptive superinstructions Jan 28, 2022

Fidget-Spinner and others added 5 commits January 29, 2022 01:06

Add vladima to co-authors

efb0ae3

Co-Authored-By: Vladimir Matveev <[email protected]>

Merge remote-tracking branch 'upstream/main' into zero_cost_super

e037a56

fix merge errors

9d64819

Update test_frozenmain.h

ac19aa1

fix formatting, correct specialization

1f9bb6c

markshannon self-assigned this Jan 28, 2022

Fidget-Spinner marked this pull request as draft January 28, 2022 18:14

work with new call convention

f4cd3f9

Fidget-Spinner marked this pull request as ready for review January 28, 2022 18:23

Fidget-Spinner mentioned this pull request Jan 29, 2022

bpo-46564: do not create frame object for super object #31002

Merged

markshannon reviewed Jan 29, 2022

markshannon reviewed Jan 29, 2022

Fidget-Spinner added 3 commits January 29, 2022 21:27

partially address review: fix tests, add asserts, add cache modification guards

19880a9

fix refleak

696a0e8

Merge remote-tracking branch 'upstream/main' into zero_cost_super

01202ee

arhadthedev requested changes Feb 1, 2022

Fidget-Spinner mentioned this pull request Mar 5, 2022

bpo-46921: Vectorcall support for super() #31687

Merged

ezio-melotti removed the CLA signed label Jul 13, 2022

eendebakpt mentioned this pull request Aug 31, 2022

Triage the open "performance" issues on cpython repo faster-cpython/ideas#440

Closed

iritkatriel added the performance label Aug 31, 2022

carljm mentioned this pull request Apr 13, 2023

gh-87729: add LOAD_SUPER_ATTR instruction for faster super() #103497

Merged

Fidget-Spinner closed this Apr 27, 2023

Fidget-Spinner mentioned this pull request Apr 27, 2023

Near zero-cost super().meth() calls via adaptive superinstructions #90722

Closed
