Specializations #74

markshannon · 2021-07-27T08:59:31Z

This is a meta issue for tracking which bytecodes have been specialized, need to be specialized, need to be improved, or aren't worth specializing.

Completed

LOAD_GLOBAL Port LOAD_GLOBAL from old opcache to PEP 659 adaptive interpreter. #51
BINARY_MULTIPLY

Done, but could be improved

LOAD_ATTR Port LOAD_ATTR from old opcache to PEP 659 adaptive interpreter #52
BINARY_OP
BINARY_SUBSCR Implement BINARY_SUBSCR specialization "family" for PEP 659 adaptive interpreter #59
STORE_ATTR Implement STORE_ATTR specialization "family" for PEP 659 adaptive interpreter #53
LOAD_METHOD Specialize LOAD_METHOD #81
STORE_SUBSCR

In progress

CALL_NO_KW

To do

CALL_KW

Needs more data

Before more optimizing these, we need to know how well they will specialize

FOR_ITER Consider FOR_ITER family for specialization? #67
FORMAT_VALUE bpo-33092: The bytecode for f-string formatting is inefficient. #61

Not going to do

All bytecodes whose behavior does not depend on the value(s) operated on, e.g. LOAD_FAST

markshannon · 2021-08-13T12:00:43Z

Latest stats

2 Feb 2020 main

kinds

Summary:

Opcode	Percentage specialized	Notes
BINARY_OP	80%	Not bad, but we are lacking stats on the remaining 20%.
BINARY_SUBSCR	54%	Not so good. Looks like we need to optimize `array.array`, or maybe buffers in general?
CALL	72%	Stats suggest that we can achieve hit rates over 90%, so we have a way to go.
COMPARE_OP	81%	Not bad, but some room for improvement.
LOAD_ATTR	81%	We want >90% for this. There are three or four kinds to consider specializing.
LOAD_GLOBAL	98%	Excellent -- Nothing to do here.
LOAD_METHOD	79%	Reasonably good, could be improved a bit
STORE_ATTR	91%	Good enough, maybe worth adding one more kind.
STORE_SUBSCR	44%	Poor. Lacking stats on what to improve.
FOR_ITER	0%	To do

Stats

BINARY_SUBSCR:
 unquickened:     3227113 0.1%
    deferred:  1101464516 44.6%
       deopt:      280916 0.0%
         hit:  1348614398 54.6%
        miss:    14983743 0.6%
  success:      590052
  failure:    17014024
    kind  0:   471248 2.8%
    kind  8: 10149228 59.7%
    kind 10:  3531369 20.8%
    kind 11:    72077 0.4%
    kind 12:   376360 2.2%
    kind 13:   233554 1.4%
    kind 15:  1635570 9.6%
    kind 16:   537576 3.2%
    kind 17:     7042 0.0%
STORE_SUBSCR:
 unquickened:      931729 0.1%
    deferred:   523952274 56.1%
         hit:   408662270 43.8%
  success:      194604
  failure:     8200610
    kind  0:    11129 0.1%
    kind  4:   115578 1.4%
    kind  8:  4081958 49.8%
    kind  9:      147 0.0%
    kind 10:  2486944 30.3%
    kind 18:   996252 12.1%
    kind 20:   185406 2.3%
    kind 21:       22 0.0%
    kind 22:   323174 3.9%
FOR_ITER:
 unquickened:  1522432529 100.0%
  success:           0
  failure:  1522432529
    kind  0: 110120739 7.2%
    kind 10: 32320306 2.1%
    kind 13: 558033748 36.7%
    kind 14: 59715205 3.9%
    kind 15: 66676297 4.4%
    kind 16:   962641 0.1%
    kind 17:   647281 0.0%
    kind 18: 533954026 35.1%
    kind 19:  7142088 0.5%
    kind 20:  1028992 0.1%
    kind 21: 61157426 4.0%
    kind 22:   358042 0.0%
    kind 23: 90315738 5.9%
STORE_ATTR:
 unquickened:     4961734 0.7%
    deferred:    49250546 6.9%
       deopt:      170820 0.0%
         hit:   653372329 91.1%
        miss:     9573060 1.3%
  success:      632942
  failure:      726401
    kind  2:   151712 20.9%
    kind  4:   320223 44.1%
    kind  8:   159482 22.0%
    kind 11:    27240 3.7%
    kind 12:    18691 2.6%
    kind 13:     1130 0.2%
    kind 14:     9878 1.4%
    kind 17:    38045 5.2%
LOAD_ATTR:
 unquickened:    20416856 0.5%
    deferred:   617543119 15.4%
       deopt:     1483196 0.0%
         hit:  3285043166 82.0%
        miss:    80720882 2.0%
  success:     2716557
  failure:     8589966
    kind  2:  2765700 32.2%
    kind  4:  1598593 18.6%
    kind  7:     7628 0.1%
    kind  8:  1681725 19.6%
    kind 11:   561790 6.5%
    kind 12:   125188 1.5%
    kind 13:   656794 7.6%
    kind 14:   200489 2.3%
    kind 17:   992059 11.5%
COMPARE_OP:
 unquickened:   177728698 10.9%
    deferred:   135444149 8.3%
       deopt:       15851 0.0%
         hit:  1313969931 80.7%
        miss:      921840 0.1%
  success:      265700
  failure:     2272521
    kind  0:  1059928 46.6%
    kind 12:   995251 43.8%
    kind 13:       85 0.0%
    kind 14:   121380 5.3%
    kind 15:    95877 4.2%
BINARY_OP:
 unquickened:   707584010 14.0%
    deferred:   287522268 5.7%
       deopt:      594198 0.0%
         hit:  4021197011 79.7%
        miss:    31658983 0.6%
  success:      845556
  failure:     4063088
    kind  0:  2197466 54.1%
    kind 12:  1865622 45.9%
LOAD_METHOD:
 unquickened:    35154997 2.1%
    deferred:   287373540 16.9%
       deopt:      573235 0.0%
         hit:  1348341579 79.2%
        miss:    30678965 1.8%
  success:     1447851
  failure:     4056402
    kind  0:     8452 0.2%
    kind  2:   255432 6.3%
    kind  5:   920411 22.7%
    kind  7:      420 0.0%
    kind  8:     1635 0.0%
    kind  9:    51413 1.3%
    kind 10:     9023 0.2%
    kind 12:    13843 0.3%
    kind 13:     7182 0.2%
    kind 14:       63 0.0%
    kind 15:  2548835 62.8%
    kind 17:    23455 0.6%
    kind 18:   216091 5.3%
    kind 19:      147 0.0%
CALL:
 unquickened:    51421152 2.0%
    deferred:   609976758 23.6%
       deopt:      898968 0.0%
         hit:  1873196386 72.5%
        miss:    48790544 1.9%
  success:     2924691
  failure:     9355713
    kind  0:   245988 2.6%
    kind  8:  1254781 13.4%
    kind 11:  2320782 24.8%
    kind 13:   615658 6.6%
    kind 14:   887759 9.5%
    kind 15:   204949 2.2%
    kind 16:   408226 4.4%
    kind 17:   182178 1.9%
    kind 19:  1348974 14.4%
    kind 22:    79833 0.9%
    kind 23:   760651 8.1%
    kind 24:   327591 3.5%
    kind 25:   680997 7.3%
    kind 26:    32576 0.3%
    kind 27:     4770 0.1%

iritkatriel · 2021-08-13T16:39:30Z

The two common fail cases of binary_subscr are:
SPEC_FAIL_LIST_NON_INT_SUBSCRIPT (8) and SPEC_FAIL_NOT_TUPLE_LIST_OR_DICT(10).

The first is presumably list with slice. The second could be many things.

iritkatriel · 2021-08-13T16:40:29Z

What benchmark are these stats from?

markshannon · 2021-08-15T16:08:14Z

From pyperformance.

markshannon · 2021-08-24T16:51:37Z

When adding (or removing) the odd bytecode in the past some core devs have seemed worried that we might run out of bytecodes.
While this is implausible for normal bytecodes, it might be an issue for specialized bytecodes.
So here is a summary estimate of how many bytecodes we will need for specialization at the very most.

Opcode	Specializations
LOAD_GLOBAL	3
BINARY_SUBSCR	7
LOAD_ATTR	10
STORE_ATTR	7
BINARY_ADD	6
INPLACE_ADD	7
BINARY_MUL	3
INPLACE_MUL	3
Other BINARY	6
LOAD_METHOD	6
FOR_ITER	6
CALL_FUNCTION	9
CALL_FUNCTION_KW	9
CALL_METHOD	8
CALL_METHOD_KW	8
Total	~100

Which is a lot, but that still leaves ~40 spare. So we aren't going to run out of opcodes.

I suspect it won't be that many, and there is a lot of potential code sharing between CALL_FUNCTION, CALL_FUNCTION_KW, CALL_METHOD, CALL_METHOD_KW.

iritkatriel · 2021-08-24T16:56:30Z

We can also have EXTENDED_OPCODE for rare opcodes.

gvanrossum · 2021-08-24T17:08:24Z

Yeah, and e.g. LOAD_ASSERTION_ERROR will be folded into LOAD_COMMON_CONSTANT.

gvanrossum · 2021-10-19T22:24:54Z

Here's a tidbit of interesting info from the Pyston Discord server:

our JIT is handwritten using dynasm. llvm is only used during compilation of the pyston executable itself to generate type specific versions of runtime functions (used by quickening) so we don't have to write them by hand like cpython3.11.
this allows us to easily have many hundreds of them because we don't have to to keep them manually up to date - they will just get generated during the build process.
here is a recent commit which adds a few more of this specializations (=traces). It adds traces for things like 'list()', 'int(float)', 'dict()',...
pyston/pyston@b6f49a5

That sounds like an interesting approach to attempt in the future.

markshannon · 2021-11-01T11:22:11Z

The following table lists all instructions, whether they can be specialized, whether it is worth specializing them, and the status of the work on specializing them.

Instruction	Specializable?	Worth doing?	Status
BEFORE_ASYNC_WITH	✔️	?
BEFORE_WITH	✔️	?
BINARY_OP	✔️	👍	~80% hits. Some operators still need specializing.
BUILD_CONST_KEY_MAP	❌		⚫
BUILD_LIST	❌		⚫
BUILD_MAP	❌		⚫
BUILD_SET	❌		⚫
BUILD_SLICE	❌		⚫
BUILD_STRING	❌		⚫
BUILD_TUPLE	❌		⚫
CALL	✔️	👍	Partly done.
CALL_FUNCTION_EX	✔️	👍	To do
COMPARE_OP	✔️	👍	Combine with following jump?
CONTAINS_OP	✔️	?
COPY	❌		⚫
DELETE_ATTR	✔️	?
DELETE_DEREF	❌		⚫
DELETE_FAST	❌		⚫
DELETE_GLOBAL	✔️	👎	Hard to conceive of circumstance where this would be hot
DELETE_NAME	✔️	👎	〃
DELETE_SUBSCR	✔️	?
DICT_MERGE	❌		⚫
DICT_UPDATE	❌		⚫
END_ASYNC_FOR	✔️	?
FORMAT_VALUE	✔️	?
FOR_ITER	✔️	?
GET_AITER	✔️	?
GET_ANEXT	✔️	?
GET_AWAITABLE	✔️	?
GET_ITER	✔️	?
GET_LEN	✔️	?
GET_YIELD_FROM_ITER	✔️	?
IMPORT_FROM	✔️	?
IMPORT_NAME	✔️	?
IMPORT_STAR	✔️	?
IS_OP	❌		⚫
JUMP_ABSOLUTE	❌		⚫
JUMP_FORWARD	❌		⚫
JUMP_IF_FALSE_OR_POP	❌		⚫
JUMP_IF_NOT_EXC_MATCH	✔️	?
JUMP_IF_TRUE_OR_POP	❌		⚫
LIST_APPEND	❌		⚫
LIST_EXTEND	✔️	?
LIST_TO_TUPLE	❌		⚫
LOAD_ASSERTION_ERROR	❌		⚫
LOAD_ATTR	✔️	👍	~80% hits on benchmarks. Needs improvement
LOAD_BUILD_CLASS	✔️	?
LOAD_CLASSDEREF	❌		⚫
LOAD_CLOSURE	❌		⚫
LOAD_CONST	❌		⚫
LOAD_DEREF	❌		⚫
LOAD_FAST	❌		⚫
LOAD_GLOBAL	✔️	👍	Done
LOAD_METHOD	✔️	👍	~80% hits on benchmarks. Needs improvement
LOAD_NAME	✔️	?
MAKE_CELL	❌		⚫
MAKE_FUNCTION	✔️	?
MAP_ADD	✔️	?
MATCH_CLASS	✔️	?
MATCH_KEYS	✔️	?
MATCH_MAPPING	✔️	👎	Already branchless, so probably impossible to improve performance.
MATCH_SEQUENCE	✔️	👎	〃
NOP	❌		⚫
POP_EXCEPT	❌		⚫
POP_EXCEPT_AND_RERAISE	❌		⚫
POP_JUMP_IF_FALSE	❌		⚫
POP_JUMP_IF_TRUE	❌		⚫
POP_TOP	❌		⚫
PRECALL_METHOD	❌		⚫
PRINT_EXPR	✔️	No	Only used in the repl
PUSH_EXC_INFO	❌		⚫
RAISE_VARARGS	❌		⚫
RERAISE	❌		⚫
RESUME	❌		?
RETURN_VALUE	❌		⚫
SEND	✔️	?
SETUP_ANNOTATIONS	✔️	?
SET_ADD	✔️	?
SET_UPDATE	✔️	?
STORE_ATTR	✔️	👍
STORE_DEREF	❌		⚫
STORE_FAST	❌		⚫
STORE_GLOBAL	✔️	👍
STORE_NAME	✔️	?
STORE_SUBSCR	✔️	👍
SWAP	❌		⚫
UNARY_INVERT	✔️	?
UNARY_NEGATIVE	✔️	?
UNARY_NOT	✔️	?
UNARY_POSITIVE	✔️	?
UNPACK_EX	✔️	?
UNPACK_SEQUENCE	✔️	?
WITH_EXCEPT_START	✔️	?
YIELD_VALUE	❌		⚫

gvanrossum · 2021-11-01T14:10:10Z

What’s the difference between a green check mark and a grayed out one?

brandtbucher · 2021-11-01T15:36:29Z

Thanks for putting this together. Some thoughts:

GET_LEN/MATCH_CLASS/MATCH_KEYS: Yup, clear candidates for specialization. I have some pattern matching pyperf benchmarks that I was working on a couple of months ago... it might be worth dusting those off and seeing what sort of improvement we can get. Probably lower priority, though, since obviously any work here won't move pyperformance at all.
MATCH_MAPPING/MATCH_SEQUENCE: I doubt that we'd see any significant improvement with these, right? The current implementation is already super streamlined.
UNARY_INVERT/UNARY_NEGATIVE/UNARY_NOT/UNARY_POSITIVE: Do you think it's worth exploring combining these into a single UNARY_OP instruction?

brandtbucher · 2021-11-01T17:06:01Z

What’s the difference between a green check mark and a grayed out one?

It looks like ✅ means "can be specialized" and ✔️ means "already specialized" (which also implies ✅). Both are green on my machine.

markshannon · 2021-11-02T12:30:52Z

Specializable is now just ✔️ or ❌.
I've also corrected several bytecodes.

markshannon · 2021-11-02T12:41:20Z

* `MATCH_MAPPING`/`MATCH_SEQUENCE`: I doubt that we'd see any significant improvement with these, right? The current implementation is already _super_ streamlined.

I've updated the table.

* `UNARY_INVERT`/`UNARY_NEGATIVE`/`UNARY_NOT`/`UNARY_POSITIVE`: Do you think it's worth exploring combining these into a single `UNARY_OP` instruction?

It might well be worth combining UNARY_INVERT/UNARY_NEGATIVE/UNARY_POSITIVE, but not UNARY_NOT as that is not a slot operator.

-int, -float and ~int might well be worth specializing for.

sweeneyde · 2021-12-18T22:29:54Z

There are more detailed stats about STORE_SUBSCR here, which I took before opening the specialization PR.

In particular, numbers are somewhat thrown off by the use of bytearray[int]=... in regex compilation, and especially by Scimark's use of array.array[int]=..., which I'm convinced says more about the particulars of pyperformance than about Python code in general.

I opened a PR to add some of those stats more permanently into the specialization code.

markshannon · 2022-12-09T15:46:19Z

We are now gathering stats automatically, most of the easy specialization are done, and the remaining have their own issues.

markshannon mentioned this issue Oct 5, 2021

bpo-45367: Specialize BINARY_MULTIPLY python/cpython#28727

Merged

gramster added the Epic label Nov 30, 2021

gramster added this to Fancy CPython Board Jan 10, 2022

gramster moved this to Todo in Fancy CPython Board Jan 10, 2022

gramster moved this from Todo to Other in Fancy CPython Board Jan 10, 2022

markshannon removed this from Fancy CPython Board Jan 24, 2022

mdboom added the epic-specialization More specialization work for 3.12 label Aug 2, 2022

markshannon closed this as completed Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specializations #74

Specializations #74

markshannon commented Jul 27, 2021 •

edited

Loading

markshannon commented Aug 13, 2021 •

edited

Loading

iritkatriel commented Aug 13, 2021

iritkatriel commented Aug 13, 2021

markshannon commented Aug 15, 2021

markshannon commented Aug 24, 2021

iritkatriel commented Aug 24, 2021

gvanrossum commented Aug 24, 2021

gvanrossum commented Oct 19, 2021

markshannon commented Nov 1, 2021 •

edited

Loading

gvanrossum commented Nov 1, 2021 •

edited

Loading

brandtbucher commented Nov 1, 2021

brandtbucher commented Nov 1, 2021 •

edited

Loading

markshannon commented Nov 2, 2021

markshannon commented Nov 2, 2021

sweeneyde commented Dec 18, 2021 •

edited

Loading

markshannon commented Dec 9, 2022

Specializations #74

Specializations #74

Comments

markshannon commented Jul 27, 2021 • edited Loading

Completed

Done, but could be improved

In progress

To do

Needs more data

Not going to do

markshannon commented Aug 13, 2021 • edited Loading

Latest stats

Summary:

Stats

iritkatriel commented Aug 13, 2021

iritkatriel commented Aug 13, 2021

markshannon commented Aug 15, 2021

markshannon commented Aug 24, 2021

iritkatriel commented Aug 24, 2021

gvanrossum commented Aug 24, 2021

gvanrossum commented Oct 19, 2021

markshannon commented Nov 1, 2021 • edited Loading

gvanrossum commented Nov 1, 2021 • edited Loading

brandtbucher commented Nov 1, 2021

brandtbucher commented Nov 1, 2021 • edited Loading

markshannon commented Nov 2, 2021

markshannon commented Nov 2, 2021

sweeneyde commented Dec 18, 2021 • edited Loading

markshannon commented Dec 9, 2022

markshannon commented Jul 27, 2021 •

edited

Loading

markshannon commented Aug 13, 2021 •

edited

Loading

markshannon commented Nov 1, 2021 •

edited

Loading

gvanrossum commented Nov 1, 2021 •

edited

Loading

brandtbucher commented Nov 1, 2021 •

edited

Loading

sweeneyde commented Dec 18, 2021 •

edited

Loading