Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specializations #74

Closed
markshannon opened this issue Jul 27, 2021 · 16 comments
Closed

Specializations #74

markshannon opened this issue Jul 27, 2021 · 16 comments
Labels
Epic epic-specialization More specialization work for 3.12

Comments

@markshannon
Copy link
Member

markshannon commented Jul 27, 2021

This is a meta issue for tracking which bytecodes have been specialized, need to be specialized, need to be improved, or aren't worth specializing.

Completed

Done, but could be improved

In progress

  • CALL_NO_KW

To do

  • CALL_KW

Needs more data

Before more optimizing these, we need to know how well they will specialize

Not going to do

  • All bytecodes whose behavior does not depend on the value(s) operated on, e.g. LOAD_FAST
@markshannon
Copy link
Member Author

markshannon commented Aug 13, 2021

Latest stats

2 Feb 2020 main

kinds

Summary:

Opcode Percentage specialized Notes
BINARY_OP 80% Not bad, but we are lacking stats on the remaining 20%.
BINARY_SUBSCR 54% Not so good. Looks like we need to optimize array.array, or maybe buffers in general?
CALL 72% Stats suggest that we can achieve hit rates over 90%, so we have a way to go.
COMPARE_OP 81% Not bad, but some room for improvement.
LOAD_ATTR 81% We want >90% for this. There are three or four kinds to consider specializing.
LOAD_GLOBAL 98% Excellent -- Nothing to do here.
LOAD_METHOD 79% Reasonably good, could be improved a bit
STORE_ATTR 91% Good enough, maybe worth adding one more kind.
STORE_SUBSCR 44% Poor. Lacking stats on what to improve.
FOR_ITER 0% To do

Stats

BINARY_SUBSCR:
 unquickened:     3227113 0.1%
    deferred:  1101464516 44.6%
       deopt:      280916 0.0%
         hit:  1348614398 54.6%
        miss:    14983743 0.6%
  success:      590052
  failure:    17014024
    kind  0:   471248 2.8%
    kind  8: 10149228 59.7%
    kind 10:  3531369 20.8%
    kind 11:    72077 0.4%
    kind 12:   376360 2.2%
    kind 13:   233554 1.4%
    kind 15:  1635570 9.6%
    kind 16:   537576 3.2%
    kind 17:     7042 0.0%
STORE_SUBSCR:
 unquickened:      931729 0.1%
    deferred:   523952274 56.1%
         hit:   408662270 43.8%
  success:      194604
  failure:     8200610
    kind  0:    11129 0.1%
    kind  4:   115578 1.4%
    kind  8:  4081958 49.8%
    kind  9:      147 0.0%
    kind 10:  2486944 30.3%
    kind 18:   996252 12.1%
    kind 20:   185406 2.3%
    kind 21:       22 0.0%
    kind 22:   323174 3.9%
FOR_ITER:
 unquickened:  1522432529 100.0%
  success:           0
  failure:  1522432529
    kind  0: 110120739 7.2%
    kind 10: 32320306 2.1%
    kind 13: 558033748 36.7%
    kind 14: 59715205 3.9%
    kind 15: 66676297 4.4%
    kind 16:   962641 0.1%
    kind 17:   647281 0.0%
    kind 18: 533954026 35.1%
    kind 19:  7142088 0.5%
    kind 20:  1028992 0.1%
    kind 21: 61157426 4.0%
    kind 22:   358042 0.0%
    kind 23: 90315738 5.9%
STORE_ATTR:
 unquickened:     4961734 0.7%
    deferred:    49250546 6.9%
       deopt:      170820 0.0%
         hit:   653372329 91.1%
        miss:     9573060 1.3%
  success:      632942
  failure:      726401
    kind  2:   151712 20.9%
    kind  4:   320223 44.1%
    kind  8:   159482 22.0%
    kind 11:    27240 3.7%
    kind 12:    18691 2.6%
    kind 13:     1130 0.2%
    kind 14:     9878 1.4%
    kind 17:    38045 5.2%
LOAD_ATTR:
 unquickened:    20416856 0.5%
    deferred:   617543119 15.4%
       deopt:     1483196 0.0%
         hit:  3285043166 82.0%
        miss:    80720882 2.0%
  success:     2716557
  failure:     8589966
    kind  2:  2765700 32.2%
    kind  4:  1598593 18.6%
    kind  7:     7628 0.1%
    kind  8:  1681725 19.6%
    kind 11:   561790 6.5%
    kind 12:   125188 1.5%
    kind 13:   656794 7.6%
    kind 14:   200489 2.3%
    kind 17:   992059 11.5%
COMPARE_OP:
 unquickened:   177728698 10.9%
    deferred:   135444149 8.3%
       deopt:       15851 0.0%
         hit:  1313969931 80.7%
        miss:      921840 0.1%
  success:      265700
  failure:     2272521
    kind  0:  1059928 46.6%
    kind 12:   995251 43.8%
    kind 13:       85 0.0%
    kind 14:   121380 5.3%
    kind 15:    95877 4.2%
BINARY_OP:
 unquickened:   707584010 14.0%
    deferred:   287522268 5.7%
       deopt:      594198 0.0%
         hit:  4021197011 79.7%
        miss:    31658983 0.6%
  success:      845556
  failure:     4063088
    kind  0:  2197466 54.1%
    kind 12:  1865622 45.9%
LOAD_METHOD:
 unquickened:    35154997 2.1%
    deferred:   287373540 16.9%
       deopt:      573235 0.0%
         hit:  1348341579 79.2%
        miss:    30678965 1.8%
  success:     1447851
  failure:     4056402
    kind  0:     8452 0.2%
    kind  2:   255432 6.3%
    kind  5:   920411 22.7%
    kind  7:      420 0.0%
    kind  8:     1635 0.0%
    kind  9:    51413 1.3%
    kind 10:     9023 0.2%
    kind 12:    13843 0.3%
    kind 13:     7182 0.2%
    kind 14:       63 0.0%
    kind 15:  2548835 62.8%
    kind 17:    23455 0.6%
    kind 18:   216091 5.3%
    kind 19:      147 0.0%
CALL:
 unquickened:    51421152 2.0%
    deferred:   609976758 23.6%
       deopt:      898968 0.0%
         hit:  1873196386 72.5%
        miss:    48790544 1.9%
  success:     2924691
  failure:     9355713
    kind  0:   245988 2.6%
    kind  8:  1254781 13.4%
    kind 11:  2320782 24.8%
    kind 13:   615658 6.6%
    kind 14:   887759 9.5%
    kind 15:   204949 2.2%
    kind 16:   408226 4.4%
    kind 17:   182178 1.9%
    kind 19:  1348974 14.4%
    kind 22:    79833 0.9%
    kind 23:   760651 8.1%
    kind 24:   327591 3.5%
    kind 25:   680997 7.3%
    kind 26:    32576 0.3%
    kind 27:     4770 0.1%

@iritkatriel
Copy link
Collaborator

The two common fail cases of binary_subscr are:
SPEC_FAIL_LIST_NON_INT_SUBSCRIPT (8) and SPEC_FAIL_NOT_TUPLE_LIST_OR_DICT(10).

The first is presumably list with slice. The second could be many things.

@iritkatriel
Copy link
Collaborator

What benchmark are these stats from?

@markshannon
Copy link
Member Author

From pyperformance.

@markshannon
Copy link
Member Author

When adding (or removing) the odd bytecode in the past some core devs have seemed worried that we might run out of bytecodes.
While this is implausible for normal bytecodes, it might be an issue for specialized bytecodes.
So here is a summary estimate of how many bytecodes we will need for specialization at the very most.

Opcode Specializations
LOAD_GLOBAL 3
BINARY_SUBSCR 7
LOAD_ATTR 10
STORE_ATTR 7
BINARY_ADD 6
INPLACE_ADD 7
BINARY_MUL 3
INPLACE_MUL 3
Other BINARY 6
LOAD_METHOD 6
FOR_ITER 6
CALL_FUNCTION 9
CALL_FUNCTION_KW 9
CALL_METHOD 8
CALL_METHOD_KW 8
Total ~100

Which is a lot, but that still leaves ~40 spare. So we aren't going to run out of opcodes.

I suspect it won't be that many, and there is a lot of potential code sharing between CALL_FUNCTION, CALL_FUNCTION_KW, CALL_METHOD, CALL_METHOD_KW.

@iritkatriel
Copy link
Collaborator

We can also have EXTENDED_OPCODE for rare opcodes.

@gvanrossum
Copy link
Collaborator

Yeah, and e.g. LOAD_ASSERTION_ERROR will be folded into LOAD_COMMON_CONSTANT.

@gvanrossum
Copy link
Collaborator

Here's a tidbit of interesting info from the Pyston Discord server:

  • our JIT is handwritten using dynasm. llvm is only used during compilation of the pyston executable itself to generate type specific versions of runtime functions (used by quickening) so we don't have to write them by hand like cpython3.11.
  • this allows us to easily have many hundreds of them because we don't have to to keep them manually up to date - they will just get generated during the build process.
  • here is a recent commit which adds a few more of this specializations (=traces). It adds traces for things like 'list()', 'int(float)', 'dict()',...
    pyston/pyston@b6f49a5

That sounds like an interesting approach to attempt in the future.

@markshannon
Copy link
Member Author

markshannon commented Nov 1, 2021

The following table lists all instructions, whether they can be specialized, whether it is worth specializing them, and the status of the work on specializing them.

Instruction Specializable? Worth doing? Status
BEFORE_ASYNC_WITH ✔️ ?
BEFORE_WITH ✔️ ?
BINARY_OP ✔️ 👍 ~80% hits. Some operators still need specializing.
BUILD_CONST_KEY_MAP
BUILD_LIST
BUILD_MAP
BUILD_SET
BUILD_SLICE
BUILD_STRING
BUILD_TUPLE
CALL ✔️ 👍 Partly done.
CALL_FUNCTION_EX ✔️ 👍 To do
COMPARE_OP ✔️ 👍 Combine with following jump?
CONTAINS_OP ✔️ ?
COPY
DELETE_ATTR ✔️ ?
DELETE_DEREF
DELETE_FAST
DELETE_GLOBAL ✔️ 👎 Hard to conceive of circumstance where this would be hot
DELETE_NAME ✔️ 👎
DELETE_SUBSCR ✔️ ?
DICT_MERGE
DICT_UPDATE
END_ASYNC_FOR ✔️ ?
FORMAT_VALUE ✔️ ?
FOR_ITER ✔️ ?
GET_AITER ✔️ ?
GET_ANEXT ✔️ ?
GET_AWAITABLE ✔️ ?
GET_ITER ✔️ ?
GET_LEN ✔️ ?
GET_YIELD_FROM_ITER ✔️ ?
IMPORT_FROM ✔️ ?
IMPORT_NAME ✔️ ?
IMPORT_STAR ✔️ ?
IS_OP
JUMP_ABSOLUTE
JUMP_FORWARD
JUMP_IF_FALSE_OR_POP
JUMP_IF_NOT_EXC_MATCH ✔️ ?
JUMP_IF_TRUE_OR_POP
LIST_APPEND
LIST_EXTEND ✔️ ?
LIST_TO_TUPLE
LOAD_ASSERTION_ERROR
LOAD_ATTR ✔️ 👍 ~80% hits on benchmarks. Needs improvement
LOAD_BUILD_CLASS ✔️ ?
LOAD_CLASSDEREF
LOAD_CLOSURE
LOAD_CONST
LOAD_DEREF
LOAD_FAST
LOAD_GLOBAL ✔️ 👍 Done
LOAD_METHOD ✔️ 👍 ~80% hits on benchmarks. Needs improvement
LOAD_NAME ✔️ ?
MAKE_CELL
MAKE_FUNCTION ✔️ ?
MAP_ADD ✔️ ?
MATCH_CLASS ✔️ ?
MATCH_KEYS ✔️ ?
MATCH_MAPPING ✔️ 👎 Already branchless, so probably impossible to improve performance.
MATCH_SEQUENCE ✔️ 👎
NOP
POP_EXCEPT
POP_EXCEPT_AND_RERAISE
POP_JUMP_IF_FALSE
POP_JUMP_IF_TRUE
POP_TOP
PRECALL_METHOD
PRINT_EXPR ✔️ No Only used in the repl
PUSH_EXC_INFO
RAISE_VARARGS
RERAISE
RESUME ?
RETURN_VALUE
SEND ✔️ ?
SETUP_ANNOTATIONS ✔️ ?
SET_ADD ✔️ ?
SET_UPDATE ✔️ ?
STORE_ATTR ✔️ 👍
STORE_DEREF
STORE_FAST
STORE_GLOBAL ✔️ 👍
STORE_NAME ✔️ ?
STORE_SUBSCR ✔️ 👍
SWAP
UNARY_INVERT ✔️ ?
UNARY_NEGATIVE ✔️ ?
UNARY_NOT ✔️ ?
UNARY_POSITIVE ✔️ ?
UNPACK_EX ✔️ ?
UNPACK_SEQUENCE ✔️ ?
WITH_EXCEPT_START ✔️ ?
YIELD_VALUE

@gvanrossum
Copy link
Collaborator

gvanrossum commented Nov 1, 2021

What’s the difference between a green check mark and a grayed out one?

@brandtbucher
Copy link
Member

Thanks for putting this together. Some thoughts:

  • GET_LEN/MATCH_CLASS/MATCH_KEYS: Yup, clear candidates for specialization. I have some pattern matching pyperf benchmarks that I was working on a couple of months ago... it might be worth dusting those off and seeing what sort of improvement we can get. Probably lower priority, though, since obviously any work here won't move pyperformance at all.
  • MATCH_MAPPING/MATCH_SEQUENCE: I doubt that we'd see any significant improvement with these, right? The current implementation is already super streamlined.
  • UNARY_INVERT/UNARY_NEGATIVE/UNARY_NOT/UNARY_POSITIVE: Do you think it's worth exploring combining these into a single UNARY_OP instruction?

@brandtbucher
Copy link
Member

brandtbucher commented Nov 1, 2021

What’s the difference between a green check mark and a grayed out one?

It looks like ✅ means "can be specialized" and ✔️ means "already specialized" (which also implies ✅). Both are green on my machine.

@markshannon
Copy link
Member Author

Specializable is now just ✔️ or ❌.
I've also corrected several bytecodes.

@markshannon
Copy link
Member Author

* `MATCH_MAPPING`/`MATCH_SEQUENCE`: I doubt that we'd see any significant improvement with these, right? The current implementation is already _super_ streamlined.

I've updated the table.

* `UNARY_INVERT`/`UNARY_NEGATIVE`/`UNARY_NOT`/`UNARY_POSITIVE`: Do you think it's worth exploring combining these into a single `UNARY_OP` instruction?

It might well be worth combining UNARY_INVERT/UNARY_NEGATIVE/UNARY_POSITIVE, but not UNARY_NOT as that is not a slot operator.

-int, -float and ~int might well be worth specializing for.

@gramster gramster added the Epic label Nov 30, 2021
@sweeneyde
Copy link

sweeneyde commented Dec 18, 2021

There are more detailed stats about STORE_SUBSCR here, which I took before opening the specialization PR.

In particular, numbers are somewhat thrown off by the use of bytearray[int]=... in regex compilation, and especially by Scimark's use of array.array[int]=..., which I'm convinced says more about the particulars of pyperformance than about Python code in general.

I opened a PR to add some of those stats more permanently into the specialization code.

@gramster gramster moved this to Todo in Fancy CPython Board Jan 10, 2022
@gramster gramster moved this from Todo to Other in Fancy CPython Board Jan 10, 2022
@mdboom mdboom added the epic-specialization More specialization work for 3.12 label Aug 2, 2022
@markshannon
Copy link
Member Author

We are now gathering stats automatically, most of the easy specialization are done, and the remaining have their own issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Epic epic-specialization More specialization work for 3.12
Projects
None yet
Development

No branches or pull requests

7 participants