gh-97933: add opcode for more efficient comprehension execution #101310

Closed
wants to merge 3 commits into from

carljm
Member

@carljm carljm commented Jan 25, 2023

This avoids allocating a throwaway single-use function object every time we run a comprehension. Otherwise it shouldn't have any user-visible impact; the comprehension is still a separate code object and runs in its own frame, just as before. Tracebacks look the same, etc. We just have a new COMPREHENSION opcode that builds a frame directly from a code object and an optional closure and inline-calls it, without creating a function object.
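
For readers unfamiliar with the "throwaway function object", here is a minimal pure-Python sketch of the behaviour being described. It is a hand-written approximation and not the actual bytecode; the names `build_listcomp_function` and `run_comprehension` are hypothetical and exist only for illustration. It shows that each run allocates a new function object while the underlying code object is shared.

```python
def build_listcomp_function():
    # Stand-in for the function object that is created on every run of a
    # comprehension such as `[x for x in l]` (an approximation only).
    def _listcomp(iterator):
        result = []
        for x in iterator:
            result.append(x)
        return result
    return _listcomp


def run_comprehension(items):
    # Allocate the function, call it once with the outermost iterator,
    # and let the function object be discarded afterwards.
    return build_listcomp_function()(iter(items))


first = build_listcomp_function()
second = build_listcomp_function()
print(first is second)                    # False: a new function object each time
print(first.__code__ is second.__code__)  # True: the code object is shared
print(run_comprehension([1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5]
```

The per-run allocation of `_listcomp` is the cost the PR description says the new opcode avoids; the shared code object and the separate frame are unchanged.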

In a micro-benchmark of comprehension execution time, this looks like it saves about 25%:

On inlinecomp (e4a68550426dbea79bd9fc6ff0a75395891d35b7):

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 243 ns +- 14 ns

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 240 ns +- 11 ns

On main (f02fa64bf2d03ef7a28650c164e17a5fb5d8543d):

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 324 ns +- 11 ns

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 329 ns +- 18 ns
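
As a quick check of the "about 25%" figure, the relative saving can be worked out from the four means reported above. This is a small sketch; the helper name `relative_saving` is made up for illustration, and the standard deviations are ignored.

```python
from statistics import mean

# Mean timings in nanoseconds, copied from the pyperf runs above.
main_ns = [324, 329]
inlinecomp_ns = [243, 240]


def relative_saving(baseline, candidate):
    # Fraction of the baseline time that the candidate no longer spends.
    return (baseline - candidate) / baseline


saving = relative_saving(mean(main_ns), mean(inlinecomp_ns))
print(f"{saving:.0%}")  # prints 26%
```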

Currently this doesn't handle async comprehensions or generator expressions; those still create a function object and CALL it. In principle I think they could be handled as well, but to keep the PR smaller I'll defer that to a second PR if this one is merged.

Closes #97933.

@carljm
Member Author

carljm commented Jan 25, 2023

@markshannon or @iritkatriel (or anyone else from faster-cpython team) -- if you could kick off a PyPerformance run with your fancy Github Action runner, I'd be much obliged!

@carljm added the performance (Performance or resource usage) label Jan 25, 2023
@iritkatriel
Member

> @markshannon or @iritkatriel (or anyone else from faster-cpython team) -- if you could kick off a PyPerformance run with your fancy Github Action runner, I'd be much obliged!

Done.

@iritkatriel
Member

Benchmark results are here: https://github.com/faster-cpython/benchmarking/blob/main/results/bm-20230125-3.12.0a4+-ee2ad56/bm-20230125-linux-x86_64-carljm-inlinecomp-3.12.0a4+-ee2ad56-vs-base.md

(shows no difference overall).

@carljm
Member Author

carljm commented Jan 25, 2023

I can't see the detailed results (private repo), but if there's no impact overall then I guess they aren't that interesting :) We must not have any benchmarks that make heavy use of comprehensions.

@markshannon
Member

All benchmarks:

| Benchmark | bm-20230124-linux-x86_64-python-f02fa64bf2d03ef7a286-3.12.0a4+-f02fa64 | bm-20230125-linux-x86_64-carljm-inlinecomp-3.12.0a4+-ee2ad56 |
|---|---|---|
| 2to3 | 251 ms | 249 ms: 1.01x faster |
| async_generators | 349 ms | 355 ms: 1.02x slower |
| async_tree_memoization | 624 ms | 656 ms: 1.05x slower |
| asyncio_tcp | 488 ms | 492 ms: 1.01x slower |
| chaos | 66.2 ms | 64.6 ms: 1.02x faster |
| bench_thread_pool | 775 us | 781 us: 1.01x slower |
| coroutines | 24.8 ms | 25.6 ms: 1.03x slower |
| deepcopy_reduce | 2.91 us | 2.96 us: 1.02x slower |
| deepcopy_memo | 33.8 us | 34.6 us: 1.02x slower |
| docutils | 2.55 sec | 2.51 sec: 1.02x faster |
| fannkuch | 373 ms | 369 ms: 1.01x faster |
| float | 73.2 ms | 72.0 ms: 1.02x faster |
| create_gc_cycles | 1.47 ms | 1.44 ms: 1.02x faster |
| gc_traversal | 4.30 ms | 3.64 ms: 1.18x faster |
| generators | 76.5 ms | 75.9 ms: 1.01x faster |
| go | 138 ms | 134 ms: 1.03x faster |
| gunicorn | 1.07 ms | 1.06 ms: 1.00x faster |
| json | 4.62 ms | 4.69 ms: 1.01x slower |
| json_dumps | 9.32 ms | 9.57 ms: 1.03x slower |
| json_loads | 24.5 us | 24.2 us: 1.01x faster |
| logging_format | 6.40 us | 6.31 us: 1.01x faster |
| logging_silent | 92.8 ns | 91.7 ns: 1.01x faster |
| logging_simple | 5.76 us | 5.80 us: 1.01x slower |
| mako | 9.80 ms | 9.70 ms: 1.01x faster |
| mdp | 2.69 sec | 2.51 sec: 1.07x faster |
| pathlib | 17.7 ms | 17.9 ms: 1.01x slower |
| pickle | 10.1 us | 10.2 us: 1.02x slower |
| pickle_dict | 30.9 us | 32.4 us: 1.05x slower |
| pickle_list | 4.12 us | 4.29 us: 1.04x slower |
| pickle_pure_python | 286 us | 288 us: 1.01x slower |
| pidigits | 189 ms | 190 ms: 1.00x slower |
| pycparser | 1.15 sec | 1.09 sec: 1.05x faster |
| pyflate | 402 ms | 400 ms: 1.01x faster |
| python_startup | 8.98 ms | 8.89 ms: 1.01x faster |
| python_startup_no_site | 6.50 ms | 6.44 ms: 1.01x faster |
| raytrace | 281 ms | 284 ms: 1.01x slower |
| regex_compile | 127 ms | 128 ms: 1.01x slower |
| regex_dna | 210 ms | 201 ms: 1.05x faster |
| regex_effbot | 3.49 ms | 3.42 ms: 1.02x faster |
| regex_v8 | 22.4 ms | 21.3 ms: 1.05x faster |
| richards | 41.7 ms | 42.6 ms: 1.02x slower |
| scimark_fft | 301 ms | 303 ms: 1.01x slower |
| scimark_monte_carlo | 65.6 ms | 64.7 ms: 1.01x faster |
| scimark_sparse_mat_mult | 3.96 ms | 3.99 ms: 1.01x slower |
| spectral_norm | 95.3 ms | 96.2 ms: 1.01x slower |
| sqlglot_optimize | 51.0 ms | 50.5 ms: 1.01x faster |
| sqlglot_normalize | 105 ms | 103 ms: 1.02x faster |
| sympy_expand | 453 ms | 450 ms: 1.01x faster |
| sympy_integrate | 19.7 ms | 19.7 ms: 1.00x faster |
| sympy_sum | 154 ms | 155 ms: 1.00x slower |
| telco | 6.26 ms | 6.46 ms: 1.03x slower |
| thrift | 737 us | 748 us: 1.01x slower |
| tornado_http | 93.6 ms | 94.5 ms: 1.01x slower |
| unpack_sequence | 46.7 ns | 44.4 ns: 1.05x faster |
| unpickle | 13.2 us | 13.1 us: 1.01x faster |
| unpickle_pure_python | 197 us | 201 us: 1.02x slower |
| xml_etree_iterparse | 109 ms | 106 ms: 1.02x faster |
| xml_etree_process | 54.1 ms | 53.8 ms: 1.01x faster |
| Geometric mean | (ref) | 1.00x faster |

Benchmark hidden because not significant (33): aiohttp, async_tree_none, async_tree_cpu_io_mixed, async_tree_io, chameleon, bench_mp_pool, coverage, crypto_pyaes, dask, deepcopy, deltablue, django_template, djangocms, dulwich_log, genshi_text, genshi_xml, hexiom, html5lib, meteor_contest, mypy, nbody, nqueens, pprint_safe_repr, pprint_pformat, scimark_lu, scimark_sor, sqlglot_parse, sqlglot_transpile, sqlite_synth, sympy_str, unpickle_list, xml_etree_parse, xml_etree_generate
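
To illustrate how a geometric mean combines per-benchmark ratios, here is a small sketch using only three rows from the results above (gc_traversal, async_tree_memoization and mdp). It does not reproduce the 1.00x figure, which covers the whole suite, and it makes no claim about how the benchmarking tool computes its summary; the dictionary name `timings` is hypothetical.

```python
from statistics import geometric_mean

# (baseline, inlinecomp) timings for three rows of the results above.
# Each pair uses the same unit, so the ratio is unitless.
timings = {
    "gc_traversal": (4.30, 3.64),            # ms
    "async_tree_memoization": (624, 656),    # ms
    "mdp": (2.69, 2.51),                     # sec
}

# A ratio above 1.0 means the branch was faster than the baseline.
speedups = {name: base / new for name, (base, new) in timings.items()}

# The geometric mean multiplies the ratios and takes the n-th root, so a
# 1.05x slowdown and a 1.05x speedup roughly cancel each other out.
subset_geomean = geometric_mean(speedups.values())
print(f"{subset_geomean:.2f}x")
```

Because these three rows were picked to include the two largest speedups, the subset mean is well above the suite-wide value.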

* main: (225 commits)
  pythongh-102056: Fix a few bugs in error handling of exception printing code (python#102078)
  pythongh-102011: use sys.exception() instead of sys.exc_info() in docs where possible (python#102012)
  pythongh-101566: Sync with zipp 3.14. (pythonGH-102018)
  pythonGH-99818: improve the documentation for zipfile.Path and Traversable (pythonGH-101589)
  pythongh-88233: zipfile: handle extras after a zip64 extra (pythonGH-96161)
  pythongh-101981: Apply HOMEBREW related environment variables (pythongh-102074)
  pythongh-101907: Stop using `_Py_OPCODE` and `_Py_OPARG` macros (pythonGH-101912)
  pythongh-101819: Adapt _io types to heap types, batch 1 (pythonGH-101949)
  pythongh-101981: Build macOS as recommended by the devguide (pythonGH-102070)
  pythongh-97786: Fix compiler warnings in pytime.c (python#101826)
  pythongh-101578: Amend PyErr_{Set,Get}RaisedException docs (python#101962)
  Misc improvements to the float tutorial (pythonGH-102052)
  pythongh-85417: Clarify behaviour on branch cuts in cmath module (python#102046)
  pythongh-100425: Update tutorial docs related to sum() accuracy (FH-101854)
  Add missing 'is' to `cmath.log()` docstring (python#102049)
  pythongh-100210: Correct the comment link for unescaping HTML (python#100212)
  pythongh-97930: Also include subdirectory in makefile. (python#102030)
  pythongh-99735: Use required=True in argparse subparsers example (python#100927)
  Fix incorrectly documented attribute in csv docs (python#101250)
  pythonGH-84783: Make the slice object hashable (pythonGH-101264)
  ...
@carljm
Member Author

carljm commented Mar 8, 2023

Closing this for now in favor of #101441

May reopen this approach if PEP 709 (implemented by that PR) is rejected.

@carljm carljm closed this Mar 8, 2023
Labels: awaiting review, performance (Performance or resource usage)
Development

Successfully merging this pull request may close these issues.

Inline dict/list/set comprehensions in the compiler for better performance
4 participants