gh-97933: add opcode for more efficient comprehension execution #101310

carljm · 2023-01-25T00:53:36Z

This avoids allocating a throwaway single-use function object every time we run a comprehension. Otherwise it shouldn't have any user-visible impact; the comprehension is still a separate code object and runs in its own frame, just as before. Tracebacks look the same, etc. We just have a new COMPREHENSION opcode that builds a frame directly from a code object and an optional closure and inline-calls it, without creating a function object.
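As a rough pure-Python illustration of the allocation being avoided (not the actual bytecode or C implementation), the sketch below wraps a code object in a fresh function object on every call, which is approximately what running a comprehension does on main. The names `_comprehension_body`, `COMP_CODE`, and `run_via_function_object` are hypothetical stand-ins invented for this example.

```python
import types


def _comprehension_body(iterator):
    # Hypothetical stand-in for the <listcomp> code object the compiler emits.
    result = []
    for x in iterator:
        result.append(x)
    return result


# The code object is created once, at compile time.
COMP_CODE = _comprehension_body.__code__


def run_via_function_object(iterable):
    # Roughly the behaviour on main: allocate a single-use function object
    # around the code object, call it once, then let it be discarded.
    throwaway = types.FunctionType(COMP_CODE, globals())
    return throwaway(iter(iterable))


print(run_via_function_object([1, 2, 3, 4, 5]))  # [1, 2, 3, 4, 5]
```

The proposed opcode has no pure-Python equivalent: it skips the `types.FunctionType(...)` step and builds the frame straight from the code object (plus closure, if any).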

In a micro-benchmark of comprehension execution time, this looks like it saves about 25%:

On inlinecomp (e4a68550426dbea79bd9fc6ff0a75395891d35b7):

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 243 ns +- 14 ns

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 240 ns +- 11 ns

On main (f02fa64bf2d03ef7a28650c164e17a5fb5d8543d):

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 324 ns +- 11 ns

➜ ./python -m pyperf timeit -s 'l = [1, 2, 3, 4, 5]' '[x for x in l]'
.....................
Mean +- std dev: 329 ns +- 18 ns
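For reference, the "about 25%" figure can be checked from the four means above. This is only a sketch of the arithmetic, averaging the two runs per branch and ignoring the reported standard deviations:

```python
# Mean timings in nanoseconds, copied from the pyperf runs above.
inlinecomp_ns = [243, 240]
main_ns = [324, 329]

mean_inlinecomp = sum(inlinecomp_ns) / len(inlinecomp_ns)  # 241.5
mean_main = sum(main_ns) / len(main_ns)                    # 326.5

saving = 1 - mean_inlinecomp / mean_main   # fraction of time saved
speedup = mean_main / mean_inlinecomp      # how many times faster

print(f"{saving:.1%} less time, {speedup:.2f}x faster")
# 26.0% less time, 1.35x faster
```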

Currently this doesn't handle async comprehensions or generator expressions; those still create a function object and CALL it. In principle I think they could be handled as well, but to keep the PR smaller I'll defer that to a second PR if this one is merged.
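The function-object path that generator expressions keep using can be observed with `dis`. A minimal sketch, assuming a CPython version where the `MAKE_FUNCTION` opcode exists under that name; `l` is just a placeholder name and is never evaluated:

```python
import dis
import types

# Compile (but do not run) a generator expression.
code = compile("(x for x in l)", "<demo>", "eval")

# The genexpr body lives in a nested code object stored in co_consts...
nested = [c for c in code.co_consts if isinstance(c, types.CodeType)]
has_nested_code = bool(nested)

# ...and the enclosing code wraps it in a function object before calling it.
opnames = [ins.opname for ins in dis.get_instructions(code)]
makes_function = "MAKE_FUNCTION" in opnames

print(has_nested_code, makes_function)
```

Running the same check on `[x for x in l]` is a quick way to see whether a given interpreter build still creates a function object for list comprehensions.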

Closes #97933.

Issue: Inline dict/list/set comprehensions in the compiler for better performance #97933

carljm · 2023-01-25T00:57:16Z

@markshannon or @iritkatriel (or anyone else from faster-cpython team) -- if you could kick off a PyPerformance run with your fancy Github Action runner, I'd be much obliged!

iritkatriel · 2023-01-25T10:59:28Z

> @markshannon or @iritkatriel (or anyone else from faster-cpython team) -- if you could kick off a PyPerformance run with your fancy Github Action runner, I'd be much obliged!

Done.

iritkatriel · 2023-01-25T14:53:26Z

Benchmark results are here: https://github.com/faster-cpython/benchmarking/blob/main/results/bm-20230125-3.12.0a4+-ee2ad56/bm-20230125-linux-x86_64-carljm-inlinecomp-3.12.0a4+-ee2ad56-vs-base.md

(shows no difference overall).

carljm · 2023-01-25T16:51:40Z

I can't see the detailed results (private repo), but if there's no impact overall then I guess they aren't that interesting :) We must not have any benchmarks that make heavy use of comprehensions.

markshannon · 2023-01-25T16:54:06Z

All benchmarks:

| Benchmark | bm-20230124-linux-x86_64-python-f02fa64bf2d03ef7a286-3.12.0a4+-f02fa64 | bm-20230125-linux-x86_64-carljm-inlinecomp-3.12.0a4+-ee2ad56 |
|---|---|---|
| 2to3 | 251 ms | 249 ms: 1.01x faster |
| async_generators | 349 ms | 355 ms: 1.02x slower |
| async_tree_memoization | 624 ms | 656 ms: 1.05x slower |
| asyncio_tcp | 488 ms | 492 ms: 1.01x slower |
| chaos | 66.2 ms | 64.6 ms: 1.02x faster |
| bench_thread_pool | 775 us | 781 us: 1.01x slower |
| coroutines | 24.8 ms | 25.6 ms: 1.03x slower |
| deepcopy_reduce | 2.91 us | 2.96 us: 1.02x slower |
| deepcopy_memo | 33.8 us | 34.6 us: 1.02x slower |
| docutils | 2.55 sec | 2.51 sec: 1.02x faster |
| fannkuch | 373 ms | 369 ms: 1.01x faster |
| float | 73.2 ms | 72.0 ms: 1.02x faster |
| create_gc_cycles | 1.47 ms | 1.44 ms: 1.02x faster |
| gc_traversal | 4.30 ms | 3.64 ms: 1.18x faster |
| generators | 76.5 ms | 75.9 ms: 1.01x faster |
| go | 138 ms | 134 ms: 1.03x faster |
| gunicorn | 1.07 ms | 1.06 ms: 1.00x faster |
| json | 4.62 ms | 4.69 ms: 1.01x slower |
| json_dumps | 9.32 ms | 9.57 ms: 1.03x slower |
| json_loads | 24.5 us | 24.2 us: 1.01x faster |
| logging_format | 6.40 us | 6.31 us: 1.01x faster |
| logging_silent | 92.8 ns | 91.7 ns: 1.01x faster |
| logging_simple | 5.76 us | 5.80 us: 1.01x slower |
| mako | 9.80 ms | 9.70 ms: 1.01x faster |
| mdp | 2.69 sec | 2.51 sec: 1.07x faster |
| pathlib | 17.7 ms | 17.9 ms: 1.01x slower |
| pickle | 10.1 us | 10.2 us: 1.02x slower |
| pickle_dict | 30.9 us | 32.4 us: 1.05x slower |
| pickle_list | 4.12 us | 4.29 us: 1.04x slower |
| pickle_pure_python | 286 us | 288 us: 1.01x slower |
| pidigits | 189 ms | 190 ms: 1.00x slower |
| pycparser | 1.15 sec | 1.09 sec: 1.05x faster |
| pyflate | 402 ms | 400 ms: 1.01x faster |
| python_startup | 8.98 ms | 8.89 ms: 1.01x faster |
| python_startup_no_site | 6.50 ms | 6.44 ms: 1.01x faster |
| raytrace | 281 ms | 284 ms: 1.01x slower |
| regex_compile | 127 ms | 128 ms: 1.01x slower |
| regex_dna | 210 ms | 201 ms: 1.05x faster |
| regex_effbot | 3.49 ms | 3.42 ms: 1.02x faster |
| regex_v8 | 22.4 ms | 21.3 ms: 1.05x faster |
| richards | 41.7 ms | 42.6 ms: 1.02x slower |
| scimark_fft | 301 ms | 303 ms: 1.01x slower |
| scimark_monte_carlo | 65.6 ms | 64.7 ms: 1.01x faster |
| scimark_sparse_mat_mult | 3.96 ms | 3.99 ms: 1.01x slower |
| spectral_norm | 95.3 ms | 96.2 ms: 1.01x slower |
| sqlglot_optimize | 51.0 ms | 50.5 ms: 1.01x faster |
| sqlglot_normalize | 105 ms | 103 ms: 1.02x faster |
| sympy_expand | 453 ms | 450 ms: 1.01x faster |
| sympy_integrate | 19.7 ms | 19.7 ms: 1.00x faster |
| sympy_sum | 154 ms | 155 ms: 1.00x slower |
| telco | 6.26 ms | 6.46 ms: 1.03x slower |
| thrift | 737 us | 748 us: 1.01x slower |
| tornado_http | 93.6 ms | 94.5 ms: 1.01x slower |
| unpack_sequence | 46.7 ns | 44.4 ns: 1.05x faster |
| unpickle | 13.2 us | 13.1 us: 1.01x faster |
| unpickle_pure_python | 197 us | 201 us: 1.02x slower |
| xml_etree_iterparse | 109 ms | 106 ms: 1.02x faster |
| xml_etree_process | 54.1 ms | 53.8 ms: 1.01x faster |
| Geometric mean | (ref) | 1.00x faster |

Benchmark hidden because not significant (33): aiohttp, async_tree_none, async_tree_cpu_io_mixed, async_tree_io, chameleon, bench_mp_pool, coverage, crypto_pyaes, dask, deepcopy, deltablue, django_template, djangocms, dulwich_log, genshi_text, genshi_xml, hexiom, html5lib, meteor_contest, mypy, nbody, nqueens, pprint_safe_repr, pprint_pformat, scimark_lu, scimark_sor, sqlglot_parse, sqlglot_transpile, sqlite_synth, sympy_str, unpickle_list, xml_etree_parse, xml_etree_generate
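To show how a geometric-mean summary lets individual swings cancel out, here is a small sketch over four rows picked from the results above. It assumes the summary is a geometric mean of per-benchmark base/branch time ratios; it is not the overall figure, which would need every benchmark including the hidden ones.

```python
from statistics import geometric_mean

# (base time, inlinecomp time) for a hand-picked subset of rows;
# units cancel within each pair.
subset = {
    "gc_traversal": (4.30, 3.64),
    "mdp": (2.69, 2.51),
    "async_tree_memoization": (624, 656),
    "pickle_dict": (30.9, 32.4),
}

# Ratio > 1 means the branch is faster, < 1 means it is slower.
ratios = {name: base / branch for name, (base, branch) in subset.items()}
subset_gm = geometric_mean(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean of this subset: {subset_gm:.3f}x")
```

With only four rows, the two largest wins outweigh the two losses; across the full suite the ratios average out to about 1.00x.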


carljm · 2023-03-08T22:51:20Z

Closing this for now in favor of #101441

May reopen this approach if PEP 709 (implemented by that PR) is rejected.

add opcode for more efficient comprehension execution

e4a6855

carljm requested review from markshannon and iritkatriel as code owners January 25, 2023 00:53

bedevere-bot added the awaiting review label Jan 25, 2023

bedevere-bot mentioned this pull request Jan 25, 2023

Inline dict/list/set comprehensions in the compiler for better performance #97933

Closed

📜🤖 Added by blurb_it.

ee2ad56

carljm added the performance Performance or resource usage label Jan 25, 2023

This was referenced Feb 14, 2023

Allow the f_func field of the _PyInterpreterFrame struct to be any object (and rename it) #96237

Closed

add comprehensions benchmark python/pyperformance#265

Merged

carljm closed this Mar 8, 2023
