Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable semantic interposition? #501

Closed
blopker opened this issue Jul 1, 2020 · 21 comments · Fixed by docker-library/official-images#11174
Closed

Disable semantic interposition? #501

blopker opened this issue Jul 1, 2020 · 21 comments · Fixed by docker-library/official-images#11174
Labels
Request Request for image modification or feature

Comments

@blopker
Copy link
Contributor

blopker commented Jul 1, 2020

Red Hat recently released an article about how they improved Python's performance by 30% in Python 3.8 by disabling semantic interposition. Article: https://developers.redhat.com/blog/2020/06/25/red-hat-enterprise-linux-8-2-brings-faster-python-3-8-run-speeds/

Looks like they added the -fno-semantic-interposition flag to GCC.

Would this option make sense for this image?

@wglambert wglambert added the Request Request for image modification or feature label Jul 1, 2020
@blopker
Copy link
Contributor Author

blopker commented Jul 1, 2020

This Fedora wiki has some benchmarks from the change: https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup

@tianon
Copy link
Member

tianon commented Jul 1, 2020

This is neat! Do you know if there's been any attempt to bring this suggestion upstream? (and/or any official recommendation from upstream to set this flag?)

@tianon
Copy link
Member

tianon commented Jul 1, 2020

It would appear that https://bugs.python.org/issue38980 is the appropriate upstream discussion. 👍

@blopker
Copy link
Contributor Author

blopker commented Jul 1, 2020

Nice. There's some good points in that python.org thread in favor of the option, like Clang disabling it by default already. I wonder if adding the option to the Docker images would help upstream make a decision?

@yosifkit
Copy link
Member

yosifkit commented Jul 1, 2020

adding the option to the Docker images would help upstream make a decision?

We follow their lead as to how they would like to Python built. We aspire to represent how Python maintainers desire Python to be built as purely as possible. When they reach a decision to change the default, we will definitely use it.

@blopker
Copy link
Contributor Author

blopker commented Jul 2, 2020

I think that's fair, being conservative here is important so this image doesn't break things for people. My hope was that the performance gains would make for an exception in this case.

Shot in the dark, but since @gpshead had some great input on #160, are you familiar with the risks here as well?

@gpshead
Copy link

gpshead commented Jul 2, 2020

i'll focus on the upstream issue. But the simple answer is that if you test this with your interpreter's build + configure setup and it makes a demonstrable improvement that you can measure. I'd go ahead and adopt it.

@blopker
Copy link
Contributor Author

blopker commented Jul 2, 2020

That's great, thank you for the feedback. You're right, we should do an actual test on this specific case.

I ran a quick benchmark. baseline is https://github.com/docker-library/python/blob/master/3.8/buster/Dockerfile with no changes. 3.8-interposition is baseline with just -fno-semantic-interposition. 3.8-interposition-lto is 3.8-interposition with also the --with-lto flag.

+-----------------+--------------+------------------------------+----------------------------+
| Benchmark       | 3.8-baseline | 3.8-interposition-lto        | 3.8-interposition          |
+=================+==============+==============================+============================+
| django_template | 108 ms       | 90.5 ms: 1.19x faster (-16%) | 104 ms: 1.04x faster (-4%) |
+-----------------+--------------+------------------------------+----------------------------+
| nbody           | 237 ms       | 210 ms: 1.13x faster (-11%)  | 227 ms: 1.04x faster (-4%) |
+-----------------+--------------+------------------------------+----------------------------+
| raytrace        | 895 ms       | 723 ms: 1.24x faster (-19%)  | 871 ms: 1.03x faster (-3%) |
+-----------------+--------------+------------------------------+----------------------------+

Not quite the numbers that Red Hat was seeing, but adding LTO does seem to have a significant impact.

@blopker
Copy link
Contributor Author

blopker commented Jul 3, 2020

Getting similar results with Alpine 3.12 with LTO and -fno-semantic-interposition:

+-----------------+--------+------------------------------+
| Benchmark       | alpine | alpine-optimized             |
+=================+========+==============================+
| django_template | 110 ms | 95.9 ms: 1.15x faster (-13%) |
+-----------------+--------+------------------------------+
| nbody           | 242 ms | 207 ms: 1.17x faster (-14%)  |
+-----------------+--------+------------------------------+
| raytrace        | 914 ms | 717 ms: 1.27x faster (-22%)  |
+-----------------+--------+------------------------------+

@tianon
Copy link
Member

tianon commented Jul 6, 2020

Interesting results, although I'm not sure this is very convincing WRT -fno-semantic-interposition being worth specifying explicitly/manually -- it seems to me those results point more to LTO making a fairly large difference, and -fno-semantic-interposition making a very small difference. Would you be willing to perform your tests again with just LTO and without -fno-semantic-interposition to ensure we have a "control" to compare the LTO+-fno-semantic-interposition results against?

Additionally, I want to make sure we're careful with the build-time implications of this, which is why we resisted --enable-optimizations for so long (see conversation in #160).

@tianon
Copy link
Member

tianon commented Jul 6, 2020

Additionally, I want to make sure we're careful with the build-time implications of this, which is why we resisted --enable-optimizations for so long (see conversation in #160).

To illustrate better what I mean, the build of python:3.9-buster in #502 appears to have taken roughly 14 minutes -- the same build on master took roughly 8 minutes. That ~6 minute difference doesn't seem like much until you multiply it by ~30 supported variations, at which point it becomes a roughly 2.5 - 3 hour increase in total build time (and on the "official" build servers, the builds do not happen in parallel, so this really is pretty heavy for only a ~10% speed increase).

@blopker
Copy link
Contributor Author

blopker commented Jul 6, 2020

Fair point. I actually did do an LTO-only test when I did the other tests. I don't have the results now, but they were interesting. Mostly no changes in performance, some tests were even slower. I believe LTO only works when function calls don't go through the lookup table, like when the -fno-semantic-interposition is set. In the Fedora wiki they don't mention this since they already had --with-lto enabled and likely didn't test it.

I remember also testing --with-lto in #160 and didn't propose we add it because it didn't seem to do anything.

@blopker
Copy link
Contributor Author

blopker commented Jul 19, 2020

OK, I took some time to run more tests: https://gist.github.com/blopker/87153749077fc2ca233310627a564e42
These results are preliminary. I'm currently rerunning the tests with the --rigorous flag since some of the faster tests showed pretty inconsistent results. However, it looks like the performance improvements are as high as ~30%. Tests that rely more on calling C extensions show the least changes. The more Python-heavy tests show the most improvement.

@blopker
Copy link
Contributor Author

blopker commented Jul 20, 2020

Ok, the results with rigorous didn't change too much, there was still a lot of variance with the quick tests. I think the only way to get those more consistent is to get a dedicated host to run on, but that's more $$.

Seems like most of the performance gains are more than 10%. Is there any other information that would be useful?

@blopker
Copy link
Contributor Author

blopker commented Dec 23, 2020

Some good news on this! Looks like -fno-semantic-interposition was added to --enable-optimizations in Python 3.10 (https://docs.python.org/3.10/whatsnew/3.10.html#optimizations). Thanks @gpshead for raising that upstream.

To get the full performance benefits we'll still need to add --with-lto though.

@johnthagen
Copy link

johnthagen commented Sep 14, 2021

With Python 3.10 soon to release, to get this performance improvement should we use the 3.10 tag or is the plan that these compiler settings would also get backported to older Python images? The Python 3.10 docs state up to a 30% speed improvement.

@johnthagen
Copy link

johnthagen commented Oct 17, 2021

Was this optimization included in the python:3.10 image?

This line would imply it was?

--enable-optimizations \

@blopker
Copy link
Contributor Author

blopker commented Oct 18, 2021

These are the results from adding --with-lto to that bullseye Dockerfile on line 46 and comparing it to the original Dockerfile:

+-----------------+-----------+---------------+--------------+-----------------------+
| Benchmark       | 3-10.json | 3-10-lto.json | Change       | Significance          |
+=================+===========+===============+==============+=======================+
| django_template | 94.3 ms   | 77.2 ms       | 1.22x faster | Significant (t=85.87) |
+-----------------+-----------+---------------+--------------+-----------------------+
| nbody           | 222 ms    | 193 ms        | 1.15x faster | Significant (t=29.15) |
+-----------------+-----------+---------------+--------------+-----------------------+
| raytrace        | 835 ms    | 650 ms        | 1.28x faster | Significant (t=95.29) |
+-----------------+-----------+---------------+--------------+-----------------------+

From this it looks like --enable-optimizations does not include --with-lto and the LTO flag still needs to be added to the Dockerfile to take full advantage of the new 3.10 compiler options. Thanks @johnthagen for bringing this up. @gpshead or @tianon, are there any more concerns about adding this flag in? Thanks!

@gpshead
Copy link

gpshead commented Oct 18, 2021

Debian's been building their own Python distro package using LTO since forever. You can apt-get source python3.9 on bullseye and look at their debian/rules file.

But they don't appear to be using the builtin Python's configure --enable-optimizations or --with-lto flags to do it, they use make profile-opt and construct their own set of LTO flags themselves, passing them in via EXTRA_CFLAGS=.

Along with another step using sed that appears to strip such flags from the generated sysconfig module to avoid issues with that?

@blopker
Copy link
Contributor Author

blopker commented Oct 18, 2021

Interesting find. It looks like the patches are also applied to Python 2.7. Looking for the patch (link-opt.diff) on Google, it seems to predate the with-lto flag in Python, which was only added in 2016. I see references to the patch from at least 2015. In any case, that seems like a good signal that enabling LTO is safe.

@blopker
Copy link
Contributor Author

blopker commented Oct 19, 2021

OK, I created a PR to only apply --with-lto to 3.10+: #660
Unfortunately, build times for those images do increase. I do think the performance benefits are worth it, but I'm not the one maintaining the images. I can run more benchmarks, however I'm not sure if we'll find anything new there.

@tianon tianon closed this as completed in 1fa00a0 Oct 19, 2021
tianon added a commit that referenced this issue Oct 19, 2021
Fix #501: add --with-lto to 3.10 and newer
docker-library-bot added a commit to docker-library-bot/official-images that referenced this issue Oct 19, 2021
Changes:

- docker-library/python@70b2802: Merge pull request docker-library/python#660 from blopker/feature/with-lto
- docker-library/python@1fa00a0: Fix docker-library/python#501: add --with-lto to 3.10 and newer
- docker-library/python@35fa20f: Update to 3.8.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@5665256: Update to 3.6.15, setuptools 57.5.0, pip 21.2.4
- docker-library/python@b6b534f: Update to 3.7.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@62ec6a1: Update to 3.11.0a1, setuptools 57.5.0, pip 21.2.4
- docker-library/python@1c48a50: Update to 3.9.7, setuptools 57.5.0, pip 21.2.4
- docker-library/python@652ed14: Update to 3.10.0, setuptools 57.5.0, pip 21.2.4
docker-library-bot added a commit to docker-library-bot/official-images that referenced this issue Oct 23, 2021
Changes:

- docker-library/python@9242c44: Update to 3.8.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@59a27ea: Update to 3.6.15, setuptools 57.5.0, pip 21.2.4
- docker-library/python@a765583: Update to 3.7.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@1f16f24: Update to 3.11.0a1, setuptools 57.5.0, pip 21.2.4
- docker-library/python@a7ff0b7: Update to 3.9.7, setuptools 57.5.0, pip 21.2.4
- docker-library/python@8e59156: Update to 3.10.0, setuptools 57.5.0, pip 21.2.4
- docker-library/python@70b2802: Merge pull request docker-library/python#660 from blopker/feature/with-lto
- docker-library/python@1fa00a0: Fix docker-library/python#501: add --with-lto to 3.10 and newer
- docker-library/python@35fa20f: Update to 3.8.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@5665256: Update to 3.6.15, setuptools 57.5.0, pip 21.2.4
- docker-library/python@b6b534f: Update to 3.7.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@62ec6a1: Update to 3.11.0a1, setuptools 57.5.0, pip 21.2.4
- docker-library/python@1c48a50: Update to 3.9.7, setuptools 57.5.0, pip 21.2.4
- docker-library/python@652ed14: Update to 3.10.0, setuptools 57.5.0, pip 21.2.4
NeilHanlon pushed a commit to NeilHanlon/official-images that referenced this issue Oct 27, 2021
Changes:

- docker-library/python@9242c44: Update to 3.8.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@59a27ea: Update to 3.6.15, setuptools 57.5.0, pip 21.2.4
- docker-library/python@a765583: Update to 3.7.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@1f16f24: Update to 3.11.0a1, setuptools 57.5.0, pip 21.2.4
- docker-library/python@a7ff0b7: Update to 3.9.7, setuptools 57.5.0, pip 21.2.4
- docker-library/python@8e59156: Update to 3.10.0, setuptools 57.5.0, pip 21.2.4
- docker-library/python@70b2802: Merge pull request docker-library/python#660 from blopker/feature/with-lto
- docker-library/python@1fa00a0: Fix docker-library/python#501: add --with-lto to 3.10 and newer
- docker-library/python@35fa20f: Update to 3.8.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@5665256: Update to 3.6.15, setuptools 57.5.0, pip 21.2.4
- docker-library/python@b6b534f: Update to 3.7.12, setuptools 57.5.0, pip 21.2.4
- docker-library/python@62ec6a1: Update to 3.11.0a1, setuptools 57.5.0, pip 21.2.4
- docker-library/python@1c48a50: Update to 3.9.7, setuptools 57.5.0, pip 21.2.4
- docker-library/python@652ed14: Update to 3.10.0, setuptools 57.5.0, pip 21.2.4
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Request Request for image modification or feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants