PEP 670: LTO+PGO benchmark #2161

Merged (1 commit) on Nov 25, 2021.

pep-0670.rst: 67 changes (44 additions, 23 deletions)

@@ -164,6 +164,8 @@ The following macros should not be converted:
  or recent C features.
  Example: ``#define Py_ALWAYS_INLINE __attribute__((always_inline))``.
* Macros that need the stringification or concatenation features of the C preprocessor.
* Macros that can be used as an l-value in an assignment. Converting them
  would be an incompatible change and is out of the scope of this PEP.


Convert static inline functions to regular functions
@@ -225,6 +227,9 @@ Backwards Compatibility
Removing the return value of macros is an incompatible API change made
on purpose: see the `Remove the return value`_ section.

Macros that can be used as an l-value in an assignment are not modified by
this PEP, to avoid incompatible changes.
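
A minimal sketch of why such a conversion would be incompatible. The ``Object`` type, the ``OBJ_REFCNT()`` macro and the ``obj_refcnt()`` function below are hypothetical stand-ins, not CPython APIs: a macro that expands to a struct member access is an l-value, so callers can assign through it, whereas a function call only yields a copy of the value.

```c
#include <assert.h>

/* Hypothetical minimal object type (a stand-in, not CPython's PyObject). */
typedef struct {
    long refcnt;
} Object;

/* Macro form: expands to a member access, which is an l-value,
   so "OBJ_REFCNT(ob) = 5;" is valid C. */
#define OBJ_REFCNT(ob) ((ob)->refcnt)

/* Hypothetical function form: returns a copy of the member. The call
   expression is not an l-value, so "obj_refcnt(ob) = 5;" would not compile. */
static inline long obj_refcnt(const Object *ob)
{
    return ob->refcnt;
}

/* Assigns through the macro, then reads the value back through the
   function. Returns 5. */
long assign_through_macro(void)
{
    Object o = {1};
    OBJ_REFCNT(&o) = 5;
    return obj_refcnt(&o);
}

/* Self-check of the behavior described above. */
void check_lvalue_macro(void)
{
    assert(assign_through_macro() == 5);
}
```

Any caller written like ``assign_through_macro()`` would stop compiling if the macro were turned into a function.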


Rejected Ideas
==============
@@ -250,6 +255,21 @@ to miss a macro pitfall when writing and reviewing macro code. Moreover, macros
are harder to read and maintain than functions.


Examples of duplication of side effects
=======================================

Macros::

    #define PySet_Check(ob) \
        (Py_IS_TYPE(ob, &PySet_Type) \
         || PyType_IsSubtype(Py_TYPE(ob), &PySet_Type))

    #define Py_IS_NAN(X) ((X) != (X))

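If the *ob* argument of ``PySet_Check()`` or the *X* argument of
``Py_IS_NAN()`` has a side effect, the side effect is duplicated: the
argument is evaluated twice, always for ``Py_IS_NAN()``, and for
``PySet_Check()`` whenever the ``Py_IS_TYPE()`` test fails.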
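
A minimal, self-contained sketch that makes the duplication visible. It copies the ``Py_IS_NAN()`` definition quoted above and compares it with a hypothetical ``is_nan_inline()`` static inline function; ``next_value()`` is a hypothetical helper whose only side effect is incrementing a call counter.

```c
#include <assert.h>

/* Copy of the macro quoted above: X appears twice in the expansion. */
#define Py_IS_NAN(X) ((X) != (X))

/* Hypothetical static inline equivalent: the argument expression is
   evaluated once, before the function body runs. */
static inline int is_nan_inline(double x)
{
    return x != x;
}

/* Hypothetical helper whose side effect is counting its own calls. */
static int calls = 0;

static double next_value(void)
{
    calls++;
    return 1.0;
}

/* Number of next_value() calls made by one macro invocation. */
int calls_with_macro(void)
{
    calls = 0;
    (void)Py_IS_NAN(next_value());
    return calls;
}

/* Number of next_value() calls made by one function call. */
int calls_with_inline_function(void)
{
    calls = 0;
    (void)is_nan_inline(next_value());
    return calls;
}

/* Self-check: the macro evaluates its argument twice, the function once. */
void check_side_effects(void)
{
    assert(calls_with_macro() == 2);
    assert(calls_with_inline_function() == 1);
}
```

The two calls in the macro expansion may run in either order, but both always run, so the counter reaches 2 for the macro and 1 for the function.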


Examples of hard to read macros
===============================

@@ -414,28 +434,12 @@ private static inline function has been added to the internal C API:
* ``_PyVectorcall_FunctionInline()``


Benchmark comparing macros and static inline functions
======================================================

Benchmarks were run on Fedora 35 (Linux) with GCC 11 on a laptop with 8
logical CPUs (4 physical CPU cores).


gcc -O0 versus gcc -Og
----------------------

Benchmark of the ``./python -m test -j10`` command on a Python debug
build:

* ``gcc -Og``: 220 sec ± 3 sec
* ``gcc -O0``: 360 sec ± 6 sec

Python built with ``gcc -O0`` is **1.6x slower** than Python built with
``gcc -Og``.
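
The 1.6x figure is the ratio of the two mean durations above, ignoring the ± spread:

```latex
\frac{t_{\texttt{-O0}}}{t_{\texttt{-Og}}}
  = \frac{360\ \mathrm{s}}{220\ \mathrm{s}}
  \approx 1.64
```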

Replace macros with static inline functions
-------------------------------------------

`PR 29728 <https://github.com/python/cpython/pull/29728>`_ replaces the
following existing static inline functions with macros:

@@ -449,11 +453,28 @@ existing the following static inline functions with macros:
* ``Py_NewRef()``
* ``Py_REFCNT()``, ``Py_TYPE()``, ``Py_SIZE()``
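
For readers comparing the two forms, the sketch below writes one accessor both ways, in the spirit of ``Py_SIZE()``. The ``VarObject`` type, ``VAR_SIZE()`` and ``var_size()`` are hypothetical and do not reproduce CPython's definitions: the macro is pasted into each call site by the preprocessor, while the static inline function is a real function with a typed parameter, and whether a call to it is inlined is left to the compiler and its optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable-size object (a stand-in, not CPython's PyVarObject). */
typedef struct {
    long refcnt;
    ptrdiff_t size;
} VarObject;

/* Macro form: expanded textually at each call site; the argument is
   not type-checked beyond what the expansion requires. */
#define VAR_SIZE(ob) ((ob)->size)

/* Static inline form: a typed parameter checked by the compiler;
   inlining is a compiler decision, not a guarantee. */
static inline ptrdiff_t var_size(const VarObject *ob)
{
    return ob->size;
}

/* Both forms read the same member, so they return the same value. */
int forms_agree(void)
{
    VarObject v = {1, 3};
    return VAR_SIZE(&v) == var_size(&v);
}

/* Self-check that the two forms agree. */
void check_forms_agree(void)
{
    assert(forms_agree());
}
```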

When static inline functions are inlined: Release build
-------------------------------------------------------

Benchmark of the ``./python -m test -j5`` command on Python built in
release mode with ``gcc -O3``, LTO and PGO:

* Macros (PR 29728): 361 sec ± 1 sec
* Static inline functions (reference): 361 sec ± 1 sec

There is **no significant performance difference** between macros and
static inline functions when static inline functions **are inlined**.


When static inline functions are not inlined: Debug build and -O0
-----------------------------------------------------------------

Benchmark of the ``./python -m test -j10`` command on Python built in
debug mode with ``gcc -O0`` (which explicitly disables compiler optimizations):

* Macros (PR 29728): 345 sec ± 5 sec
* Static inline functions (reference): 360 sec ± 6 sec

Replacing macros with static inline functions makes Python
**1.04x slower** when the compiler **does not inline** static inline