Add benchmark suite to CI #15696
Conversation
(force-pushed from ad7192a to 49fedd5)
(force-pushed from 49fedd5 to 4e0559a)
.github/workflows/benchmark.yml (outdated)
    sed -i 's/secret_key = ""/secret_key = "${{ secrets.PHP_VERSION_BENCHMARK_AWS_SECRET_KEY }}"/g' ./php-version-benchmarks/build/infrastructure/config/aws.tfvars
    cp ./php-version-benchmarks/config/php/master.ini.dist ./php-version-benchmarks/config/php/master1.ini
    YESTERDAY="$(date -d "-2 day 13:00" '+%Y-%m-%d')"
Intel's benchmarks years ago used a single baseline commit: https://externals.io/message/100532 They used a specific PHP 7.0 commit. I'm not sure what the best choice would be for us, but it may be useful to have a clear understanding of the performance of a PHP version since its development started (e.g. the baseline could be the first commit after the latest minor PHP version was branched).
@arnaud-lb What do you think about this possibility? Should we add a specific commit (i.e. the first commit after PHP 8.4 is branched) to the benchmark? This way, we would compare the latest commit with the baseline and yesterday's commit.
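For illustration, one way to pin down such a baseline could be the commit at which the release branch diverged from master; a hypothetical sketch (the branch names are assumptions, not code from this PR):

```sh
# Hypothetical: use the commit where PHP-8.4 was branched off master
# as the fixed baseline for comparisons.
BASELINE_SHA="$(git merge-base origin/master origin/PHP-8.4)"
```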
I've just added it, because I think it will come in handy if we also have a single fixed commit against which we can compare changes.
Great work!
I think that comparing two commits every run is a good idea, for the reasons you specified. Do you plan to also export some visualizations? It would be nice to have a visualization of the delta between the two commits, and another one with the absolute result of master. Could this be integrated with https://nielsdos.github.io/php-benchmark-visualisation/? (cc @nielsdos)
.github/workflows/benchmark.yml (outdated)
    run: |
      set -e
      cp -r "php-version-benchmarks/tmp/php_master/" "php-version-benchmarks/tmp/php_master_jit"
You may want to preserve attributes such as timestamps and permissions with `-a`. `git clone "php-version-benchmarks/tmp/php_master/" "php-version-benchmarks/tmp/php_master_jit"` would work, too.
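For comparison, the attribute-preserving variant suggested above would be a one-flag change (a sketch, not the exact workflow code):

```sh
# -a preserves timestamps, permissions, and other attributes, unlike plain -r
cp -a "php-version-benchmarks/tmp/php_master/" "php-version-benchmarks/tmp/php_master_jit"
```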
Thanks for the info, I didn't know this was possible. Even though I cannot think of a situation in which these attributes would matter for the benchmark, I'm fine with the change, as it only improves things.
.github/workflows/benchmark.yml (outdated)
    cp ./php-version-benchmarks/config/php/master.ini.dist ./php-version-benchmarks/config/php/master1.ini
    YESTERDAY="$(date -d "-2 day 13:00" '+%Y-%m-%d')"
    YESTERDAY_SHA="$(cd ./php-version-benchmarks/tmp/php_master/ && git --no-pager log --until="$YESTERDAY 23:59:59" -n 1 --pretty='%H')"
A potential issue is that if this particular commit is broken or if the job is temporarily broken, we will lose some data points. E.g. if we have this:
- commit3 // master
- commit2
- commit1 // YESTERDAY_SHA
If commit3 is broken, and commit2 has a performance regression, we might not notice it.
Could we store the commit hash of the last successful benchmark in kocsismate/php-version-benchmark-results, and use that as a base?
This would also make it less likely to benchmark a commit in the middle of a merged branch.
Interesting idea; I agree with both the concern and the suggestion.
Another related concern is that it is currently not possible to automatically detect issues with the tests. Of course, totally broken PHP versions won't be able to run any tests, so I guess the benchmark will fail. However, if there are only smaller issues (e.g. warnings, notices, or other kinds of errors), then different code may be executed than otherwise, making the results unreliable again.
Fortunately, I've already added support for manual examination of the test outputs, because I print the response of the first request to stdout, so one can have a look at it (I usually do a brief check). This should be extended with automatic checks for error messages, and if any is found, the benchmark should be stopped.
(tangentially related: I can only hope that we will be able to use the benchmark during the development phase of PHP 9.0)
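A minimal sketch of such an automatic check, assuming the first response is captured in a file (the file name and the error patterns are assumptions):

```sh
# Abort the benchmark as soon as the captured response contains PHP error output.
if grep -Eq 'Fatal error:|Warning:|Notice:|Deprecated:' response.txt; then
    echo "PHP error detected in the test output, aborting benchmark" >&2
    exit 1
fi
```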
The benchmark now verifies the test outputs: kocsismate/php-version-benchmarks@142e907 So the only missing part is to parse the commit hash of the last successful build (which can be retrieved from the database.tsv file).
(side note: I guess I'll also have to make the database store the results per year too, just in case)
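A minimal sketch of how that base commit could be parsed, assuming database.tsv stores one run per row with the commit hash in the first column (the column layout is an assumption):

```sh
# Use the commit hash of the last successful run as the comparison base.
BASE_SHA="$(tail -n 1 database.tsv | cut -f 1)"
```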
The per-year database storage is also implemented: kocsismate/php-version-benchmarks@0e84b9e
.github/workflows/benchmark.yml (outdated)
echo "Merging, can't proceed" | ||
exit 1 | ||
fi | ||
git add . |
This is creating one directory in the repository root every day. I'm not sure, but could this become an issue after some time due to the number of entries in the root?
Interesting observation! I agree; this could become a problem in the future. Do you think it's enough to put the results into a single "year" directory? This way, a maximum of only ~365 files would be in a directory, assuming that we continue to run the benchmark once per day. I could also imagine a year/month/result directory structure if we really want to future-proof the layout.
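A minimal sketch of the year-based layout (the path format is illustrative, not the actual scheme):

```sh
# Store each run under its year, e.g. 2024/2024-08-30_0030/
RESULT_DIR="$(date '+%Y')/$(date '+%Y-%m-%d_%H%M')"
mkdir -p "$RESULT_DIR"
```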
Yes, this seems reasonable and probably enough.
Implemented the suggestion: kocsismate/php-version-benchmarks@cc0090b
@arnaud-lb I can integrate it into the visualisation website once merged. I'm a bit busy with real life at the moment though, so it may take some time.
I added support for gathering the instruction count via Valgrind (kocsismate/php-version-benchmarks@f55c32d), similarly to how Ilija's benchmark does it, but the results indeed didn't correlate with the exact wall-time performance :( So I disabled this metric by default (kocsismate/php-version-benchmarks@3f31401).
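For reference, a minimal sketch of how an instruction count can be gathered with Valgrind (the benchmarked script name is a placeholder, not the actual harness):

```sh
# Run the workload under callgrind; the "summary:" line of the output file
# holds the total instruction count (Ir).
valgrind --tool=callgrind --callgrind-out-file=callgrind.out php bench.php
grep '^summary:' callgrind.out
```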
I ran two benchmarks after my review fixes: php/real-time-benchmark-data@cbf7425?short_path=44a0fb3#diff-44a0fb3a1464ed3a62f884a3b2ad83ca08f7c97f3c37e7e3f497d4cf1b249e4c and php/real-time-benchmark-data@bcc0143?short_path=e15a4d6#diff-e15a4d62fc9d8b702c8346c9aa07e4f827b1eea540e21e4785ea4cd77acebe78 The first time, the last commit from the day before yesterday was the baseline, because the database was missing. The second time, the previously benchmarked commit was compared to Niels' latest commit :)
(force-pushed from a8819e8 to a996c8d)
The commit is now the latest one. This should be changed later to the one after PHP 8.4 is branched.
(force-pushed from a996c8d to 23fcb24)
@iluuu1994 Can I ask you to have a look at this PR?
Unless there's any concern, I plan to merge this after branching has happened.
@kocsismate Sorry, I didn't have much time. You should definitely move the data repo to the php org before merging. I don't have much else to say; this is completely isolated and shouldn't cause issues for anything else.
I would also be happy if we could somehow distinguish the terminology for the two benchmarks. I don't mind renaming the existing benchmark, either.
For what it's worth, until a couple of years ago the PHP on Windows QA team ran performance tests for all releases (QA and GA) in a controlled environment. Unfortunately, that was shut down, so it's nice to have similar performance testing available again (and for PHP, Linux is more relevant than Windows anyway).
Yeah, I also tried to come up with some sensible naming, but I couldn't decide. Do you think using "Valgrind benchmark" for yours and "performance benchmark" for mine would make sense?
"Valgrind" or "instruction count" for the existing one, and "real-time" for the new one, would make the most sense to me.
I'd go with "valgrind" because it's much shorter than "instruction count", but I'd say it's your call to choose the name. Personally, I'm fine with "real-time" for the new benchmark.
Maybe call it iCount (unless that is already a trademark). ;)
Not that I don't appreciate the humor, but I think we should stick with something more standard 😄 If the intention is to shorten it, many tools just call it IC.
That was tongue-in-cheek. I don't see why "instruction count" would be too long, anyway.
"Instruction count benchmark" sounds a bit long, while "Valgrind benchmark" is easier to say, at least for me. But I really don't mind which name it gets.
@iluuu1994 I don't have enough permissions to transfer it. Could you please grant me the privilege to create repositories in the php org?
This PR integrates https://github.com/kocsismate/php-version-benchmarks/ into the CI as a nightly job running every day at 12:30 AM. Roughly, the following happens: the benchmark suite spins up an AWS EC2 instance via Terraform, runs the tests according to the configuration, and then commits the results to the https://github.com/kocsismate/php-version-benchmark-results repository.
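A hedged sketch of that flow (the script and directory names are illustrative, not the actual commands of the workflow):

```sh
# Provision the benchmark machine, run the suite, publish the results, tear down.
terraform apply -auto-approve          # spin up the tuned EC2 instance
./run-benchmarks.sh                    # execute the tests per the configuration
git -C php-version-benchmark-results add . \
    && git -C php-version-benchmark-results commit -m "Results $(date '+%Y-%m-%d')" \
    && git -C php-version-benchmark-results push
terraform destroy -auto-approve        # clean up the instance
```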
In order to have results that are as stable as possible, the CPU, kernel, and other settings of the AWS instance are fine-tuned. Customizing the CPU is only supported by metal instances among recent instance types according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html, so in the end, a c7i.metal-24xl instance is used in the eu-west-1 region.
The benchmark suite compares the performance of the latest commit of the master branch at the time the benchmark runs with the last commit of master from the day before yesterday. I.e. if the benchmark runs tomorrow morning at 12:30 AM, then the performance of the latest commit will be benchmarked against the last commit pushed yesterday. This makes it possible to spot outstanding regressions (or progressions) in time; the end goal is to send notifications in case of any significant changes, for further analysis. The reason why the benchmark is also run for previous commits (even though they may have already been measured the day before) is to make the results less sensitive to changes in the environment or the benchmark suite itself: if AWS upgrades the OS, or if the code under test is modified, then the numbers will likely be affected, and the previous results would be invalidated.
According to the first two test runs on a metal instance, the following results were measured (without using the dedicated instance feature for now):
As can be seen, the median results of the two runs differ by less than 0.001 seconds for most tests. This is very promising, as it suggests that the environment is stable and the results are precise. To get a better understanding of the actual accuracy of the results, I ran the benchmark comparing the very same commits to each other, in the hope that the performance diff would be very close to 0. Here are the results: indeed, the real-world tests show a maximum relative difference of 0.08%, while the synthetic ones vary by at most 0.35%.