Increases benchmark warm-up and decreases the threshold value #268

Merged
merged 41 commits into amazon-ion:master on Jun 9, 2023

Conversation

cheqianh
Contributor

@cheqianh cheqianh commented May 16, 2023

This PR
(1) Addressed a memory leak issue - link
(2) Experimented with the GHA runners' resource limits and switched to macOS runners to allow 1k iterations and warm-ups - link
(3) Added a method for Ion binary/text conversion - link
(4) Fixed a comparison-result mismatch issue by sorting the combination list - link

The threshold is set to 0.2 for now; we will work on improving it in the future. The latest performance detection run failed due to variance; a sketch of the kind of relative-difference check this threshold gates is shown below.
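
For illustration only, here is a minimal sketch, not the workflow's actual code, of the kind of relative-difference check such a threshold would gate; the function names and result format are assumptions.

```python
# Hypothetical sketch of a relative-difference regression check; names and
# result format are assumptions, not the regression-detection workflow's code.
THRESHOLD = 0.2  # fail the check if a benchmark slows down by more than 20%

def relative_difference(previous_ns: float, new_ns: float) -> float:
    """Relative slowdown of the new timing versus the previous one."""
    return (new_ns - previous_ns) / previous_ns

def detect_regressions(previous: dict, new: dict, threshold: float = THRESHOLD):
    """Return the benchmarks whose relative slowdown exceeds the threshold."""
    return [
        (name, relative_difference(prev_ns, new[name]))
        for name, prev_ns in previous.items()
        if relative_difference(prev_ns, new[name]) > threshold
    ]

# Example: a 25% slowdown on one benchmark trips the 0.2 threshold.
print(detect_regressions({"read testStructs.10n": 100.0}, {"read testStructs.10n": 125.0}))
```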

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@cheqianh cheqianh requested a review from tgregg May 16, 2023 22:18
@cheqianh cheqianh marked this pull request as ready for review May 16, 2023 22:18
@tgregg
Contributor

tgregg commented May 16, 2023

A threshold of 0.5 still allows a 50% performance regression, right? That's still not going to be low enough for our needs. How did you select 150 for warmups/iterations? Are you seeing diminishing returns in terms of reduced variance as you increase the warmups/iterations, or does the regression detection workflow just start taking too long?

@cheqianh
Contributor Author

cheqianh commented May 17, 2023

If 0.5 doesn't satisfy our needs, we need to consider other improvements as well.

There is an error when running 200+ warm-ups - actions/runner-images#6680.

The GitHub runner only allocates a fixed amount of resources (e.g., memory) to jobs, and running more than 200 executions causes the job to be cancelled automatically. Additionally, the relative-difference results are usually < 0.3, but occasionally one sample file (real_worlds_data_1) fails with a 0.5+ performance regression (and when the workflow fails, it is always this file).

One possible way to enable more iterations and warm-ups is to use a larger runner - https://docs.github.com/en/actions/using-github-hosted-runners/using-larger-runners

@cheqianh
Contributor Author

cheqianh commented May 18, 2023

Found a small memory leak in each simpleion.dumps()/loads() call, which might be the reason the workflow exceeds the resource limit. Investigating the root cause.

@cheqianh
Contributor Author

cheqianh commented May 21, 2023

Confirmed that execution_with_command(['read', file, '--api', 'load_dump', '--api', 'streaming']) leaks memory in the simpleion load API when reading testStructs.10n. I didn't see any memory leak in the dump API.
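
For context, a minimal sketch of how such a leak can be observed, assuming testStructs.10n is available locally; this is not the project's actual test code.

```python
# Hypothetical leak check: repeatedly call simpleion.loads() on the same data
# and watch traced memory. Illustrative only, not the project's test code.
import gc
import tracemalloc
from amazon.ion import simpleion

with open("testStructs.10n", "rb") as f:
    data = f.read()

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

for i in range(1, 1001):
    simpleion.loads(data, single_value=False)
    if i % 200 == 0:
        gc.collect()
        current, _ = tracemalloc.get_traced_memory()
        # Usage should stay roughly flat; steady growth suggests a leak.
        # Note: tracemalloc only sees Python-level allocations; a leak in the
        # C extension would require watching process RSS instead.
        print(f"iteration {i}: {current - baseline} bytes above baseline")

tracemalloc.stop()
```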

@cheqianh
Contributor Author

cheqianh commented May 22, 2023

Fixed the memory leak for read and it worked - 9858b44

However, the workflow still fails for large execution counts. Investigating.

@@ -12,7 +12,7 @@ jobs:
     name: Detect Regression
     needs: PR-Content-Check
     if: ${{ needs.PR-Content-Check.outputs.result == 'pass' }}
-    runs-on: ubuntu-latest
+    runs-on: macos-latest # ubuntu-latest
Contributor

In general, we should only use MacOS runners if we need to test specifically that something works correctly on MacOS or we're distributing a MacOS-specific binary.

What's the rationale for running on MacOS here?

Contributor Author

I'm trying to run more warm-ups and iterations for the benchmark-cli command in order to generate a consistent threshold value that helps us identify performance regressions. However, commands with 150+ executions hit the resource limits of the GHA Linux runner.
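
As a rough sketch of why the warm-up and iteration counts matter (an assumed pattern, not the benchmark CLI's actual implementation): warm-up runs are discarded so caches, imports, and allocator state settle before the timed iterations.

```python
# Assumed benchmarking pattern, not the ion-python-benchmark-cli implementation.
import statistics
import time

def benchmark(fn, warmups=150, iterations=150):
    """Time fn(), discarding warm-up runs to reduce variance in the results."""
    for _ in range(warmups):
        fn()  # warm-up runs: results are thrown away
    timings_ns = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        timings_ns.append(time.perf_counter_ns() - start)
    return statistics.mean(timings_ns), statistics.stdev(timings_ns)
```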

Contributor

Which resource limit? Is it causing the job to time out?

Contributor Author
@cheqianh cheqianh Jun 1, 2023

One of the workflows ran into this issue - actions/runner-images#6680

Contributor Author
@cheqianh cheqianh Jun 1, 2023

I found another issue that caused the workflow to generate an incorrect threshold value. After fixing it, the threshold value seems more stable. Ideally, though, we will eventually benchmark performance on all popular platforms.

@cheqianh
Contributor Author

cheqianh commented Jun 7, 2023

This PR
(1) Addressed a memory leak issue - link
(2) Experimented with the GHA runners' resource limits and switched to macOS runners to allow 1k iterations and warm-ups - link
(3) Added a method for Ion binary/text conversion - link
(4) Fixed a comparison-result mismatch issue by sorting the combination list (see the sketch below) - link

The threshold is set to 0.2 for now; we will work on improving it in the future. An example performance detection run failed due to variance.
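
A hedged sketch of the combination-sorting idea in (4); the option names here are illustrative assumptions, not the benchmark CLI's actual options.

```python
# Illustrative only: sort the (api, format, file) combinations so that the
# previous and new benchmark runs iterate them in the same order, keeping the
# two result lists aligned pairwise for comparison. Option names are assumed.
from itertools import product

apis = ["load_dump", "streaming"]
formats = ["ion_binary", "ion_text"]
files = ["real_worlds_data_1.10n", "testStructs.10n"]

combinations = sorted(product(apis, formats, files))
for api, fmt, file in combinations:
    # Each combination is now compared against its true counterpart from the
    # previous run, instead of whatever happened to be at the same index.
    print(api, fmt, file)
```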

@cheqianh
Contributor Author

cheqianh commented Jun 9, 2023

I changed the target commit to the main branch so we don't have to create a new PR for that. In addition, I

@cheqianh cheqianh merged commit 35f07ca into amazon-ion:master Jun 9, 2023