Increase the CPU cores and Timeout for Arm64 E2E #15349
Signed-off-by: Kevin Zhao <[email protected]>
d19c56b to 1f18fcc
Those tests consistently pass on pull requests on the amd64 architecture. Should we expect lower performance from arm64? The fact that we get lower performance from dedicated bare metal than from GitHub-provided runners seems a little strange, but I don't have much experience with Arm. cc @dims. I would like confirmation that this is related to performance and is not a bug. Is the log you provided from your local run or from one of our self-hosted workers?
@serathius Geeta is looking into this right now. She's using https://github.com/etcd-io/etcd/tree/cb2a22e5d15e853e6456af47e22cb3d8f31b8301/tools/rw-heatmaps to see what sort of performance we can expect and whether there are any toggles, say in building the etcd artifacts, that we could turn on.
Thanks, is this tracked in an issue?
Hi @serathius, the log I provided is from our self-hosted workers. When run with TIMEOUT=60m, the test finishes.
Actually, I plan to propose replacing the existing rw-heatmaps with the new rw-benchmark. The script rw-benchmark.sh is unchanged, which means that neither the way we generate the data nor the data format changes; only the way we visualize the data changes. PTAL #15060. It's super easy to use.
I assume this is on hold until the investigation reveals the reason for the slowdown. Marking it as 'draft' for now.
@serathius @dims @ahrtr Hi, is there any progress on the performance analysis? I'm glad to assist if any help is needed.
Best to ask Geeta about this. @geetasg
Sorry for the late response. I need to rerun my test before publishing here, and I haven't been able to get to the rerun yet. @ahrtr is the replacement for the heatmaps finalized? Do you recommend testing with https://github.com/ahrtr/etcd/tree/go_chart_20230101/tools/rw-benchmark ? Do you have an example of the plots? I am getting a 404 on https://github.com/ahrtr/etcd/blob/go_chart_20230101/tools/rw-benchmark/example/rw_benchmark.html
I haven't had time to drive that yet. Sorry.
I think either tool is OK, and the choice of tool isn't the point. Personally I recommend rw-benchmark because its output looks clearer.
I removed the examples to make the PR less scary :). I think it's super easy to generate an HTML file yourself.
It looks like this pull request may no longer be required. With recent fixes implemented, our e2e-arm64 suite has been running reliably for several weeks now: https://github.com/etcd-io/etcd/actions/workflows/e2e-arm64.yaml
I suggest we monitor for another week or so; if it is still OK, we could maybe close this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Discussed during the sig-etcd triage meeting: this can be closed, as the etcd CI infra has now changed.
The E2E tests on Arm64 are already failing: https://github.com/etcd-io/etcd/actions/workflows/e2e-arm64.yaml
The root cause is that the run always hits the 30m TIMEOUT.
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.