
Increase the CPU cores and Timeout for Arm64 E2E #15349

Closed
wants to merge 1 commit into from


kevinzs2048 (Contributor) commented Feb 22, 2023

The E2E tests on Arm64 are already failing: https://github.com/etcd-io/etcd/actions/workflows/e2e-arm64.yaml

The root cause is that the run always hits the 30m TIMEOUT. With the timeout raised to 60m, the suite completes:

'e2e' started at Wed Feb 22 13:22:08 UTC 2023
% (cd tests && 'env' 'ETCD_VERIFY=all' 'go' 'test' '-timeout=60m' 'go.etcd.io/etcd/tests/v3/e2e')
ok  	go.etcd.io/etcd/tests/v3/e2e	656.811s
% (cd tests && 'env' 'ETCD_VERIFY=all' 'go' 'test' '--tags=e2e' '-timeout=60m' 'go.etcd.io/etcd/tests/v3/common')
ok  	go.etcd.io/etcd/tests/v3/common	1937.590s
'e2e' completed at Wed Feb 22 14:05:25 UTC 2023
SUCCESS
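As a rough check on the timings in the log: `go test -timeout` applies to each test binary (here, each package), so the `common` package alone overruns a 30-minute limit, while the `e2e` package fits comfortably:

$$
\frac{1937.590\ \mathrm{s}}{60\ \mathrm{s/min}} \approx 32.3\ \mathrm{min} > 30\ \mathrm{min},
\qquad
\frac{656.811\ \mathrm{s}}{60\ \mathrm{s/min}} \approx 10.9\ \mathrm{min} < 30\ \mathrm{min}
$$

Together the two packages account for about 2594 s (roughly 43.2 min), which matches the wall-clock span between the start (13:22:08) and completion (14:05:25) timestamps.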

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

kevinzs2048 changed the title from "[WIP][No need Review]Increase the CPU cores and Timeout for Arm64 E2E" to "Increase the CPU cores and Timeout for Arm64 E2E" Feb 23, 2023
serathius (Member) commented:

Those tests consistently pass on pull requests on the amd64 architecture. Should we expect lower performance from arm64? The fact that we get lower performance from dedicated bare metal than from GitHub-provided runners seems a little strange. But I don't have much experience with Arm. cc @dims

I would like confirmation that this is related to performance and not a bug. Is the log you provided from a local run or from one of our self-hosted workers?

dims (Contributor) commented Feb 26, 2023

@serathius Geeta is looking into this right now. She's using https://github.com/etcd-io/etcd/tree/cb2a22e5d15e853e6456af47e22cb3d8f31b8301/tools/rw-heatmaps to see what sort of performance we can expect and whether there are any toggles (say, when building the etcd artifacts) that we could turn on.

serathius (Member) commented:

Thanks, is this tracked in an issue?

kevinzs2048 (Contributor, Author) commented:

Hi @serathius, the log I provided is from our self-hosted workers. With TIMEOUT=60m, the tests run to completion.

ahrtr (Member) commented Feb 26, 2023

> @serathius Geeta is looking into this right now. She's using https://github.com/etcd-io/etcd/tree/cb2a22e5d15e853e6456af47e22cb3d8f31b8301/tools/rw-heatmaps to see what sort of performance we can expect and if there are any toggles say in building the etcd artifacts we could turn on etc.

Actually, I plan to propose replacing the existing rw-heatmaps with the new rw-benchmark. There is no change to the script rw-benchmark.sh, which means we will not change how we generate the data or the data format; we will only change how the data is visualized. PTAL #15060. It's super easy to use.

ptabor (Contributor) commented Mar 3, 2023

I assume this is on hold until the investigation reveals the reason for the slowdown. Marking it as 'draft' for now.

ptabor marked this pull request as draft March 3, 2023 10:14
kevinzs2048 (Contributor, Author) commented:

@serathius @dims @ahrtr Hi, any progress on the performance analysis? I'm glad to assist if any help is needed.

serathius (Member) commented Mar 17, 2023

> @serathius @dims @ahrtr Hi any progress on the performance analysis? I'm glad to assist if any help needed.

Best to ask Geeta about this. @geetasg

geetasg commented Apr 12, 2023

Sorry for the late response. I need to rerun my test before publishing results here; I haven't been able to get to the rerun yet. @ahrtr, is the replacement for the heatmaps finalized? Do you recommend testing with https://github.com/ahrtr/etcd/tree/go_chart_20230101/tools/rw-benchmark ? Do you have an example of the plots? I am getting a 404 on https://github.com/ahrtr/etcd/blob/go_chart_20230101/tools/rw-benchmark/example/rw_benchmark.html

ahrtr (Member) commented Apr 12, 2023

> @ahrtr is replacement for the heatmaps finalized?

I haven't had time to drive that yet. Sorry.

> Do you recommend testing with https://github.com/ahrtr/etcd/tree/go_chart_20230101/tools/rw-benchmark .

I think either tool is OK; that isn't the point. Personally, I recommend rw-benchmark because its output looks clearer.

> I am getting 404 on https://github.com/ahrtr/etcd/blob/go_chart_20230101/tools/rw-benchmark/example/rw_benchmark.html

I removed the examples to make the PR less scary :). It's super easy to generate an HTML file yourself.

jmhbnz (Member) left a comment

It looks like this pull request may no longer be required. With recent fixes in place, our e2e-arm64 suite has been running reliably for several weeks now: https://github.com/etcd-io/etcd/actions/workflows/e2e-arm64.yaml

I suggest we monitor for another week or so; if it is still OK, we could close this?

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Sep 17, 2023
jmhbnz (Member) commented Nov 23, 2023

Discussed during the SIG etcd triage meeting: this can be closed, as the etcd CI infrastructure has now changed.

jmhbnz closed this Nov 23, 2023

7 participants