Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve network perf test #2695

Closed
wants to merge 6 commits into from
Closed

Conversation

marseel
Copy link
Contributor

@marseel marseel commented Jul 18, 2024

This PR contains various improvements to net perf testing - more details in commit messages.

Most notable improvement is that we no longer use fixed message size for TCP_STREAM test, which results in way more reasonable results as compared to the previous tests.
First of all, we can finally saturate network throughput and also throughput between pods on the same node is way higher than throughput between pods running on different nodes. Example test results for native routing:

-------------------------------------------------------------------------------------
📋 Scenario        | Node       | Test            | Duration        | Throughput Mb/s 
-------------------------------------------------------------------------------------
📋 pod-to-pod      | same-node  | TCP_STREAM      | 10s             | 14257.07     
📋 host-to-host    | same-node  | TCP_STREAM      | 10s             | 25766.59     
📋 pod-to-pod      | other-node | TCP_STREAM      | 10s             | 9541.04      
📋 host-to-host    | other-node | TCP_STREAM      | 10s             | 9579.99      
-------------------------------------------------------------------------------------

Previously native throughput was showing sad 1500 Mb/s for the same Cilium configuration.

How the new output looks like in perfdash:
image
image
Makes it much easier to compare host vs pod throughput / latency etc.

@qmonnet qmonnet added the needs-rebase This PR needs to be rebased because it has merge conflicts. label Jul 24, 2024
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks!

I spotted a typo (explicilty) in the description in the third commit, if you need to update your PR for any other reason [EDIT: For example, to address a merge conflict 🙂].

CODEOWNERS Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 28, 2024
@michi-covalent michi-covalent removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 31, 2024
Previously, each measurement was exported as separate json object.
This resulted in poor user experience when displaying these results in
perfdash. Now we aggregate latency/throughput/transaction output based
on category so it's easier to spot regressions in CI.

Signed-off-by: Marcel Zieba <[email protected]>
Now, you can choose tests to run, throughput/crr/rr.
Also, adds options to specify message size.

Signed-off-by: Marcel Zieba <[email protected]>
Previously, they didn't explicitly say which test was being run.

Signed-off-by: Marcel Zieba <[email protected]>
This is next step for separating perf test from regular connectivity
tests.

Signed-off-by: Marcel Zieba <[email protected]>
Previously, we were also setting it for TCP_STREAM, which is
significantly lowering throughput. This was different from all other
tests that were being performed in the past, for example in
cilium/cilium-perf-networking

Signed-off-by: Marcel Zieba <[email protected]>
@marseel
Copy link
Contributor Author

marseel commented Aug 12, 2024

rebased and resolved conflict on CODEOWNERS.

@marseel
Copy link
Contributor Author

marseel commented Aug 12, 2024

"Lock cilium-cli repo / Lock cilium-cli repo (pull_request)"
😭

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-rebase This PR needs to be rebased because it has merge conflicts.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants