OCPBUGS-18676: ovnkube: set northd backoff-interval and use a single thread to save CPU #1990
From an NBDB container on the cluster:
/retest AWS
@dcbw: This pull request references Jira Issue OCPBUGS-18676, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test
/test qe-perfscale-aws-ovn-cluster-density
/test qe-perfscale-aws-ovn-cluster-density
/test
/test qe-perfscale-aws-ovn-cluster-density
northd has an option to sleep for a short time after processing changes from NB/SB, which lets it trade a little latency for a large CPU saving. Because events from NB arrive frequently during scale tests, northd doesn't get much time to sleep. Until we have more incremental processing, most of that CPU time is burned recalculating things that haven't changed, so it's mostly wasted. In density-light and density-cni 120-node scale tests, letting northd sleep had almost no adverse effect on P99 PodReady times but gave a huge improvement in CPU utilization.
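To illustrate why a sleep after each processing run saves CPU, here is a minimal sketch of a coalescing loop. It is a simplified model, not northd's actual algorithm: the function name `count_runs`, the fixed per-run cost, and the event spacing are all hypothetical values chosen for illustration.

```python
def count_runs(event_times_ms, run_cost_ms, backoff_ms):
    """Simulate a loop that does one full recompute per wakeup.

    After each run the loop sleeps for backoff_ms. Events that arrive
    while the loop is busy or asleep are coalesced into the next run.
    Returns (number_of_runs, total_busy_ms).
    """
    events = sorted(event_times_ms)
    runs = 0
    busy_ms = 0
    next_free = 0  # earliest time the loop may start another run
    i = 0
    while i < len(events):
        # The run starts when the next event arrives or when the
        # loop wakes up, whichever is later.
        start = max(events[i], next_free)
        # Every event that has arrived by then is handled in this run.
        while i < len(events) and events[i] <= start:
            i += 1
        runs += 1
        busy_ms += run_cost_ms
        next_free = start + run_cost_ms + backoff_ms
    return runs, busy_ms


# Hypothetical workload: one NB event every 10 ms for one second,
# with each recompute costing 5 ms.
events = list(range(0, 1000, 10))
print(count_runs(events, run_cost_ms=5, backoff_ms=0))
print(count_runs(events, run_cost_ms=5, backoff_ms=200))
```

In this toy workload the backoff collapses many recomputes into a few, at the cost of each event waiting up to roughly one backoff interval before it is processed, which is the latency-for-CPU trade described above.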
Northd threading parallelizes the logical flow (lflow) building part of the northd processing loop. While this speeds up northd processing, it has a slight CPU cost (~20%) to map/reduce the work. Threading improved latency when northd processed large numbers of logical flows in centralized OVN clusters. With IC (interconnect), each northd handles only a single node in the cluster and thus processes fewer lflows. Scale testing indicates that the threading tradeoff is no longer worth it; we achieve the same P99 PodReadyLatency across multiple scenarios with 1 or 4 threads. We might as well save the CPU if there is no longer any latency benefit.
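The tradeoff can be sketched with a toy cost model. This is only an illustration under stated assumptions: the function `lflow_build_cost` is hypothetical, it assumes perfectly even work splitting, and it applies the ~20% map/reduce overhead quoted above as a flat multiplier on CPU time. The work sizes are made-up numbers, not measurements.

```python
def lflow_build_cost(work_ms, threads, overhead=0.20):
    """Return (cpu_ms, wall_ms) for one lflow build under a toy model.

    With a single thread there is no map/reduce overhead. With more
    threads, total CPU grows by `overhead` and wall time is that CPU
    divided evenly across the threads.
    """
    if threads <= 1:
        return work_ms, work_ms
    cpu_ms = work_ms * (1 + overhead)
    wall_ms = cpu_ms / threads
    return cpu_ms, wall_ms


# Hypothetical work sizes: a large centralized build versus a small
# per-node build.
for work_ms in (1000, 10):
    for threads in (1, 4):
        cpu_ms, wall_ms = lflow_build_cost(work_ms, threads)
        print(f"work={work_ms}ms threads={threads} "
              f"cpu={cpu_ms:.1f}ms wall={wall_ms:.1f}ms")
```

Under this model the wall-clock time saved by threading shrinks in proportion to the work, so for a small per-node build the saving is only a few milliseconds while the extra CPU is still paid on every run.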
Critical fix for perf/scale. In our testing, this cuts northd CPU in half while having no impact on pod latency.
/label acknowledge-critical-fixes-only
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dcbw, trozet
/retest-required
/tide refresh
@dcbw: Jira Issue OCPBUGS-18676: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-18676 has been moved to the MODIFIED state.
/cherry-pick release-4.14
@dcbw: new pull request created: #1998
Upstream equivalent is ovn-kubernetes/ovn-kubernetes#3877