Optimize a request to find device plugin pods. #599
Conversation
Thanks for your PR.
To skip the vendor CIs, use one of: /test-all
@alan-kut
At what number of nodes did you hit this issue?
LGTM.
FieldSelector: "spec.nodeName=" + vars.NodeName, | ||
LabelSelector: "app=sriov-device-plugin", | ||
FieldSelector: "spec.nodeName=" + vars.NodeName, | ||
ResourceVersion: "0", |
As we just care about the existence of such a pod, I believe we are OK with specifying this.
Some additional reading on the semantics of the ResourceVersion field for other reviewers [1].
[1] https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list
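For other reviewers, here is a minimal client-go sketch of what a list with these options could look like. The package name, function name, clientset argument, and namespace handling are illustrative assumptions, not code from this PR:

```go
package podcheck

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// hasDevicePluginPod reports whether a device plugin pod exists on nodeName.
// ResourceVersion "0" lets the kube-apiserver answer from its watch cache
// instead of issuing a quorum read against etcd; the result may be slightly
// stale, which is fine for a pure existence check.
func hasDevicePluginPod(ctx context.Context, c kubernetes.Interface, namespace, nodeName string) (bool, error) {
	pods, err := c.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{
		LabelSelector:   "app=sriov-device-plugin",
		FieldSelector:   "spec.nodeName=" + nodeName,
		ResourceVersion: "0",
	})
	if err != nil {
		return false, err
	}
	return len(pods.Items) > 0, nil
}
```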
@adrianchiris thanks for linking the documentation.
I hit the issue with 4k nodes running the plugin.
The failures are in the controllers, not the daemon.
LGTM
Add resourceVersion=0 to the request that finds device plugin pods on a given node, so it is served from the latest cache of the kube-apiserver. Without a resourceVersion, the request has to reach etcd first and the kube-apiserver then filters out the pods, which can overload etcd. Since the lookup runs in a DaemonSet, the requests are issued from every node; in a cluster with many nodes these list requests can bring the cluster down.
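To make the mechanics concrete, the selectors and the resource version travel as ordinary query parameters, so the same call can be expressed as a raw GET against the pods endpoint. A minimal sketch, assuming a client-go clientset and hypothetical package, function, namespace, and node name:

```go
package podcheck

import (
	"context"

	"k8s.io/client-go/kubernetes"
)

// listDevicePluginPodsRaw issues the same list as a raw GET so the query
// parameters are visible: labelSelector, fieldSelector, and resourceVersion
// are standard Kubernetes API parameters. Without resourceVersion the
// kube-apiserver does a quorum read from etcd and filters the result; with
// resourceVersion=0 it may answer from its watch cache and never hit etcd.
func listDevicePluginPodsRaw(ctx context.Context, c kubernetes.Interface, namespace, nodeName string) ([]byte, error) {
	return c.CoreV1().RESTClient().
		Get().
		Resource("pods").
		Namespace(namespace).
		Param("labelSelector", "app=sriov-device-plugin").
		Param("fieldSelector", "spec.nodeName="+nodeName).
		Param("resourceVersion", "0").
		DoRaw(ctx)
}
```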
Thanks for your PR.
To skip the vendor CIs, use one of: /test-all
Rebased.
Pull Request Test Coverage Report for Build 7741878100
💛 - Coveralls