k8s_drain "Failed to delete pod" "Too many requests" #474
Comments
@impsik Thanks for filing the issue. The eviction API can sometimes return a 429 Too Many Requests status, especially if the eviction would violate the pod disruption budget. It's unclear if that's what is going on in your case or not. We should be retrying 429 responses here, and we aren't. As a workaround until this is fixed, you could try using Ansible's built-in retry logic.
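For reference, a minimal sketch of that retry approach using `register`/`retries`/`until`; the node name, retry count, and delay are illustrative assumptions, not values from this report:

```yaml
# Sketch only: keep re-running the drain while the eviction API
# returns transient errors such as 429 Too Many Requests.
- name: Drain node, retrying transient eviction failures
  kubernetes.core.k8s_drain:
    state: drain
    name: "{{ inventory_hostname }}"
    delete_options:
      force: true
      ignore_daemonsets: true
      delete_emptydir_data: true
  delegate_to: localhost
  register: drain_result
  retries: 10   # illustrative: number of attempts before giving up
  delay: 30     # illustrative: seconds between attempts
  until: drain_result is not failed
```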
@gravesm My setup was (I have since torn it down):
I installed Longhorn, 3 replicas, through Rancher Apps.
Another workaround for me was to use the shell module, which works fine:
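(The original snippet was lost in formatting; a sketch of that kind of shell-based workaround, with `kubectl drain` flags assumed to mirror the module's delete_options, might look like this:)

```yaml
# Sketch of the shell workaround; the flags are assumptions, not the commenter's exact command.
- name: Drain node with kubectl instead of k8s_drain
  ansible.builtin.shell: >
    kubectl drain {{ inventory_hostname }}
    --ignore-daemonsets
    --delete-emptydir-data
    --force
    --grace-period=5
    --timeout=120s
  delegate_to: localhost
```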
I can also confirm this behavior. It would be nice if the kubernetes.core module could support the pod-selector configuration. Using the shell module is just a dirty workaround.
Same here: a few Pods on a Kubernetes control-plane node, no PDB on the Pods. Sometimes it does not evict any Pods at all. The only reliable workaround for the moment is to revert to the shell-based `kubectl drain` approach described above.
Hi! I use kubernetes.core 3.2.0 and I still have this issue:
I'd prefer to use this module, but maybe I have to fall back to shell? Thanks for your help!
SUMMARY
I try to drain Kubernetes nodes, do some patching, and uncordon those nodes, but sometimes the drain fails with:
"msg": "Failed to delete pod POD NAME HERE due to: Too Many Requests"
It's usually a Longhorn pod.
This is an empty cluster (1 etcd/control-plane node, 3 worker nodes), used for testing only.
When I drain the node manually, it takes up to 2 minutes to drain.
ISSUE TYPE
COMPONENT NAME
k8s_drain
ANSIBLE VERSION
COLLECTION VERSION
CONFIGURATION
OS / ENVIRONMENT
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
STEPS TO REPRODUCE
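(The original playbook snippet was lost in formatting; a task reconstructed from the module_args shown under ACTUAL RESULTS below would look roughly like this — a sketch, not the reporter's exact playbook:)

```yaml
# Reconstructed from the module_args in the failure output; paths and
# addresses come from that output, the task structure is assumed.
- name: Drain node 192.168.122.11, even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it.
  kubernetes.core.k8s_drain:
    state: drain
    name: 192.168.122.11
    kubeconfig: /home/imre/.kube/config
    delete_options:
      force: true
      ignore_daemonsets: true
      delete_emptydir_data: true
      terminate_grace_period: 5
      wait_sleep: 20
  delegate_to: localhost
```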
EXPECTED RESULTS
TASK [Drain node 192.168.122.11, even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it.] *****************************************************************************************
changed: [192.168.122.11 -> localhost]
ACTUAL RESULTS
The full traceback is:
File "/tmp/ansible_kubernetes.core.k8s_drain_payload_favq191w/ansible_kubernetes.core.k8s_drain_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_drain.py", line 324, in evict_pods
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 7652, in create_namespaced_pod_eviction
return self.create_namespaced_pod_eviction_with_http_info(name, namespace, body, **kwargs) # noqa: E501
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 7759, in create_namespaced_pod_eviction_with_http_info
return self.api_client.call_api(
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 391, in request
return self.rest_client.POST(url,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 275, in POST
return self.request("POST", url,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 234, in request
raise ApiException(http_resp=r)
fatal: [192.168.122.11 -> localhost]: FAILED! => {
"changed": false,
"invocation": {
"module_args": {
"api_key": null,
"ca_cert": null,
"client_cert": null,
"client_key": null,
"context": null,
"delete_options": {
"delete_emptydir_data": true,
"disable_eviction": false,
"force": true,
"ignore_daemonsets": true,
"terminate_grace_period": 5,
"wait_sleep": 20,
"wait_timeout": null
},
"host": null,
"impersonate_groups": null,
"impersonate_user": null,
"kubeconfig": "/home/imre/.kube/config",
"name": "192.168.122.11",
"no_proxy": null,
"password": null,
"persist_config": null,
"proxy": null,
"proxy_headers": null,
"state": "drain",
"username": null,
"validate_certs": null
}
},
"msg": "Failed to delete pod longhorn-system/instance-manager-e-f5feaabb due to: Too Many Requests"
}