
k8s_drain "Failed to delete pod" "Too many requests" #474

Closed
impsik opened this issue Jun 8, 2022 · 5 comments · Fixed by #606
Labels
type/bug (Something isn't working), jira

Comments


impsik commented Jun 8, 2022

SUMMARY

I try to drain Kubernetes nodes, do some patching, and uncordon those nodes, but sometimes it fails with
"msg": "Failed to delete pod POD NAME HERE due to: Too Many Requests"
It's usually a Longhorn pod.
And it's an empty cluster (1 etcd/CP node, 3 worker nodes), for testing only.
When I drain a node manually, it takes up to 2 minutes:

node/192.168.122.11 evicted

real	1m50.449s
user	0m1.128s
sys	0m0.645s
ISSUE TYPE
  • Bug Report
COMPONENT NAME

k8s_drain

ANSIBLE VERSION
ansible [core 2.12.6]

COLLECTION VERSION

# /usr/lib/python3/dist-packages/ansible_collections
Collection      Version
--------------- -------
kubernetes.core 2.3.1

CONFIGURATION
DEFAULT_HOST_LIST(/etc/ansible/ansible.cfg) = ['/home/username/hosts']
DEPRECATION_WARNINGS(/etc/ansible/ansible.cfg) = False

OS / ENVIRONMENT

NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"

STEPS TO REPRODUCE
- name: "Drain node {{ inventory_hostname|lower }}, even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it."
      kubernetes.core.k8s_drain:
        state: drain
        name: "{{ inventory_hostname|lower }}"
        kubeconfig: ~/.kube/config
        delete_options:
          ignore_daemonsets: yes
          delete_emptydir_data: yes
          force: yes
          terminate_grace_period: 5
          wait_sleep: 20
      delegate_to: localhost

EXPECTED RESULTS

TASK [Drain node 192.168.122.11, even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it.] *****************************************************************************************
changed: [192.168.122.11 -> localhost]

ACTUAL RESULTS

The full traceback is:
File "/tmp/ansible_kubernetes.core.k8s_drain_payload_favq191w/ansible_kubernetes.core.k8s_drain_payload.zip/ansible_collections/kubernetes/core/plugins/modules/k8s_drain.py", line 324, in evict_pods
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 7652, in create_namespaced_pod_eviction
return self.create_namespaced_pod_eviction_with_http_info(name, namespace, body, **kwargs) # noqa: E501
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api/core_v1_api.py", line 7759, in create_namespaced_pod_eviction_with_http_info
return self.api_client.call_api(
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 348, in call_api
return self.__call_api(resource_path, method,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 180, in __call_api
response_data = self.request(
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/api_client.py", line 391, in request
return self.rest_client.POST(url,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 275, in POST
return self.request("POST", url,
File "/usr/local/lib/python3.8/dist-packages/kubernetes/client/rest.py", line 234, in request
raise ApiException(http_resp=r)
fatal: [192.168.122.11 -> localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "api_key": null,
            "ca_cert": null,
            "client_cert": null,
            "client_key": null,
            "context": null,
            "delete_options": {
                "delete_emptydir_data": true,
                "disable_eviction": false,
                "force": true,
                "ignore_daemonsets": true,
                "terminate_grace_period": 5,
                "wait_sleep": 20,
                "wait_timeout": null
            },
            "host": null,
            "impersonate_groups": null,
            "impersonate_user": null,
            "kubeconfig": "/home/imre/.kube/config",
            "name": "192.168.122.11",
            "no_proxy": null,
            "password": null,
            "persist_config": null,
            "proxy": null,
            "proxy_headers": null,
            "state": "drain",
            "username": null,
            "validate_certs": null
        }
    },
    "msg": "Failed to delete pod longhorn-system/instance-manager-e-f5feaabb due to: Too Many Requests"
}



gravesm commented Jun 27, 2022

@impsik Thanks for filing the issue. The eviction API can sometimes return a 429 Too Many Requests status, especially if the eviction would violate the pod disruption budget. It's unclear if that's what is going on in your case or not. We should be retrying 429 responses here, and we aren't. As a workaround until this is fixed, you could try using Ansible's built-in retry logic.
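
A minimal sketch of that retry workaround, applied to the drain task from the report above (the register variable name and the retries/delay values are illustrative assumptions, not recommendations):

- name: "Drain node {{ inventory_hostname|lower }}"
  kubernetes.core.k8s_drain:
    state: drain
    name: "{{ inventory_hostname|lower }}"
    kubeconfig: ~/.kube/config
    delete_options:
      ignore_daemonsets: yes
      delete_emptydir_data: yes
      force: yes
      terminate_grace_period: 5
      wait_sleep: 20
  delegate_to: localhost
  register: drain_result              # illustrative variable name
  until: drain_result is not failed   # keep retrying while eviction is rejected with 429
  retries: 5                          # illustrative value
  delay: 30                           # illustrative value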

gravesm added the type/bug and jira labels on Jun 27, 2022

impsik commented Jun 28, 2022

@gravesm My setup was (I blew it up):
3 etcd/cp nodes, 3 worker nodes. SSD disks.

Node config:
OS type and version: Ubuntu 20.04.4 LTS
CPU per node: 4
Memory per node: 8GB

I installed Longhorn, 3 replicas, through Rancher Apps.
I installed WordPress for testing LINK, and this created 2 PVCs: one for SQL and one for WordPress.
From the Longhorn docs I found that I also need to use --pod-selector='app!=csi-attacher,app!=csi-provisioner'.
However, the pod-selector option is not supported by kubernetes.core.k8s_drain.

$ kubectl get poddisruptionbudgets -n longhorn-system
NAME                          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
instance-manager-e-5369c190   1               N/A               0                     19m
instance-manager-e-662fd89d   1               N/A               0                     13m
instance-manager-e-b944b669   1               N/A               0                     16m
instance-manager-r-26f65c39   1               N/A               0                     16m
instance-manager-r-6030e0e8   1               N/A               0                     13m
instance-manager-r-6314b679   1               N/A               0                     19m

Another workaround for me was to use the shell module, which works fine:
shell: kubectl drain {{ inventory_hostname|lower }} --ignore-daemonsets --delete-emptydir-data --force --pod-selector='app!=csi-attacher,app!=csi-provisioner' --kubeconfig ~/.kube/config
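
For reference, a minimal sketch of that workaround written out as a full task (the task name, the folded-scalar layout, and delegate_to are assumptions added here; the kubectl command itself is unchanged):

- name: Drain node with kubectl, skipping Longhorn CSI attacher/provisioner pods
  ansible.builtin.shell: >
    kubectl drain {{ inventory_hostname|lower }}
    --ignore-daemonsets --delete-emptydir-data --force
    --pod-selector='app!=csi-attacher,app!=csi-provisioner'
    --kubeconfig ~/.kube/config
  delegate_to: localhost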

@0Styless

I can also confirm this behavior. It would be nice if the kubernetes.core module could support the pod-selector configuration. Using the shell module is just a "dirty workaround".

@stephan2012

Same here: a few Pods on a Kubernetes control-plane node, no PDB on the Pods. Sometimes it does not evict any Pods.

The only reliable workaround for the moment is to revert to kubectl.

@pierreozoux
Contributor

Hi!

I use kubernetes.core 3.2.0 and I still have this issue:

- name: Mark node as unschedulable.
  delegate_to: localhost
  become: no
  kubernetes.core.k8s_drain:
    state: cordon
    name: "{{ inventory_hostname }}"

- name: Remove pg label see https://github.com/zalando/postgres-operator/issues/547#issuecomment-486308679
  delegate_to: localhost
  become: no
  kubernetes.core.k8s:
    kind: Node
    name: "{{ inventory_hostname }}"
    state: patched
    definition:
      metadata:
        labels:
          node.libre.sh/postgres: "false"

- name: Drain node even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it.
  delegate_to: localhost
  become: no
  kubernetes.core.k8s_drain:
    state: drain
    delete_options:
      force: yes
      delete_emptydir_data: yes
      ignore_daemonsets: yes
    name: "{{ inventory_hostname }}"
TASK [drain : Drain node even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it.] *************************************************************************************************************************************************************************************
fatal: [scw-prod-elastic-metal-paris2-xxx -> localhost]: FAILED! => {"changed": false, "msg": "Failed to delete pod xxx/prod--de-re-0 due to: Too Many Requests"}

I'd prefer to use this module, but maybe I have to fall back to shell?

thanks for your help!
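
As a possible middle ground, sketched here from the retry suggestion made earlier in this thread rather than as a confirmed fix, the drain task above could be retried instead of falling back to shell (the register name and retries/delay values are illustrative):

- name: Drain node even if there are pods not managed by a ReplicationController, Job, or DaemonSet on it.
  delegate_to: localhost
  become: no
  kubernetes.core.k8s_drain:
    state: drain
    delete_options:
      force: yes
      delete_emptydir_data: yes
      ignore_daemonsets: yes
    name: "{{ inventory_hostname }}"
  register: drain_result              # illustrative variable name
  until: drain_result is not failed   # retry while eviction fails with Too Many Requests
  retries: 10                         # illustrative value
  delay: 30                           # illustrative value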
