
[BUG] Cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet #2657

Closed
9 of 10 tasks
atsikham opened this issue Oct 4, 2021 · 3 comments
Comments

@atsikham
Contributor

atsikham commented Oct 4, 2021

Describe the bug
Upgrade fails when there is at least one unmanaged pod (a pod not managed by a ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet).

How to reproduce
Steps to reproduce the behavior:

  1. Install a Kubernetes cluster.
  2. Create a single unmanaged pod (see the example command after the log below).
  3. Upgrade the Kubernetes cluster.
13:32:37 ERROR cli.engine.ansible.AnsibleCommand - FAILED - RETRYING: k8s/utils | Drain master or node in preparation for maintenance (1 retries left).
13:32:44 ERROR cli.engine.ansible.AnsibleCommand - fatal: [atsikham-k8s-kubernetes-node-vm-0 -> atsikham-k8s-kubernetes-master-vm-0]: FAILED! => {"attempts": 20, "changed": true, "cmd": ["kubectl", "drain", "atsikham-k8s-kubernetes-node-vm-0", "--ignore-daemonsets", "--delete-local-data"], "delta": "0:00:00.086227", "end": "2021-10-01 13:32:44.175848", "msg": "non-zero return code", "rc": 1, "start": "2021-10-01 13:32:44.089621", "stderr": "error: unable to drain node \"atsikham-k8s-kubernetes-node-vm-0\", aborting command...\n\nThere are pending nodes to be drained:\n atsikham-k8s-kubernetes-node-vm-0\nerror: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/azure, default/azure1, default/azure2", "stderr_lines": ["error: unable to drain node \"atsikham-k8s-kubernetes-node-vm-0\", aborting command...", "", "There are pending nodes to be drained:", " atsikham-k8s-kubernetes-node-vm-0", "error: cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): default/azure, default/azure1, default/azure2"], "stdout": "node/atsikham-k8s-kubernetes-node-vm-0 already cordoned", "stdout_lines": ["node/atsikham-k8s-kubernetes-node-vm-0 already cordoned"]}
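
For step 2, a pod without any controller can be created with a plain kubectl run; the pod name and image below are arbitrary examples chosen to match the default/azure pods in the log:

  kubectl run azure --image=nginx --restart=Never
  kubectl get pod azure -o jsonpath='{.metadata.ownerReferences}'   # empty output means no controller owns the pod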

Expected behavior
A preflight check informs the user about such pods. After the user makes all preparations for their pods and stops them, the upgrade succeeds.
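
As a manual workaround until such a check exists, the pods blocking the drain can be listed and removed before the upgrade. A sketch, assuming jq is available and using the node name from the log above:

  kubectl get pods -A --field-selector spec.nodeName=atsikham-k8s-kubernetes-node-vm-0 -o json \
    | jq -r '.items[] | select(.metadata.ownerReferences == null) | "\(.metadata.namespace)/\(.metadata.name)"'
  kubectl delete pod azure azure1 azure2 -n default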

Config files
A Pod example can be taken here.

Environment

  • Cloud provider: All
  • OS: All

epicli version: 1.3 develop

Additional context


DoD checklist

  • Changelog updated (if affected version was released)
  • COMPONENTS.md updated / doesn't need to be updated
  • Automated tests passed (QA pipelines)
    • apply
    • upgrade
  • Case covered by automated test (if possible)
  • Idempotency tested
  • Documentation updated / doesn't need to be updated
  • All conversations in PR resolved
  • Backport tasks created / doesn't need to be backported
@mkyc
Contributor

mkyc commented Oct 8, 2021

This will have to be backported.

@mkyc mkyc added the type/low-hanging-fruit Good, nice, simple task label Nov 5, 2021
@to-bar
Contributor

to-bar commented Nov 5, 2021

The proposed solution is to add a preflight check that queries Kubernetes for problematic pods before starting the upgrade. If any are found, the user should get an error with a suggested command to execute manually before rerunning the upgrade.
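
A minimal sketch of such a check, assuming it runs with kubectl access to the cluster and jq installed (this is not the actual epicli implementation):

  # List pods that have no ownerReferences, i.e. are not managed by any controller
  UNMANAGED=$(kubectl get pods -A -o json \
    | jq -r '.items[] | select(.metadata.ownerReferences == null) | "\(.metadata.namespace)/\(.metadata.name)"')
  if [ -n "$UNMANAGED" ]; then
    echo "ERROR: pods not managed by any controller were found:"
    echo "$UNMANAGED"
    echo "Delete them or recreate them under a controller, then rerun the upgrade."
    exit 1
  fi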

@pprach pprach self-assigned this Nov 22, 2021
@przemyslavic przemyslavic self-assigned this Dec 10, 2021
@przemyslavic
Collaborator

✔️ A preflight check has been added to verify whether there are any pods not managed by a controller.

@seriva seriva closed this as completed Dec 14, 2021