Expected Behavior
Deprovisioning of a node should not be blocked by pods that do not count toward the PodDisruptionBudget's healthy replica count.
Actual Behavior
Karpenter was trying to deprovision a node but was blocked by a PodDisruptionBudget (PDB) that prevented the eviction.
Normal DeprovisioningBlocked 83s (x631 over 24h) karpenter Cannot deprovision node due to pdb workiva-system/k8s-oauth2-proxy prevents pod evictions
The PDB for k8s-oauth2-proxy has a minAvailable of 1, and the Deployment specifies 2 replicas. In this case, the node had a bad CNI pod, so the k8s-oauth2-proxy pod on it was stuck in a ContainerCreating/CrashLoopBackOff loop and never became Ready. Because of this, the PDB reports that no disruptions are allowed. When Karpenter looks at the pod's PDBs, it sees that and concludes it can't remove this pod, even though the pod is not contributing to the PDB's healthy replicas.
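The arithmetic behind that PDB status can be sketched as follows. This is a minimal illustration, assuming an integer minAvailable; the function names are hypothetical and are not taken from Kubernetes or Karpenter. It mirrors the relationship between the PDB status fields currentHealthy, desiredHealthy, and disruptionsAllowed.

```go
package main

import "fmt"

// disruptionsAllowed sketches how a PDB with an integer minAvailable derives
// its allowed disruptions: healthy pods above the minimum, never negative.
// (Hypothetical helper for illustration only.)
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	if currentHealthy <= minAvailable {
		return 0
	}
	return currentHealthy - minAvailable
}

// healthyAfterRemoval returns the healthy count after one pod is removed.
// Only a Ready pod counts as healthy, so removing a non-Ready pod changes nothing.
func healthyAfterRemoval(currentHealthy int, podReady bool) int {
	if podReady {
		return currentHealthy - 1
	}
	return currentHealthy
}

func main() {
	const minAvailable = 1

	// Both replicas Ready: one disruption is allowed.
	fmt.Println(disruptionsAllowed(2, minAvailable)) // 1

	// One replica stuck in CrashLoopBackOff: no disruptions allowed.
	fmt.Println(disruptionsAllowed(1, minAvailable)) // 0

	// Removing the stuck (non-Ready) pod leaves the healthy count at 1,
	// so the minAvailable of 1 is still satisfied.
	fmt.Println(healthyAfterRemoval(1, false) >= minAvailable) // true
}
```

With 2 replicas and one pod never Ready, the budget shows zero allowed disruptions even though removing the unhealthy pod would not lower the healthy count.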
Steps to Reproduce the Problem
Start a Deployment with 2 replicas and a PDB that specifies a minAvailable of 1. Run the Deployment in a way that lets us force one of the pods into CrashLoopBackOff. A decent way is a container command that fails unless the current minute matches a chosen value:
if [[ $(date +%M) != 27 ]]; then
  echo "Wrong Time"
  exit 1
fi
This way we can install the Deployment right at minute 27; if we then delete a pod a minute later, its replacement should stay in CrashLoopBackOff for about an hour.
We may also need a readinessProbe here to make sure the pod doesn't become Ready.
Once the Deployment is running and the two pods are on different nodes, mark the node with the bad pod for deprovisioning and verify that Karpenter does not deprovision it because of the PDB we created.
Version
Karpenter Version: v0.27.3
Kubernetes Version: v1.23.16
Resource Specs and Logs
PDB spec showing disruption not allowed.
One possible fix would be to check the pod status within this loop and ignore non-ready pods: https://github.com/aws/karpenter-core/blob/6a6e5fe226ed5b94e2fd2fa28988e4e1f7601986/pkg/controllers/deprovisioning/pdblimits.go#L62
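A rough sketch of that idea is below. It is not the actual karpenter-core code: the Pod and PDB types, the matches method, and blockingPDB are simplified, hypothetical stand-ins (label-equality selectors only, namespaces ignored), written only to show where a readiness check could sit in such a loop.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
type Pod struct {
	Name   string
	Labels map[string]string
	Ready  bool
}

// PDB is a simplified, hypothetical stand-in for a PodDisruptionBudget.
type PDB struct {
	Name               string
	Selector           map[string]string
	DisruptionsAllowed int
}

// matches reports whether every selector label is present on the pod.
func (p PDB) matches(pod Pod) bool {
	for k, v := range p.Selector {
		if pod.Labels[k] != v {
			return false
		}
	}
	return true
}

// blockingPDB returns the name of the first PDB that would block evicting
// a Ready pod. Pods that are not Ready are skipped, since they do not count
// toward the PDB's healthy replicas.
func blockingPDB(pods []Pod, pdbs []PDB) (string, bool) {
	for _, pod := range pods {
		if !pod.Ready {
			continue // assumption: non-ready pods should not block deprovisioning
		}
		for _, pdb := range pdbs {
			if pdb.matches(pod) && pdb.DisruptionsAllowed == 0 {
				return pdb.Name, true
			}
		}
	}
	return "", false
}

func main() {
	labels := map[string]string{"app": "k8s-oauth2-proxy"}
	pdbs := []PDB{{Name: "k8s-oauth2-proxy", Selector: labels, DisruptionsAllowed: 0}}

	// The pod on the node being deprovisioned is stuck and never Ready.
	stuck := []Pod{{Name: "proxy-stuck", Labels: labels, Ready: false}}
	_, blocked := blockingPDB(stuck, pdbs)
	fmt.Println(blocked) // false

	// A Ready pod covered by the same PDB still blocks deprovisioning.
	healthy := []Pod{{Name: "proxy-ok", Labels: labels, Ready: true}}
	name, blocked := blockingPDB(healthy, pdbs)
	fmt.Println(name, blocked) // k8s-oauth2-proxy true
}
```

Whether readiness alone is the right signal is a design question for the maintainers; a pod that is only briefly not Ready would also be skipped by this check.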