Added deprovisioning docs to preview folder #1170

Merged
merged 2 commits into from
Jan 18, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 67 additions & 15 deletions website/content/en/preview/tasks/deprovisioning.md
linkTitle: "Deprovisioning"
weight: 10
---

Karpenter sets a Kubernetes [finalizer](https://kubernetes.io/docs/concepts/overview/working-with-objects/finalizers/) on each node it provisions.
The finalizer specifies additional actions the Karpenter controller will take in response to a node deletion request.
These include:

* Marking the node as unschedulable, so no further pods can be scheduled there.
* Evicting all pods other than daemonsets from the node.
* Terminating the instance from the cloud provider.
* Deleting the node from the Kubernetes cluster.
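
For reference, the finalizer appears in the node's metadata. A minimal sketch of what a Karpenter-provisioned node might look like (the node name is illustrative, and the finalizer name, `karpenter.sh/termination`, should be verified against your Karpenter version):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: ip-192-168-10-10.us-west-2.compute.internal  # illustrative name
  finalizers:
    # Karpenter's finalizer; blocks node deletion until cleanup completes.
    - karpenter.sh/termination
```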

## How Karpenter nodes are deprovisioned

There are both automated and manual ways of deprovisioning nodes provisioned by Karpenter:

* **Node empty**: Karpenter notes when the last workload (non-daemonset) pod stops running on a node. From that point, Karpenter waits the number of seconds set by `ttlSecondsAfterEmpty` in the provisioner, then Karpenter requests to delete the node. This feature can keep costs down by removing nodes that are no longer being used for workloads.
* **Node expired**: Karpenter requests to delete the node after a set number of seconds, based on the provisioner `ttlSecondsUntilExpired` value, measured from the time the node was provisioned. One use case for node expiry is to handle node upgrades: old nodes (with a potentially outdated Kubernetes version or operating system) are deleted and replaced with nodes on the current version (assuming that you requested the latest version, rather than a specific version). Both TTLs are set on the provisioner; see the sketch after this list.

{{% alert title="Note" color="primary" %}}
Keep in mind that a small `ttlSecondsUntilExpired` value results in higher churn in cluster activity. For example, if a cluster
brings up all nodes at once, all the pods on those nodes would fall into the same batching window on expiration.
{{% /alert %}}

* **Node deleted**: You can use `kubectl` to manually remove one or more Karpenter nodes:

```bash
# Delete a specific node
kubectl delete node $NODE_NAME

# Delete all nodes owned by any provisioner
kubectl delete nodes -l karpenter.sh/provisioner-name

# Delete all nodes owned by a specific provisioner
kubectl delete nodes -l karpenter.sh/provisioner-name=$PROVISIONER_NAME
```
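
Both TTL-based behaviors above are configured on the provisioner. A minimal sketch, assuming the `karpenter.sh/v1alpha5` API in use at the time of this PR (the values are illustrative, not recommendations):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Delete a node 30 seconds after its last non-daemonset pod stops running.
  ttlSecondsAfterEmpty: 30
  # Delete a node 30 days (2592000s) after it was provisioned.
  ttlSecondsUntilExpired: 2592000
```

Leaving either field unset disables the corresponding behavior.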

Whether through node expiry or manual deletion, Karpenter seeks to follow graceful termination procedures as described in the Kubernetes [Graceful node shutdown](https://kubernetes.io/docs/concepts/architecture/nodes/#graceful-node-shutdown) documentation.
If the Karpenter controller is removed or fails, the finalizers on the nodes are orphaned and will require manual removal.

{{% alert title="Note" color="primary" %}}
By adding the finalizer, Karpenter improves the default Kubernetes process of node deletion.
When you run `kubectl delete node` on a node without a finalizer, the node is deleted without triggering the finalization logic. The instance will continue running in EC2, even though there is no longer a node object for it.
The kubelet isn’t watching for its own existence, so if a node is deleted the kubelet doesn’t terminate itself.
All the pod objects get deleted by a garbage collection process later, because the pods’ node is gone.
{{% /alert %}}

## What can cause deprovisioning to fail?

There are a few cases where requesting to deprovision a Karpenter node will fail. These include Pod Disruption Budgets and pods that have the `do-not-evict` annotation set.

### Disruption budgets

Karpenter respects Pod Disruption Budgets (PDBs) by using a backoff retry eviction strategy. Pods will never be forcibly deleted, so pods that fail to shut down will prevent a node from deprovisioning.
Kubernetes PDBs let you specify how much of a Deployment, ReplicationController, ReplicaSet, or StatefulSet must be protected from disruptions when pod eviction requests are made.

PDBs can be used to strike a balance by protecting the application's availability while still allowing a cluster administrator to manage the cluster.
Here is an example where the pods matching the label `myapp` will block node termination if evicting the pod would reduce the number of available pods below 4.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: myapp
```

You can set `minAvailable` or `maxUnavailable` as integers or as a percentage.
Review what [disruptions are](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/), and [how to configure them](https://kubernetes.io/docs/tasks/run-application/configure-pdb/).
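
For instance, a hypothetical variant of the PDB above that uses a percentage instead (the name is illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb-percent  # illustrative name
spec:
  # At most 20% of the matching pods may be unavailable at any time.
  maxUnavailable: "20%"
  selector:
    matchLabels:
      app: myapp
```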

### Pod set to do-not-evict

If a pod exists with the annotation `karpenter.sh/do-not-evict` on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node.
However, if a `do-not-evict` pod is added to a node while the node is draining, the remaining pods will still be evicted, but that pod will block termination until it is removed.
In either case, the node will be cordoned to prevent additional work from scheduling.
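
As a sketch, the annotation goes in the pod's metadata (the pod name, container, and image here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-training-job  # illustrative name
  annotations:
    # Tells Karpenter not to evict this pod while draining the node.
    karpenter.sh/do-not-evict: "true"
spec:
  containers:
    - name: trainer                             # illustrative container
      image: registry.example.com/trainer:1.0   # illustrative image
```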

That annotation is used for pods that you want to run on one node from start to finish without interruption.
Examples might include a real-time, interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted.

If you want to terminate a `do-not-evict` pod, you can simply remove the annotation; the finalizer logic will then evict the pod and continue the node deprovisioning process.