Add documentation for disable-force-detach-on-timeout #45119

Merged

29 changes: 29 additions & 0 deletions content/en/docs/concepts/architecture/nodes.md
@@ -513,6 +513,35 @@ During a non-graceful shutdown, Pods are terminated in the two phases:
recovered since the user was the one who originally added the taint.
{{< /note >}}

### Forced storage detach on timeout {#storage-force-detach-on-timeout}
Contributor

This is a lot of detail for someone who may be learning, for the very first time, what a Kubernetes cluster is.

We'd do well to find a better home for this detail.

Contributor Author

I don't know...

I feel like this belongs in the same section as Non-Graceful node shutdown.

After all, we're saying "This option is risky and can lead to data corruption; how about using the option above?"

Contributor

> I feel like this belongs in the same section as Non-Graceful node shutdown.

I agree. We should move both of those details to another page, ideally.

Contributor Author

I believe that that page was probably put there because graceful node shutdown was a thing:

https://github.com/kubernetes/website/blob/snapshot-initial-v1.20/content/en/docs/concepts/architecture/nodes.md#graceful-node-shutdown-graceful-node-shutdown

If we think this is the right thing to do, I'd be happy to move both Graceful & Non-Graceful, just Non-Graceful, or just the new section.

What makes the most sense at a high level?

Member

If this merges, I've created an issue to move the "Graceful Node Shutdown" and "Ungraceful Node Shutdown" sections, along with the "Forced storage detach on timeout" section (PR pending): issue #45690.


In any situation where a pod deletion has not succeeded for 6 minutes, Kubernetes will
Contributor

The "6 minutes" is a constant?

Contributor Author

Yeah, it's a hard-coded constant.

force detach volumes being unmounted if the node is unhealthy at that instant. Any
workload still running on the node that uses a force-detached volume will cause a
violation of the
[CSI specification](https://github.com/container-storage-interface/spec/blob/master/spec.md#controllerunpublishvolume),
which states that `ControllerUnpublishVolume` "**must** be called after all
`NodeUnstageVolume` and `NodeUnpublishVolume` on the volume are called and succeed".
In such circumstances, volumes on the node in question might encounter data corruption.

The forced storage detach behaviour is optional; users might opt to use the "Non-graceful
node shutdown" feature instead.

Force storage detach on timeout can be disabled by setting the `disable-force-detach-on-timeout`
config field in `kube-controller-manager`. Disabling the force detach on timeout feature means
that a volume that is hosted on a node that is unhealthy for more than 6 minutes will not have
its associated
[VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
deleted.
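
The conditions described above can be summarised as a small decision function. The following Python sketch is only an illustrative model, not the controller's actual implementation: `VolumeState`, `should_force_detach`, and the `>=` boundary check are assumptions made for this example, while the 6-minute timeout and the `disable-force-detach-on-timeout` setting come from the text.

```python
from dataclasses import dataclass

# 6 minutes, as stated in the text above.
FORCE_DETACH_TIMEOUT_SECONDS = 6 * 60


@dataclass
class VolumeState:
    """Hypothetical snapshot of one volume awaiting unmount (not a Kubernetes type)."""
    seconds_since_unmount_requested: float
    node_healthy: bool


def should_force_detach(state: VolumeState,
                        disable_force_detach_on_timeout: bool = False) -> bool:
    """Return True when this simplified model would force-detach the volume."""
    if disable_force_detach_on_timeout:
        # With the setting applied, the VolumeAttachment is left in place and
        # recovery goes through the Non-Graceful Node Shutdown procedure.
        return False
    # Assumption: the boundary is inclusive; the text only says "for 6 minutes".
    timed_out = state.seconds_since_unmount_requested >= FORCE_DETACH_TIMEOUT_SECONDS
    # Force detach applies only when the timeout has elapsed AND the node is unhealthy.
    return timed_out and not state.node_healthy
```

In this model, a volume on an unhealthy node that has waited longer than the timeout is force-detached by default, and never force-detached once the setting is enabled.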

After this setting has been applied, unhealthy pods still attached to volumes must be recovered
via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.

{{< note >}}
- Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
- Deviation from the steps documented above can result in data corruption.
{{< /note >}}

## Swap memory management {#swap-memory}

{{< feature-state state="beta" for_k8s_version="v1.28" >}}
1 change: 1 addition & 0 deletions static/_redirects
@@ -79,6 +79,7 @@
/docs/concepts/jobs/run-to-completion-finite-workloads/ /docs/concepts/workloads/controllers/job/ 301
/id/docs/concepts/jobs/run-to-completion-finite-workloads/ /id/docs/concepts/workloads/controllers/job/ 301
/docs/concepts/nodes/node/ /docs/concepts/architecture/nodes/ 301
/docs/storage-force-detach-on-timeout/ /docs/concepts/architecture/nodes/#storage-force-detach-on-timeout 302
/docs/concepts/services-networking/connect-applications-service/ /docs/tutorials/services/connect-applications-service/ 301
/docs/concepts/object-metadata/annotations/ /docs/concepts/overview/working-with-objects/annotations/ 301
/docs/concepts/overview/ /docs/concepts/overview/what-is-kubernetes/ 301
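
Each rule in this hunk has three whitespace-separated columns: source path, target path, and HTTP status code. The added rule uses status 302, while its neighbours use 301. A minimal Python sketch of reading one rule, assuming exactly three columns per line (the `RedirectRule` and `parse_redirect_rule` names are hypothetical and not part of the site tooling):

```python
from typing import NamedTuple


class RedirectRule(NamedTuple):
    """Hypothetical container for one redirect rule."""
    source: str
    target: str
    status: int


def parse_redirect_rule(line: str) -> RedirectRule:
    """Split one rule into source path, target path, and status code."""
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"expected 3 whitespace-separated fields, got {len(parts)}")
    source, target, status = parts
    return RedirectRule(source, target, int(status))


# The rule added by this change, copied from the diff above.
rule = parse_redirect_rule(
    "/docs/storage-force-detach-on-timeout/ "
    "/docs/concepts/architecture/nodes/#storage-force-detach-on-timeout 302"
)
```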