Add documentation for disable-force-detach-on-timeout #45119
Merged: k8s-ci-robot merged 1 commit into kubernetes:dev-1.30 from rohitssingh:disable_force_detach-dev-1.30 on Mar 26, 2024
@@ -513,6 +513,35 @@ During a non-graceful shutdown, Pods are terminated in the two phases:
recovered since the user was the one who originally added the taint.
{{< /note >}}

### Forced storage detach on timeout {#storage-force-detach-on-timeout}

In any situation where a pod deletion has not succeeded for 6 minutes, Kubernetes will
force detach volumes being unmounted if the node is unhealthy at that instant. Any
workload still running on the node that uses a force-detached volume will cause a
violation of the
[CSI specification](https://github.com/container-storage-interface/spec/blob/master/spec.md#controllerunpublishvolume),
which states that `ControllerUnpublishVolume` "**must** be called after all
`NodeUnstageVolume` and `NodeUnpublishVolume` on the volume are called and succeed".
In such circumstances, volumes on the node in question might encounter data corruption.

Review comment (on the "6 minutes" line): The "6 minutes" is a constant?
Reply: Yeah, it's a hard-coded constant.
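
The force-detach condition described in this paragraph can be sketched as a small predicate. This is only an illustration of the documented rule, not the controller's actual code: the function name, the parameter names, and the constant name are hypothetical, and treating exactly 6 minutes as "timed out" is an assumption.

```python
from datetime import timedelta

# The documentation and the review describe 6 minutes as a hard-coded value.
# The name FORCE_DETACH_TIMEOUT is hypothetical, chosen for this sketch.
FORCE_DETACH_TIMEOUT = timedelta(minutes=6)


def should_force_detach(deletion_pending_for: timedelta, node_healthy: bool) -> bool:
    """Return True when the documented force-detach condition holds.

    deletion_pending_for: how long the pod deletion has gone without succeeding.
    node_healthy: node health at the instant the timeout is evaluated.
    """
    # Assumption: reaching the timeout exactly counts as timed out.
    timed_out = deletion_pending_for >= FORCE_DETACH_TIMEOUT
    # Both conditions are required: the timeout elapsed AND the node is unhealthy.
    return timed_out and not node_healthy
```

For example, `should_force_detach(timedelta(minutes=7), node_healthy=False)` is true, while the same call with a healthy node, or with only 2 minutes elapsed, is false.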

The forced storage detach behaviour is optional; users might opt to use the "Non-graceful
node shutdown" feature instead.

Force storage detach on timeout can be disabled by setting the `disable-force-detach-on-timeout`
config field in `kube-controller-manager`. Disabling the force detach on timeout feature means
that a volume hosted on a node that has been unhealthy for more than 6 minutes will not have
its associated
[VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
deleted.
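
The effect of the setting on VolumeAttachment cleanup can be illustrated with a toy model. Everything here is an assumption made for illustration: the `VolumeAttachmentRecord` type, the `attachments_to_delete` helper, and the constant name are hypothetical and are not part of any Kubernetes API.

```python
from dataclasses import dataclass
from datetime import timedelta

# 6 minutes comes from the documentation text; the constant name is hypothetical.
UNHEALTHY_LIMIT = timedelta(minutes=6)


@dataclass
class VolumeAttachmentRecord:
    """Toy stand-in for a VolumeAttachment and the health of its node."""
    name: str
    node_unhealthy_for: timedelta


def attachments_to_delete(
    records: list[VolumeAttachmentRecord],
    disable_force_detach_on_timeout: bool,
) -> list[str]:
    """Return names of attachments that would be removed by force detach."""
    if disable_force_detach_on_timeout:
        # With the setting enabled, no attachment is deleted on timeout.
        return []
    # Otherwise, attachments on nodes unhealthy for more than the limit qualify.
    return [r.name for r in records if r.node_unhealthy_for > UNHEALTHY_LIMIT]
```

With the setting enabled the helper always returns an empty list, which mirrors the documented consequence: the attachments stay in place until an administrator recovers the workload another way.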

After this setting has been applied, unhealthy pods still attached to volumes must be recovered
via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.

rohitssingh marked this conversation as resolved.

{{< note >}}
- Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
- Deviation from the steps documented above can result in data corruption.
{{< /note >}}

## Swap memory management {#swap-memory}

{{< feature-state state="beta" for_k8s_version="v1.28" >}}

Review comment: This is a lot of detail for someone who may be learning, for the very first time, what a Kubernetes cluster is.
We'd do well to find a better home for this detail.

Reply: How about https://kubernetes.io/docs/concepts/cluster-administration/?

Reply: I don't know...
I feel like this belongs in the same section as Non-Graceful node shutdown.
After all, we're saying "This option is risky and can lead to data corruption; how about using the option above?"

Reply: I agree. We should move both of those details to another page, ideally.

Reply: I believe that that page was probably put there because graceful node shutdown was a thing:
https://github.com/kubernetes/website/blob/snapshot-initial-v1.20/content/en/docs/concepts/architecture/nodes.md#graceful-node-shutdown-graceful-node-shutdown
If we think this is the right thing to do, I'd be happy to move both Graceful & Non-Graceful, just Non-Graceful, or just the new section.
What makes the most sense at a high level?

Reply: If this merges, I've created an issue to move the "Graceful Node Shutdown" and "Ungraceful Node Shutdown" sections, along with the "Forced storage detach on timeout" section (PR pending).
Issue #45690