docs: always use -ignore-system on node drain with CSI #8606

Merged · 1 commit · Aug 7, 2020
17 changes: 11 additions & 6 deletions website/pages/docs/commands/node/drain.mdx
@@ -18,10 +18,12 @@ all allocations have terminated. Canceling the `node drain` command _will not_
cancel the drain. Drains may be canceled by using the `-disable` parameter
below.

-When draining more than one node at a time, it is recommended you first disable
-[scheduling eligibility][eligibility] on all nodes that will be drained. For
-example if you are decommissioning an entire class of nodes, first run `node eligibility -disable` on all of their node IDs, and then run `node drain -enable`. This will ensure allocations drained from the first node are not
-placed on another node about to be drained.
+When draining more than one node at a time, it is recommended you first
+disable [scheduling eligibility][eligibility] on all nodes that will be
+drained. For example if you are decommissioning an entire class of nodes,
+first run `node eligibility -disable` on all of their node IDs, and then run
+`node drain -enable`. This will ensure allocations drained from the first node
+are not placed on another node about to be drained.

The [node status] command complements this nicely by providing the current drain
status of a given node.
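
As a minimal sketch of the workflow described above (the node IDs are hypothetical), eligibility is disabled on every node in the class before any drain is enabled:

```shell
# Hypothetical node IDs for a class of nodes being decommissioned.
# First mark every node ineligible so migrated allocations cannot land on them:
$ nomad node eligibility -disable f7476465
$ nomad node eligibility -disable 96b52ad8
$ nomad node eligibility -disable 46b4a9f0

# Then drain each node; its allocations can only move to eligible nodes:
$ nomad node drain -enable f7476465
$ nomad node drain -enable 96b52ad8
$ nomad node drain -enable 46b4a9f0
```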
@@ -65,8 +67,10 @@ operation is desired.
- `-no-deadline`: No deadline allows the allocations to drain off the node
without being force stopped after a certain deadline.

-- `-ignore-system`: Ignore system allows the drain to complete without stopping
-system job allocations. By default system jobs are stopped last.
+- `-ignore-system`: Ignore system allows the drain to complete without
+stopping system job allocations. By default system jobs are stopped
+last. You should always use this flag when draining a node running
+[CSI node plugins][internals-csi].

Contributor (@jippi): @tgross I kinda wish this flag was default "on", I've yet had a case where I haven't had to set this to true in my various nomad deployments

Member Author: I think I agree. Let me float that by the rest of the team to see if it's something we feel safe flipping before 1.0.

Member Author: I've opened #8622 for further discussion of that default. Thanks for raising it, @jippi!

- `-keep-ineligible`: Keep ineligible will maintain the node's scheduling
ineligibility even if the drain is being disabled. This is useful when an
@@ -135,3 +139,4 @@ $ nomad node drain -self -monitor
[migrate]: /docs/job-specification/migrate
[node status]: /docs/commands/node/status
[workload migration guide]: https://learn.hashicorp.com/nomad/operating-nomad/node-draining
+[internals-csi]: /docs/internals/plugins/csi
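
A hedged example of the `-ignore-system` guidance added above (the node ID is hypothetical); the flag keeps system jobs, including any CSI node plugins, running for the duration of the drain:

```shell
# Drain a node that runs CSI node plugins without stopping system jobs,
# so the node plugin can unmount volumes as claiming allocations stop:
$ nomad node drain -enable -ignore-system f7476465

# The same drain started from the node itself:
$ nomad node drain -enable -ignore-system -self
```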
30 changes: 13 additions & 17 deletions website/pages/docs/internals/plugins/csi.mdx
@@ -56,12 +56,12 @@ that perform both the controller and node roles in the same
instance. Not every plugin provider has or needs a controller; that's
specific to the provider implementation.

-You should almost always run node plugins as Nomad `system` jobs to
-ensure volume claims are released when a Nomad client is drained. Use
-constraints for the node plugin jobs based on the availability of
-volumes. For example, AWS EBS volumes are specific to particular
-availability zones with a region. Controller plugins can be run as
-`service` jobs.
+You should always run node plugins as Nomad `system` jobs and use the
+`-ignore-system` flag on the `nomad node drain` command to ensure that the
+node plugins are still running while the node is being drained. Use
+constraints for the node plugin jobs based on the availability of volumes. For
+example, AWS EBS volumes are specific to particular availability zones within a
+region. Controller plugins can be run as `service` jobs.
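
As an illustrative check of this guidance, a sketch using `nomad plugin status` (the plugin ID `aws-ebs0` is hypothetical):

```shell
# List registered CSI plugins and check that the expected number of
# node plugins is healthy before and during a drain:
$ nomad plugin status -type csi
$ nomad plugin status aws-ebs0
```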

Nomad exposes a Unix domain socket named `csi.sock` inside each CSI
plugin task, and communicates over the gRPC protocol expected by the
@@ -111,17 +111,13 @@ client, and the node plugin mounts the volume to a staging area in
the Nomad data directory. Nomad will bind-mount this staged directory
into each task that mounts the volume.

-This cycle is reversed when a task that claims a volume becomes
-terminal. The client updates the server frequently about changes to
-allocations, including terminal state. When the server receives a
-terminal state for a job with volume claims, it creates a volume claim
-garbage collection (GC) evaluation to to handled by the core job
-scheduler. The GC job will send "detach" RPCs to the node plugin. The
-node plugin unmounts the bind-mount from the allocation and unmounts
-the volume from the plugin (if it's not in use by another task). The
-GC job will then send "unpublish" RPCs to the controller plugin (if
-any), and decrement the claim count for the volume. At this point the
-volume’s claim capacity has been freed up for scheduling.
+This cycle is reversed when a task that claims a volume becomes terminal. The
+client will send an "unpublish" RPC to the server, which will send "detach"
+RPCs to the node plugin. The node plugin unmounts the bind-mount from the
+allocation and unmounts the volume from the plugin (if it's not in use by
+another task). The server will then send "unpublish" RPCs to the controller
+plugin (if any), and decrement the claim count for the volume. At this point
+the volume’s claim capacity has been freed up for scheduling.
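
As a rough way to observe this from the outside (the volume ID `ebs-vol0` is hypothetical), the volume's claims can be listed once the claiming allocations become terminal:

```shell
# The status output lists the allocations that claim the volume; once they
# are terminal and the detach/unpublish RPCs complete, the claim is released
# and the volume's capacity is schedulable again:
$ nomad volume status ebs-vol0
```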

[csi-spec]: https://github.com/container-storage-interface/spec
[csi-drivers-list]: https://kubernetes-csi.github.io/docs/drivers.html
9 changes: 5 additions & 4 deletions website/pages/docs/job-specification/csi_plugin.mdx
@@ -51,10 +51,11 @@ host. With the Docker task driver, you can use the `privileged = true`
configuration, but no other default task drivers currently have this
option.

-~> **Note:** During node drains, jobs that claim volumes should be
-moved before the `node` or `monolith` plugin for those
-volumes. Because [`system`][system] jobs are moved last during node drains, you
-should run `node` or `monolith` plugins as `system` jobs.
+~> **Note:** During node drains, jobs that claim volumes must be moved before
+the `node` or `monolith` plugin for those volumes. You should run `node` or
+`monolith` plugins as [`system`][system] jobs and use the `-ignore-system`
+flag on `nomad node drain` to ensure that the plugins are running while the
+node is being drained.
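
A sketch of verifying this during a drain (the node ID is hypothetical): the plugin's `system` allocation should remain running on the node while other allocations migrate off.

```shell
# While a drain started with -ignore-system is in progress, the CSI plugin's
# system allocation should still be listed as running on the node, while
# service and batch allocations are migrated away:
$ nomad node status -verbose f7476465
```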

## `csi_plugin` Examples
