
CSI: node unmount from the client before unpublish RPC #11892

Merged: tgross merged 2 commits into main from csi-client-side-node-unmount on Jan 28, 2022

Conversation

@tgross (Member) commented Jan 20, 2022

Partial fix for #10927 #10052 #10833
Requires #11890 #11932

When an allocation stops, the `csi_hook` makes an unpublish RPC to the
servers, which unpublish via the CSI RPCs: first to the node plugins and
then to the controller plugins. The controller RPCs must happen after the
node RPCs so that the node has had a chance to unmount the volume
before the controller tries to detach the associated device.

But the client has local access to the node plugins and can
independently determine whether it's safe to send unpublish RPCs to those
plugins itself. This will allow the server to treat the node plugin as
abandoned if a client is disconnected and `stop_on_client_disconnect`
is set. This will let the server try to send unpublish RPCs to the
controller plugins, under the assumption that the client will be
trying to unmount the volume on its end first.
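For orientation, here's a minimal, self-contained sketch of that ordering. The types and names are illustrative stand-ins, not Nomad's actual API (the real hook lives in `client/allocrunner/csi_hook.go`):

```go
package csihook

import (
	"errors"
	"fmt"
)

// Illustrative stand-ins for Nomad's node-plugin client and server RPC.
type volume struct{ id string }

type nodePlugin interface {
	// UnmountVolume covers NodeUnpublishVolume/NodeUnstageVolume on this client.
	UnmountVolume(v volume) error
}

type serverRPC interface {
	// Unpublish releases the claim so the server can controller-unpublish.
	Unpublish(v volume) error
}

// postrun mirrors the ordering this PR establishes: node-side unmount
// first, then the server-side unpublish that detaches the device.
func postrun(vols []volume, node nodePlugin, srv serverRPC) error {
	var errs []error
	for _, v := range vols {
		if err := node.UnmountVolume(v); err != nil {
			// Don't ask the server to detach a device that may still be mounted.
			errs = append(errs, fmt.Errorf("unmount %s: %w", v.id, err))
			continue
		}
		if err := srv.Unpublish(v); err != nil {
			errs = append(errs, fmt.Errorf("unpublish %s: %w", v.id, err))
		}
	}
	return errors.Join(errs...)
}
```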

@tgross tgross self-assigned this Jan 20, 2022
@tgross tgross changed the title csi: node unmount from the client before unpublish RPC CSI: node unmount from the client before unpublish RPC Jan 20, 2022
@tgross tgross added this to the 1.2.5 milestone Jan 24, 2022
@tgross tgross force-pushed the csi-client-side-node-unmount branch from c28c1bd to 28ed20d on January 25, 2022
@tgross tgross marked this pull request as ready for review January 26, 2022 20:57
@tgross tgross requested review from lgfa29, shoenig and jrasell January 26, 2022 20:57
@jrasell (Member) left a comment


I have an inline question, but I sense this is a knowledge gap. Therefore 👍🏻

client/allocrunner/csi_hook.go (outdated)
Comment on lines 111 to 113
```go
if err != nil {
	mErr = multierror.Append(mErr, err)
}
```
@jrasell (Member) commented:

Apologies if this was covered yesterday, but for my education, if we fail to unmount the volume, can attempting to unpublish the volume succeed or will it always fail? I am curious how the fall-through when an error is received works.

@tgross (Member, Author) replied:

Good catch. There are two cases:

  • We may be returning an error that's `structs.ErrCSIClientRPCIgnorable`. For example, this CSI RPC is supposed to be idempotent, so if it's been node-unpublished already we still want to make the `CSIVolume.Unpublish` RPC back to the server so that the server can do a controller-unpublish and release the claim.
  • If it's not ignorable (ex. the node plugin's alloc is gone), we should probably not send the `CSIVolume.Unpublish`, because there's nothing we can do from the server that will improve things! Instead we should be retrying with exponential backoff until it works, and logging so that the operator can intervene manually. That will cause alloc shutdown to hang, which I think is exactly what we want here.

That being said, I think we should push the "ignorable" errors down into `UnmountVolume` so that we only ever return non-ignorable errors. Then this code can just have a happy path and a path where errors need to be retried, blocking this method from returning.
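A sketch of that split, assuming a stand-in sentinel for `structs.ErrCSIClientRPCIgnorable` (the names below are illustrative, not the shipped code):

```go
package csihook

import (
	"context"
	"errors"
	"time"
)

// errIgnorable stands in for structs.ErrCSIClientRPCIgnorable: the node
// RPCs are idempotent, so "already unmounted" counts as success.
var errIgnorable = errors.New("csi: ignorable RPC error")

// unmountWithRetry treats ignorable errors as success and retries all
// other failures with exponential backoff, deliberately holding up
// alloc shutdown so an operator can reschedule a failed node plugin.
func unmountWithRetry(ctx context.Context, unmount func() error) error {
	backoff := time.Second
	for {
		err := unmount()
		if err == nil || errors.Is(err, errIgnorable) {
			return nil // now safe to send CSIVolume.Unpublish to the server
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		if backoff < time.Minute {
			backoff *= 2
		}
	}
}
```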

@tgross (Member, Author) replied:

@jrasell I've addressed this in 04f5b7a:

The CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return
ignorable errors in the case where the volume has already been
unmounted from the node. Handle all other errors by retrying until we
get success, so as to give operators the opportunity to reschedule a
failed node plugin (ex. in the case where they accidentally drained a
node without `-ignore-system`). Fan out the work for each volume into
its own goroutine so that we can release a subset of volumes even if
one is stuck.
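A minimal sketch of that fan-out, reusing the illustrative `volume` type from the first sketch (not the exact code in 04f5b7a):

```go
package csihook

import (
	"context"
	"sync"
)

// unmountAll fans each volume out into its own goroutine so that one
// stuck volume doesn't prevent the others from being released.
func unmountAll(ctx context.Context, vols []volume,
	unmount func(context.Context, volume) error) []error {

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for _, v := range vols {
		wg.Add(1)
		go func(v volume) {
			defer wg.Done()
			if err := unmount(ctx, v); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(v)
	}
	wg.Wait()
	return errs
}
```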

@jrasell (Member) left a comment


this looks awesome!

@tgross tgross merged commit 8364eda into main Jan 28, 2022
@tgross tgross deleted the csi-client-side-node-unmount branch January 28, 2022 13:30
tgross added several commits that referenced this pull request Jan 28, 2022
tgross added two commits that referenced this pull request Feb 22, 2022
In PR #11892 we updated the `csi_hook` to unmount the volume locally
via the CSI node RPCs before releasing the claim from the server. The
timer for this hook was initialized with the retry time, forcing us to
wait 1s before making the first unmount RPC calls.

Use the new helper for timers to ensure we clean up the timer nicely.
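The fix amounts to arming the first attempt immediately instead of waiting out a retry interval; a sketch of that pattern (a hypothetical shape, not Nomad's actual timer helper):

```go
package csihook

import "time"

// retryLoop fires the first attempt immediately rather than waiting a
// full retry interval, and stops the timer when the loop exits.
func retryLoop(retry time.Duration, attempt func() bool) {
	timer := time.NewTimer(0) // first tick is immediate
	defer timer.Stop()
	for {
		<-timer.C
		if attempt() {
			return
		}
		timer.Reset(retry)
	}
}
```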
tgross added two commits that referenced this pull request Apr 5, 2022
When we unmount a volume we need to be able to recover from cases
where the plugin has been shut down before the allocation that needs
it, so in #11892 we blocked shutting down the alloc runner hook. But
this blocks client shutdown if we're in the middle of unmounting. The
client won't be able to communicate with the plugin or send the
unpublish RPC anyway, so we should cancel the context and assume that
we'll resume the unmounting process when the client restarts.

For `-dev` mode we don't call the graceful `Shutdown()` method and
instead destroy all the allocations. In this case, we'll never be able
to communicate with the plugin, but we'll also never close the context we
need to prevent the hook from blocking. To fix this, move the retries
into their own goroutine that doesn't block the main `Postrun`.
tgross added a commit that referenced this pull request Apr 5, 2022
When we unmount a volume we need to be able to recover from cases
where the plugin has been shutdown before the allocation that needs
it, so in #11892 we blocked shutting down the alloc runner hook. But
this blocks client shutdown if we're in the middle of unmounting. The
client won't be able to communicate with the plugin or send the
unpublish RPC anyways, so we should cancel the context and assume that
we'll resume the unmounting process when the client restarts.

For `-dev` mode we don't send the graceful `Shutdown()` method and
instead destroy all the allocations. In this case, we'll never be able
to communicate with the plugin but also never close the context we
need to prevent the hook from blocking. To fix this, move the retries
into their own goroutine that doesn't block the main `Postrun`.
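The resulting shape, compressed into a sketch (illustrative names; the real hook ties its context to client shutdown):

```go
package csihook

import (
	"context"
	"sync"
)

// hook sketches the fix: unmount retries run in their own goroutine so
// Postrun doesn't block, and Shutdown cancels their context so a
// client shutdown (or -dev teardown) isn't wedged on a dead plugin.
type hook struct {
	shutdownCtx context.Context
	cancel      context.CancelFunc
	wg          sync.WaitGroup
}

func newHook() *hook {
	ctx, cancel := context.WithCancel(context.Background())
	return &hook{shutdownCtx: ctx, cancel: cancel}
}

// Postrun returns promptly; the retries continue in the background.
func (h *hook) Postrun(retryUnmounts func(context.Context)) {
	h.wg.Add(1)
	go func() {
		defer h.wg.Done()
		retryUnmounts(h.shutdownCtx)
	}()
}

// Shutdown abandons in-flight retries; unmounting resumes when the
// client restarts and reconciles its volumes.
func (h *hook) Shutdown() {
	h.cancel()
	h.wg.Wait()
}
```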
@github-actions (bot) commented

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022