Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI: track node claim before staging to prevent interleaved unstage #20550

Merged
merged 1 commit into from
May 16, 2024

Conversation

tgross
Copy link
Member

@tgross tgross commented May 9, 2024

The CSI hook for each allocation that claims a volume runs concurrently. If a call to MountVolume happens at the same time as a call to UnmountVolume for the same volume, it's possible for the second alloc to detect the volume has already been staged, then for the original alloc to unpublish and unstage it, only for the second alloc to then attempt to publish a volume that's been unstaged.

The usage tracker on the volume manager was intended to prevent this behavior but the call to claim the volume was made only after staging and publishing was complete. Move the call to claim the volume for the usage tracker to the top of the MountVolume workflow to prevent it from being unstaged until all consuming allocations have called UnmountVolume.

Fixes: #20424


Note this PR currently contains everything in #20532 because they'd otherwise conflict. Once that's been merged I'll rebase on main and take this out of draft. The actual changes are in c01986b Done

The CSI hook for each allocation that claims a volume runs concurrently. If a
call to `MountVolume` happens at the same time as a call to `UnmountVolume` for
the same volume, it's possible for the second alloc to detect the volume has
already been staged, then for the original alloc to unpublish and unstage it,
only for the second alloc to then attempt to publish a volume that's been
unstaged.

The usage tracker on the volume manager was intended to prevent this behavior
but the call to claim the volume was made only after staging and publishing was
complete. Move the call to claim the volume for the usage tracker to the top of
the `MountVolume` workflow to prevent it from being unstaged until all consuming
allocations have called `UnmountVolume`.

Fixes: #20424
Copy link
Contributor

@pkazmierczak pkazmierczak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link

github-actions bot commented Jan 8, 2025

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 8, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
backport/1.6.x backport to 1.6.x release line backport/1.7.x backport to 1.7.x release line hcc/cst Admin - internal theme/storage type/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition when a CSI volume with staging gets unmounted from one alloc and mounted to another alloc
2 participants