
Backport of CSI: set mounts in alloc hook resources atomically into release/1.4.x #16773


hc-github-team-nomad-core

Backport

This PR is auto-generated from #16722 to be assessed for backporting due to the inclusion of the label backport/1.4.x.

The below text is copied from the body of the original PR.


Fixes #16623

The allocrunner has a facility for passing data written by allocrunner hooks to taskrunner hooks. Currently the only consumers of this facility are the allocrunner CSI hook (which writes data) and the taskrunner volume hook (which reads that same data).

The allocrunner hook for CSI volumes doesn't set the alloc hook resources atomically. Instead, it gets the current resources and then writes a new version back. Because the CSI hook is currently the only writer and all readers happen long afterwards, this should be safe, but #16623 shows that some sequence of events during restore breaks this assumption.
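A minimal Go sketch of the general hazard with a get-then-write-back update. It uses simplified, hypothetical types (`resources`, `holder`, `lostUpdate`) rather than Nomad's real ones. Each accessor holds the lock on its own, but a caller that gets the value, builds a modified copy, and sets it back can still overwrite a change another writer made in between. The interleaving is written out sequentially so the outcome is deterministic. This illustrates the class of bug only; the PR does not say this exact interleaving is what happens during restore.

```go
package main

import (
	"fmt"
	"sync"
)

// resources is a hypothetical, simplified stand-in for alloc hook resources.
type resources struct {
	CSIMounts map[string]string // volume ID -> mount path (illustrative)
	Other     string            // a field some other writer might set
}

// holder guards a *resources value. Get and Set each take the lock,
// but a Get followed later by a Set is not atomic as a whole.
type holder struct {
	mu  sync.Mutex
	res *resources
}

func (h *holder) Get() *resources {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.res
}

func (h *holder) Set(r *resources) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.res = r
}

// lostUpdate replays one bad interleaving of two writers, step by step.
func lostUpdate() *resources {
	h := &holder{res: &resources{}}

	// Writer A reads the current value and starts building a new version.
	snapshot := *h.Get()

	// Writer B replaces the value in the meantime.
	h.Set(&resources{Other: "set by B"})

	// Writer A writes back a version based on its stale snapshot,
	// silently discarding B's change.
	snapshot.CSIMounts = map[string]string{"vol1": "/mnt/vol1"}
	h.Set(&snapshot)

	return h.Get()
}

func main() {
	r := lostUpdate()
	fmt.Printf("Other=%q mounts=%v\n", r.Other, r.CSIMounts)
}
```

Running it prints `Other="" mounts=map[vol1:/mnt/vol1]`: writer B's update is gone even though every individual access was locked.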

Refactor hook resources so that hook data is accessed via setters and getters that hold the mutex, and ensure the object is instantiated synchronously at the time the AllocRunner is created.
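A minimal Go sketch of the shape this refactor describes, assuming hypothetical names (`HookResources`, `MountInfo`, `NewHookResources`, `GetCSIMounts`, `SetCSIMounts`, `allocRunner`) that are not taken from the PR. The data is unexported so every access goes through a getter or setter that holds the mutex, and the object is built in the alloc runner's constructor so no hook ever sees it missing. Copying the map in both accessors is a choice made for this sketch, so callers can't mutate shared state outside the lock; the PR text doesn't specify it.

```go
package main

import (
	"fmt"
	"sync"
)

// MountInfo is a hypothetical, simplified stand-in for a CSI mount record.
type MountInfo struct {
	Source string
}

// HookResources keeps its data unexported so callers must use the
// locked accessors below.
type HookResources struct {
	mu        sync.RWMutex
	csiMounts map[string]*MountInfo
}

// NewHookResources builds the object up front so readers never see nil.
func NewHookResources() *HookResources {
	return &HookResources{csiMounts: map[string]*MountInfo{}}
}

// copyMounts makes a shallow copy so the shared map is never handed out.
func copyMounts(m map[string]*MountInfo) map[string]*MountInfo {
	out := make(map[string]*MountInfo, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// SetCSIMounts replaces the mounts while holding the write lock.
func (r *HookResources) SetCSIMounts(m map[string]*MountInfo) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.csiMounts = copyMounts(m)
}

// GetCSIMounts returns a copy of the mounts while holding the read lock.
func (r *HookResources) GetCSIMounts() map[string]*MountInfo {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return copyMounts(r.csiMounts)
}

// allocRunner is hypothetical: it creates the hook resources synchronously
// in its constructor and shares the same pointer with every hook.
type allocRunner struct {
	hookResources *HookResources
}

func newAllocRunner() *allocRunner {
	return &allocRunner{hookResources: NewHookResources()}
}

func main() {
	ar := newAllocRunner()

	var wg sync.WaitGroup
	wg.Add(2)
	// Writer, playing the role of the CSI alloc hook.
	go func() {
		defer wg.Done()
		ar.hookResources.SetCSIMounts(map[string]*MountInfo{"vol1": {Source: "/mnt/vol1"}})
	}()
	// Reader, playing the role of the task volume hook: it sees either the
	// empty map or the full new map, never nil.
	go func() {
		defer wg.Done()
		_ = ar.hookResources.GetCSIMounts()
	}()
	wg.Wait()

	fmt.Println(ar.hookResources.GetCSIMounts()["vol1"].Source)
}
```

After both goroutines finish, this prints `/mnt/vol1`. Because a single setter replaces the whole value under the lock, there is no window between reading and writing back for another party to slip into.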


Note to reviewers: the reproduction for the crash is extremely complicated and timing-dependent, so we can't cover this with a unit test. See my comment at #16623 (comment) for how this has been manually tested. There's some follow-up work we'll need to do in #16746.

hc-github-team-nomad-core force-pushed the backport/alloc-hook-resources/exactly-magnetic-jaguar branch from d20d16b to b0ecd44 on April 3, 2023 15:04
hc-github-team-nomad-core merged commit b169e8d into release/1.4.x on April 3, 2023
hc-github-team-nomad-core force-pushed the backport/alloc-hook-resources/exactly-magnetic-jaguar branch from f99517c to 0db70d6 on April 3, 2023 15:04
hc-github-team-nomad-core deleted the backport/alloc-hook-resources/exactly-magnetic-jaguar branch on April 3, 2023 15:04