-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
csi: fix plugin counts on node update #7844
Conversation
97871b9
to
8748752
Compare
d4b521f
to
81c0702
Compare
End-to-end tested. Note this doesn't include the final GC step we'll need to clean up multi-node plugins without a job purge, which will be under separate PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thanks for the commit by commit breakdown, super easy to read
nomad/structs/csi.go
Outdated
@@ -729,7 +731,9 @@ func (p *CSIPlugin) AddPlugin(nodeID string, info *CSIInfo) error { | |||
p.NodesHealthy -= 1 | |||
} | |||
} | |||
p.Nodes[nodeID] = info | |||
if prev != nil || prev == nil && info.Healthy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: isn't this the same as prev != nil || info.Healthy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Will fix.
Looks like I've got a merge conflict from the earlier merge with the docstring fixes. Will address the open item and that conflict and this should be good-to-go. |
The existing test didn't go back to the state store, which normally is ok but is complicated in this case by denormalization which changes the behavior. This commit makes the test more comprehensive.
All known controllers that support `PUBLISH_READONLY` also support `PUBLISH_UNPUBLISH_VOLUME` but we shouldn't assume this.
If a Nomad client node is running both a controller and a node plugin (which is a common case), then if only the controller or the node is removed, the plugin was not being updated with the correct counts.
Plugins are first created when a Nomad client sends a node update RPC that includes allocs with plugins. We use the same mechanism to update plugin health. But if we allow a plugin to be created for the first time when the alloc is not healthy, then we'll recreate deleted plugins when the job's allocs all get marked terminal. This changeset fixes that by only creating a plugin when the alloc is healthy.
When an allocation that implements a CSI plugin becomes terminal the client fingerprint can't tell if the plugin is unhealthy intentionally (for the case of updates or job stop). Allocations that are server-terminal should delete themselves from the plugin and trigger a plugin self-GC, the same as an unused node.
81c0702
to
e339037
Compare
Plugin health for controllers should show "Node Only" in the UI only when both conditions are true: controllers are not required, and no controllers have registered themselves (0 expected controllers). This accounts for "monolith" plugins which might register as both controllers and nodes but not necessarily have `ControllerRequired = true` because they don't implement the Controller RPC endpoints we need (this requirement was added in #7844) This changeset includes the following fixes: * Update the Plugins tab of the UI so that monolith plugins don't show "Node Only" once they've registered. * Add the missing "Node Only" logic to the Volumes tab of the UI.
…9416) Plugin health for controllers should show "Node Only" in the UI only when both conditions are true: controllers are not required, and no controllers have registered themselves (0 expected controllers). This accounts for "monolith" plugins which might register as both controllers and nodes but not necessarily have `ControllerRequired = true` because they don't implement the Controller RPC endpoints we need (this requirement was added in #7844) This changeset includes the following fixes: * Update the Plugins tab of the UI so that monolith plugins don't show "Node Only" once they've registered. * Add the missing "Node Only" logic to the Volumes tab of the UI.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
For #7817 and #7743. Best reviewed commit-by-commit, but unfortunately all the changes are required to finish out this multi-layered bug. The bulk of the new lines of code are tests... it was a bit of a pain to suss-out this behavior!
In this changeset:
PUBLISH_READONLY
. All known controllers that supportPUBLISH_READONLY
also supportPUBLISH_UNPUBLISH_VOLUME
but we shouldn't assume this.server-terminal should delete themselves from the plugin and trigger a plugin self-GC, the same as an unused node.