-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSI: allow for concurrent plugin allocations #12078
Conversation
The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops.
require.NotNil(t, result) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lgfa29 I think I'm getting a false positive on the semgrep rule here: https://github.com/hashicorp/nomad/runs/5222371451?check_suite_focus=true Arguably these assertions aren't all that useful, so I wouldn't be broken-hearted if I just had to remove them. But any suggestions for a fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh interesting, I haven't thought of this pattern before. Feel free to leave the code as-is and ignore the check for now. I will try to adjust the rules to match this.
3ec15a3
to
9d01933
Compare
9d01933
to
5c73101
Compare
The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing.
5c73101
to
1b8eb1e
Compare
Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.
e248ebb
to
0cfe94e
Compare
// TODO(tgross): add this TaskPath to the CSIPluginConfig as well | ||
TaskPath: "/local/csi", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: this PR is already huge and adding to this field to the jobspec is both a minor change and a lot of code to review. I'm going to do it in a follow-on PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a silly question, but is there a use case where this would even need to be configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a very good use case. 😁
Suppose you had a plugin that required a configuration file that you created via template
block, we could conceivably collide with the destination
for that template. Ideally the user would just use a different destination of course. But obviously this is low-priority enough that it's not going to be a blocker at all for CSI GA.
// TODO(tgross): https://github.com/hashicorp/nomad/issues/11786 | ||
// If we're already registered, we should be able to update the | ||
// definition in the update hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: this is a "blocking GA" issue so I'll have a follow-up PR on that as well.
require.NoError(t, db.Close()) | ||
}) | ||
} | ||
|
||
t.Run("testdata/state-1.2.6.db.gz", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: this file got created from the following test code on a 1.2.6 branch:
TestBoltStateDB_WriteTestData
func TestBoltStateDB_WriteTestData(t *testing.T) {
os.Mkdir("newdata", 0775)
dbI, err := NewBoltStateDB(testlog.HCLogger(t), "newdata")
require.NoError(t, err)
db := dbI.(*BoltStateDB)
defer db.Close()
ps := &dynamicplugins.RegistryState{
Plugins: map[string]map[string]*dynamicplugins.PluginInfo{
"csi-controller": {
"plugin-1": {
Name: "plugin-1",
Type: string(structs.CSIPluginTypeMonolith),
Version: "v0.1",
ConnectionInfo: &dynamicplugins.PluginConnectionInfo{
SocketPath: "/var/nomad/client/csi/plugin-1/csi.sock",
},
AllocID: "489cd5cf-3cfc-6792-3465-822371765b32",
Options: map[string]string{
"Provider": "demo",
"MountPoint": "/var/nomad/client/csi/plugin-1",
"ContainerMountPoint": "/csi",
},
},
},
"csi-node": {
"plugin-1": {
Name: "plugin-1",
Type: string(structs.CSIPluginTypeMonolith),
Version: "v0.1",
ConnectionInfo: &dynamicplugins.PluginConnectionInfo{
SocketPath: "/var/nomad/client/csi/plugin-1/csi.sock",
},
AllocID: "489cd5cf-3cfc-6792-3465-822371765b32",
Options: map[string]string{
"Provider": "demo",
"mountpoint": "/var/nomad/client/csi/plugin-1",
"ContainerMountPoint": "/csi",
},
},
"plugin-2": {
Name: "plugin-2",
Type: string(structs.CSIPluginTypeNode),
Version: "v0.2",
ConnectionInfo: &dynamicplugins.PluginConnectionInfo{
SocketPath: "/var/nomad/client/csi/plugin-2/csi.sock",
},
AllocID: "68221094-25a6-eeb6-8490-c71a03d46b0f",
Options: map[string]string{
"Provider": "other-demo",
"MountPoint": "/var/nomad/client/csi/plugin-2",
"ContainerMountPoint": "/csi",
},
},
},
},
}
db.PutDynamicPluginRegistryState(ps)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments (mostly questions), but LGTM.
Thanks for breaking down the work into commits 🙂
// TODO(tgross): add this TaskPath to the CSIPluginConfig as well | ||
TaskPath: "/local/csi", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably a silly question, but is there a use case where this would even need to be configurable?
"ContainerMountPoint": h.task.CSIPluginConfig.MountDir, | ||
"ContainerMountPoint": "/local/csi", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this value change depending on the file system isolation capability of the task driver?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A task driver without filesystem isolation never creates the mount, so it's a no-op. (Note this is an exotic case because the spec requires that plugins are shipped as OCI containers)
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Fixes #8628 #11784
The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops.
This is a fairly huge PR but none of it ships safely without the rest, so I've cut it up into hopefully-reviewable chunks.
TODO:
csi.sock
to somewhere owned by the allocation. Done in 1f095de