-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow per_alloc to be used with host volumes #15780
Conversation
Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder. When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across restarts we need to make sure allocation placement is consistent. With CSI volumes we can use the `per_alloc` feature but for some reason this is explicitly disabled for host volumes. I couldn't find a reason why this done since as far as I could see the per_alloc is only used for the scheduler to determine placement.
@tgross hope you don't mind me pinging you here, but this seems to be written by you. Do you happen to remember if there was an explicit reason to disable |
Mostly because we didn't think anyone would want it because it creates tight coupling between cluster administrator config and job operator config. You'd end up with a client configuration like the following: client {
host_volume "host_data[0]" {
path = "/srv"
}
} And then a jobspec with the following: job "docs" {
group "example" {
volume "certs" {
type = "host"
source = "host_data"
per_alloc = true
}
}
} Which I guess can work but doesn't that forever pin the workload to a specific client (or at least the subset of clients with that same index on the host volume config)? |
In many cases you wouldn’t need to pin an allocation to a specific node but there are some. Kafka for example, could benefit from this. When you deploy a job config with a broker.id rendered from the NOMAD_ALLOC_INDEX you need to ensure the data matches that specific allocation to deploy a consistent config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ygersie! Ok, I think I understand the use case and that sounds reasonable. Hopefully #15489 will fill some of these gaps but that's a ways out yet.
This change isn't enough to expose per-alloc semantics for host volumes though:
- We'll need to reproduce the per-alloc logic currently in the scheduler for CSI volumes to the host volumes. Look in
scheduler/feasible.go
for that. - We'll need to reproduce the client-side logic we currently have in
csi_hook.go
for creating the per-alloc name into the task runner'svolume_hook.go
.
Right now the validation tests don't pass either. We should probably update the docs as well.
@tgross I took a stab at it, please roast me where appropriate ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ygersie! The code looks good, and I pulled it down locally to try it out and everything seems to work as expected.
Two additional bits:
- Documentation updates: we'll need to move the
per_alloc
in the docs into the section above with the rest of the fields that work for both CSI and Host volumes. You can find that involume.mdx
. We'll also need to update theupdate.canary
description. - Can you run
make cl
to add a changelog entry?
Once that's done we can get this merged and it'll ship in the upcoming Nomad 1.5.0.
@tgross thanks for all the pointers, much appreciated! Should be good now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks @ygersie!
Disallowing per_alloc for host volumes in some cases makes life of a nomad user much harder. When we rely on the NOMAD_ALLOC_INDEX for any configuration that needs to be re-used across restarts we need to make sure allocation placement is consistent. With CSI volumes we can use the
per_alloc
feature but for some reason this is explicitly disabled for host volumes.I couldn't find a reason why this done, since as far as I could see the per_alloc is only used for the scheduler to determine placement.