Facilitate the creation of CSI Volumes within a Jobspec #11195
Comments
Hi @rigrassm, thanks for the suggestion. Do I understand correctly that you want to be able to fully embed a volume spec, with all its options, inside a job spec? My first reaction is that the volume provisioning could take a while. The purpose of the job spec is to register the desired state with Nomad as quickly as possible, and waiting for a volume provisioning step to finish works against that goal. I am wondering if a lifecycle task would meet your needs today. If nomad is available in your CI image, I suspect you could. Thanks, Derek and the Nomad Team
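A minimal sketch of the lifecycle-task approach hinted at above, assuming the `raw_exec` driver is enabled, the `nomad` binary is on the client's PATH, and a pre-written `volume.hcl` is shipped into the task directory (all names and images here are illustrative, not from the thread):

```hcl
job "app" {
  group "app" {
    # Hypothetical prestart task that creates/registers the CSI volume
    # before the main task starts. Assumes volume.hcl is delivered via
    # a template or artifact block (not shown).
    task "create-volume" {
      driver = "raw_exec"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        command = "nomad"
        args    = ["volume", "create", "local/volume.hcl"]
      }
    }

    task "main" {
      driver = "docker"
      config {
        image = "example/app:latest" # placeholder image
      }
    }
  }
}
```

Note the caveat: the group cannot also declare a `volume` block referencing a volume that does not yet exist, because feasibility is checked at placement time, which is the chicken-and-egg problem this issue is about.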
Edit: apologies for any spelling/grammar mistakes, wrote this out on mobile and hadn't originally intended to get long winded lol.
It definitely could take some time for the volume to be provisioned. The way I look at it is that, in most cases, the volume's existence (and by extension its creation) is a requirement of the job, so it makes sense for that requirement to be fully declarable within the job spec. I think it would be acceptable for the creation of the volumes to happen asynchronously from the scheduler, so that the job would be registered but not placed until the volume finishes being created. Ideally the scheduler would be aware of this, re-evaluate the job more frequently, and be less aggressive with the evaluation backoff than it would be for normal jobs.

Currently we are just adding a step to our CI that does the volume registration. This works fine, but it is an additional pipeline step. In our case, we're using Cinder CSI without multiwrite capability, so all of our volumes will always be a 1:1 mapping with their corresponding jobs.
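For reference, a CI registration step like the one described would submit a volume specification file to `nomad volume create`. A sketch of such a file for a Cinder-backed volume (the plugin ID and capacity values are illustrative assumptions):

```hcl
# volume.hcl — submitted with `nomad volume create volume.hcl`
id        = "app-data"
name      = "app-data"
type      = "csi"
plugin_id = "cinder-csi" # illustrative plugin ID

capacity_min = "10GiB"
capacity_max = "20GiB"

# Cinder without multiwrite: a single writer on a single node.
capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```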
I understand. I'll forward this to the team for backlog consideration. In the meantime, I'm glad you've got an acceptable workaround. Thanks again for using Nomad! @DerekStrickland and the Nomad Team
I too would like this feature, but I'd also want the volume to be destroyed once the job is finished. I'm not sure if that was also @rigrassm's intention, but it's not clearly stated.
@alexiri, I didn't state it, but now that you mention it, having a destroy-on-job-stop option would be handy.
cc @jrasell |
@rigrassm I second this, especially for per_alloc volumes. It would be very handy to be able to create volumes like that instead of creating each volume one by one, especially when the only difference is the volume name (volume[0], volume[1], volume[2]).
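For context, today's `per_alloc` support only interpolates the volume source per allocation; the volumes themselves (`volume[0]`, `volume[1]`, …) must each be created beforehand. A sketch of the group-level block (field values are illustrative):

```hcl
group "app" {
  count = 3

  # Each allocation claims its own volume: the source is suffixed with
  # the allocation index, expanding to volume[0], volume[1], volume[2].
  # All three volumes must already exist before placement.
  volume "data" {
    type            = "csi"
    source          = "volume"
    per_alloc       = true
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }
}
```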
I'm doing some scoping of outstanding CSI issues and I wanted to drop a note about this item. In our design for the volume create feature we left this out intentionally because the impact on scheduler workflow was going to be a fairly large lift. Which isn't to say we're not going to do it, but just that it's not as trivial as it seems at first. Here's an excerpt from our design doc (NMD-086, for internal folks):
This might be a stupid question, but why does the volume have to be available before the job is scheduled? The way I would imagine this working is something like this:
If provisioning fails, the job fails, and it would be retried according to the job specification. What's wrong with this approach? My worry with the "create volume before scheduling" approach is that if I submit a lot of jobs that aren't all going to run at once due to their scheduling restrictions (#11197), all the volumes will nevertheless be created at once, even though they might not be needed for several hours until their respective jobs actually start.
It's not a stupid question, it's a great question! But it gets into some scheduler internals. From a high-level view, when an evaluation is processed in the scheduler, we check that the job is "feasible" and we "compute placements" to generate the plan that pairs up allocations with client nodes. Importantly, the scheduler cannot write changes to state without handing the plan to the "plan applier" on the leader (this serializes the plans and it's how we guarantee consistency). For CSI volumes, we currently check in the
The last two checks are the troublesome ones to change, because the check we can do at the scheduler becomes only eventually consistent. But as it turns out, they're already eventually consistent, because we drive the claim workflow from the client. And this is the source of integrity issues with cleaning up claims when we're done with the volumes (see #10833 for a discussion of that). So if we could fix the issue in #10833, we could probably drop the "the volume exists" check and turn it into an "if the volume exists, check for maxed-out number of claims" check. And then we'd drive the entire volume create/mount flow from the client where the alloc is placed. There would be a few other ripple effects to consider:
I don't think any of this is insurmountable, but it requires some design of the details. Hope this helps provide some context.
Depending on the type of storage, it can be fast enough to justify the convenience and simplicity.
K8s has ephemeral volumes. Some Nomad users would probably like the ability to have the volume deleted once the job count hits zero. Others may want to leave it in place (for example if the volume is used for cache that would have to be regenerated the next time someone runs the same job).
Good point. There's a similar cautionary note in the K8s docs.
Proposal
With the ability to create CSI volumes directly from Nomad via the `nomad volume create` command, it would be helpful to be able to define a volume inline in the jobspec so that the volume is created upon job submission, without needing a separate action to explicitly create the volume first. The basic building blocks necessary to facilitate this are already in place in the `job -> group -> volume` block, if that block were extended to make use of the full volume struct that is used by the `nomad volume create` command.

Use-cases
Our main use-case for this feature would be simplifying the job spec by keeping everything in a single file, as well as simplifying CI/CD pipelines by having one less step to perform.
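The proposal above could be sketched as a hypothetical extension of the group-level volume block. None of this syntax exists today; the `create` sub-block and all of its fields are invented purely for illustration:

```hcl
job "app" {
  group "app" {
    volume "data" {
      type            = "csi"
      source          = "app-data"
      access_mode     = "single-node-writer"
      attachment_mode = "file-system"

      # Hypothetical: embed the `nomad volume create` options here so
      # Nomad provisions the volume on job submission if it does not
      # already exist.
      create {
        plugin_id    = "cinder-csi" # illustrative plugin ID
        capacity_min = "10GiB"
        capacity_max = "20GiB"
      }
    }
  }
}
```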
This would also allow the `nomad job plan` process to include the volume creation step in planning and prevent the warning below from being generated.

Attempted Solutions
None yet. This is just a process improvement that would combine two existing pieces of functionality, which can't be done without changes to Nomad itself.