Batch System Jobs #2527

Closed
labria opened this issue Apr 6, 2017 · 30 comments · Fixed by #9160
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/scheduling, type/enhancement

Comments

@labria

labria commented Apr 6, 2017

Currently, the batch scheduler handles short-lived jobs, and the system scheduler handles cluster-wide jobs (one per node).
But sometimes you may want a short-lived job (one that exits quickly and does not run a daemon) that is propagated across all the nodes. One example would be a job that changes some kind of config on the nodes.
Would it be possible to either support this kind of job in the system scheduler, or allow the batch scheduler to run on all nodes?

Thanks

@dadgar changed the title from "Allow deploying short-living system jobs." to "Batch System Jobs" on Apr 6, 2017
@dadgar
Contributor

dadgar commented Apr 6, 2017

Hey, I updated the title. Yes, this is on the roadmap, but timeline-wise it may be a bit further out.

@siffiejoe

Consul has an exec command that may help (just in case you are using Consul).

@labria
Author

labria commented Apr 16, 2017

@dadgar thanks!
@siffiejoe yes, I know, but in my case I actually need to deploy and run a docker container, albeit a short-lived one.

@shantanugadgil
Contributor

My use case is to update DNS entries when a certain service starts.
For now I am running the DNS update as a separate job altogether.

@ketzacoatl
Contributor

Mind sharing a use case? We get this request from time-to-time, but I believe most people solve it using existing configuration management solutions.

We would love to use nomad to replace our use of salt-master and other CM tools, but it's difficult to do this without a batch-system job, or some way of running a one-off job. To be clear, using nomad is so great, I'd like to have it handle all of my "run X on Y" problems.

@dadgar / @schmichael - does this require a new job type, or does it fit within the existing schedulers? Is this relatively easy to implement, possibly a good task for someone wanting to learn about Go, or too difficult?

@dadgar
Contributor

dadgar commented Jan 22, 2018

@ketzacoatl It would have to be a new scheduler type, so this is a pretty significant amount of work. It is also unclear what the expectation is if a node doesn't have enough resources to run the given job. Would you want it to keep trying to place forever? What if the node is completely filled with service jobs that don't migrate often (once a month)? Does the system batch job keep trying to place for that entire duration?

@ketzacoatl
Contributor

@dadgar ah, ok.. I did not realize this would be more like a new scheduler. I guess that makes the design process a little easier.

I like the questions you've brought up, and I think there are probably others to ask, but to start, here are a few answers and details on use-cases I am targeting.

In nomad's world, a "batch system job" could be reworded as "run this thing once, on all hosts". A close variant is "run this thing periodically, on all hosts" ("periodic batch system job", and I think there's another ticket for that, I'll update this comment if I find it again). I'm not sure if the periodic and run-once batch-system jobs should be one scheduler type, but I think it's worth considering.

In my world, I want to "run X across the fleet(s) when Y happens". X is sometimes "apply a patch" or "update all users on the host", or similar. These actions are boxed up as CM formula, and otherwise applied with a single shell command / script / etc. Y might be something an operator is involved with, eg a security event, a new user joining the team, etc, or Y might be a completely automated reaction to some other happening in the deployment (eg consul event, change in catalog, lost instance, etc), or could just be a periodic task to run ("apply all formula, just make sure you are good and clean", or some scanning task).

ATM, in these scenarios, I am using a combination of consul and CM tooling. While it is functional and "works", there are a lot of administrative annoyances that would be addressed by nomad's features (for example, checking job status, logs, stats, resource reservation and allocation, the UI, etc).

For example, in the past, I've configured consul with a watch on some stuff in the catalog, and triggering CM tooling to "apply some formula" (eg if a "users" config changed, apply the "users" formula, or similar). It works well, and lets me manage a lot of systems through git with very little overhead, but using nomad for this would provide numerous advantages and assurances.

It's also worth noting that some of these tasks are "run on all hosts", while others are "targeting specific hosts", but I would expect to use constraints for that targeting.

Integrating with Consul, and servicing automated reactions, a batch system job could be submitted which "runs whenever X changes in the Consul catalog". EG, nomad monitors consul and runs a new instance of the batch job when the watch triggers.

Here are a few more specific answers to your questions:

First, I think those details are worked out in the job spec. For example, I could say "run-once, optimistically, and fail-fast, and tell me which nodes/tasks failed where" via parameters such as "fail-fast" (don't keep retrying, if a node isn't able to run the task, just tell me), or "retry based on X policy".

If a node does not have enough resources, the job operator could configure the job to fail the first time, or to continue to retry the task until it is possible, or retry for some amount of time X. Another option would be to allow for "draining a node" first, if there are not enough resources (which looks a bit like a rolling deploy).

Overall, I think the goal here is to provide operators with a means of running one-off (and periodic) tasks, across all hosts, and to make it reasonably easy for an operator to see what happened (which failed, which do I need to poke at some more).
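
The placement options discussed above (fail fast, keep retrying, or retry up to some limit, then report which nodes failed) can be sketched roughly as follows. This is only an illustration of the trade-off, not Nomad's scheduler: `PlacementPolicy`, `place_on_all_nodes`, and `fake_try_place` are hypothetical names, and an attempt count stands in for "retry for some amount of time X".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class PlacementPolicy:
    """Hypothetical per-job policy; not a real Nomad job spec field."""
    fail_fast: bool = True              # give up after the first failed attempt
    max_attempts: Optional[int] = None  # None means keep retrying until placed


def place_on_all_nodes(
    nodes: Iterable[str],
    try_place: Callable[[str], bool],
    policy: PlacementPolicy,
) -> Dict[str, str]:
    """Try to place one task per node and report the outcome per node."""
    results: Dict[str, str] = {}
    for node in nodes:
        attempts = 0
        while True:
            attempts += 1
            if try_place(node):
                results[node] = "placed"
                break
            out_of_attempts = (
                policy.max_attempts is not None and attempts >= policy.max_attempts
            )
            if policy.fail_fast or out_of_attempts:
                results[node] = "failed"
                break
            # With fail_fast=False and max_attempts=None this loops until the
            # node has room, which is the "retry forever" case raised above.
    return results


# Fake cluster: node-a always has room, node-b frees up on the third attempt.
attempts_seen = {"node-a": 0, "node-b": 0}


def fake_try_place(node: str) -> bool:
    attempts_seen[node] += 1
    return node == "node-a" or attempts_seen[node] >= 3


fail_fast_result = place_on_all_nodes(
    ["node-a", "node-b"], fake_try_place, PlacementPolicy(fail_fast=True)
)
attempts_seen["node-b"] = 0  # reset the fake node between scenarios
bounded_retry_result = place_on_all_nodes(
    ["node-a", "node-b"], fake_try_place, PlacementPolicy(fail_fast=False, max_attempts=5)
)
print(fail_fast_result)
print(bounded_retry_result)
```

Returning a per-node result map is what lets an operator see which nodes failed and need a closer look, regardless of which retry policy was chosen.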

@lmayorga1980

👍 Hope this feature is implemented soon.

@eigengrau

Chiming in with our desired use-case: running docker-gc on schedule.

@bdossantos

Hope this feature is implemented soon. My use-case: host backups, warming-up nodes with recent docker images

@tommyalatalo

I'm also looking forward to this feature; cluster-wide cleanup jobs would benefit greatly from a 'system batch' job type.

@the-maldridge

the-maldridge commented Jun 10, 2019

Adding a use case here. I realized recently that I needed to clean up /var/lib/docker/overlay2 more aggressively than I had previously thought. In my ideal world, I would do this as part of a system batch task that runs once a week and GCs that directory.

The root cause is actually that my images all include a particular data file. I could remove it from the images if I had a way of ensuring that it was always present on the host and within 2 days of its release timestamp, which is again something I could do if I had a system batch job available.

For both of these I believe batch-periodic is what I would want, but I could also just have another task which is a batch task that deploys the system batch job on a schedule. I would consider this to be a defect in nomad, but given that it already has a periodic scheduler, it would be a decent workaround.

@nirmalkq

Another use case may be a patching activity we want to perform on nomad nodes in a large cluster.

@shantanugadgil
Contributor

shantanugadgil commented Sep 29, 2019

My use case for this is to do a "yum update".

I achieved this by having a simple system job. The shell script runs in a "while 1" loop with a long sleep of 24 hours between iterations.
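
The workaround described above, a long-running system job that wraps the real command in a loop with a 24-hour sleep, might look roughly like this. It is a sketch in Python rather than the shell script mentioned in the comment; `run_periodically`, `max_runs`, and the injectable `sleep` are assumptions added so the loop can be exercised without waiting a day.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional, Sequence


def run_periodically(
    command: Sequence[str],
    interval_seconds: float = 24 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
    max_runs: Optional[int] = None,
) -> List[int]:
    """Run `command`, sleep for `interval_seconds`, and repeat.

    With max_runs=None the loop never returns, which is what keeps the
    system job alive on each node. Exit codes are collected for inspection.
    """
    exit_codes: List[int] = []
    while max_runs is None or len(exit_codes) < max_runs:
        # check=False: a failed run should not kill the loop itself.
        completed = subprocess.run(list(command), check=False)
        exit_codes.append(completed.returncode)
        sleep(interval_seconds)
    return exit_codes


# Demo with a harmless command and a stubbed sleep. On a real node the
# command would be something like ["yum", "-y", "update"].
slept: List[float] = []
codes = run_periodically(
    [sys.executable, "-c", "pass"],
    interval_seconds=86400,
    sleep=slept.append,
    max_runs=2,
)
print(codes, slept)
```

The drawback of this approach is that the allocation stays in the running state permanently, so the job's status says nothing about whether the last update succeeded.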

@zbliujia

My use case for this is to do a "brew install".

@perrymanuk

Would like to have this for package maintenance on the nomad nodes

@kcajf

kcajf commented Nov 14, 2019

This would be very useful for little cleanup jobs, and file transfers

@israellot

+1 on this feature. Updating packages would be much easier.

@shantanugadgil
Contributor

Adding a "+1" to the first post would be beneficial; that's how HashiCorp tracks interest in a ticket/feature (AFAIK).

@kasimon

kasimon commented Jun 15, 2020

Our use case would be launching periodic backups on all servers. We use borg backup, which is strictly run from the client, and being able to create a periodic batch-system backup job would give us a better overview of this process.

@pySilver

My use case is to register nodes in some external monitoring app. Since there is no lifecycle that can be executed with hook="poststart" I need a batch job that can be executed cluster-wide.

@yishan-lin
Contributor

yishan-lin commented Jun 29, 2020

Coming soon - on our roadmap!

@the-maldridge

Amazing! This will clean up so much stuff in the Hashistack use cases.

@shantanugadgil
Contributor

@yishan-lin as this is more batch oriented, would "wait for" semantics also be part of the feature?
(like Jenkins' "wait for the other job/group/task to finish before proceeding")

@yishan-lin
Contributor

yishan-lin commented Jun 29, 2020

That's a good question. Ideally, the lifecycle hooks I have in mind (e.g. the PostStart and PostStop hooks coming in the next month with our 0.12.X patches) would help address the "wait for" semantics across all schedulers, not just system/batch.

Would love to continue hearing thoughts and use-cases from all on this in detail, as that'd greatly help our design for this feature and ensure we build for everyone's immediate success (which should be coming quite soon). We'd be looking to address this and #4740, #4072, and #4267 in one fell swoop - not in the same feature of course, but in the same timeframe.

@shantanugadgil
Contributor

shantanugadgil commented Jun 30, 2020

My over-arching thought/ideas are basically coming from a Jenkins Pipeline thought process with Git plugin.

A full-featured git plugin (feature equivalent of the Jenkins plugin) would be a super excellent addition.
(I wonder if a git plugin can be implemented using Nomad's plugin sdk 😁 )

For now I am (sort of) making do with the git clean, fetch, pull, whatever commands using a raw_exec job and shell script.
Having a much enhanced declarative syntax for the repo clone/update (artifact) could also be an intermediate solution, maybe?!

@tgross added the stage/accepted label Aug 24, 2020
@shoenig self-assigned this Oct 9, 2020
shoenig added a commit that referenced this issue Oct 22, 2020
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility, rolling updates, and preemption are governed the same as
with system jobs.

Closes #2527
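
As a rough illustration of what the commit message describes, the sketch below builds a JSON job payload with its type set to `sysbatch`, optionally made periodic. Only the `sysbatch` type and the fact that such jobs can be periodic come from the text above. The helper `sysbatch_job` is hypothetical, the field names are assumed to follow Nomad's JSON job format, and the `docker system prune` command and cron schedule are example values, so all of them should be checked against the Nomad documentation before use.

```python
import json
from typing import Optional, Sequence


def sysbatch_job(
    job_id: str,
    command: str,
    args: Sequence[str],
    datacenters: Sequence[str] = ("dc1",),
    cron: Optional[str] = None,
) -> dict:
    """Build a job payload whose scheduler type is 'sysbatch'.

    Field names are assumed to match Nomad's JSON job format.
    """
    job = {
        "ID": job_id,
        "Name": job_id,
        "Type": "sysbatch",  # the new scheduler type named in the commit
        "Datacenters": list(datacenters),
        "TaskGroups": [
            {
                "Name": "main",
                "Tasks": [
                    {
                        "Name": "run",
                        "Driver": "raw_exec",
                        "Config": {"command": command, "args": list(args)},
                    }
                ],
            }
        ],
    }
    if cron is not None:
        # Periodic sysbatch: launch on every compatible node on a schedule.
        job["Periodic"] = {
            "Enabled": True,
            "SpecType": "cron",
            "Spec": cron,
            "ProhibitOverlap": True,
        }
    return {"Job": job}


# Example: a weekly Docker cleanup, one of the use cases raised in this thread.
payload = sysbatch_job(
    "docker-gc", "docker", ["system", "prune", "-f"], cron="0 3 * * 0"
)
one_off = sysbatch_job("one-off", "true", [])
print(json.dumps(payload, indent=2))
```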
shoenig added a commit that referenced this issue Oct 26, 2020
shoenig added a commit that referenced this issue Oct 28, 2020
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is still
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
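
The completion rule stated in the commit message (a sysbatch job is complete once it has run to a terminal state on every compatible node) can be expressed as a small check. This is a simplified model, not Nomad's implementation: `sysbatch_is_complete` is a hypothetical helper, and the status strings are illustrative assumptions.

```python
from typing import Dict, Iterable

# Assumed names for terminal allocation states; illustrative only.
TERMINAL_STATES = frozenset({"complete", "failed"})


def sysbatch_is_complete(
    compatible_nodes: Iterable[str], status_by_node: Dict[str, str]
) -> bool:
    """True once every compatible node has an allocation in a terminal state.

    A node with no entry in `status_by_node` has not been placed yet, so it
    keeps the job from being considered complete.
    """
    return all(
        status_by_node.get(node) in TERMINAL_STATES for node in compatible_nodes
    )


nodes = ["node-a", "node-b", "node-c"]
in_progress = {"node-a": "complete", "node-b": "running"}  # node-c not placed yet
finished = {"node-a": "complete", "node-b": "failed", "node-c": "complete"}

print(sysbatch_is_complete(nodes, in_progress))  # False
print(sysbatch_is_complete(nodes, finished))     # True
```

Note that a failed allocation still counts toward completion here, matching the "success or failed on retries" wording.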
shoenig added a commit that referenced this issue Oct 30, 2020
shoenig added a commit that referenced this issue Nov 5, 2020
shoenig added a commit that referenced this issue Nov 9, 2020
@Ramesh7

Ramesh7 commented Nov 23, 2020

Hi @yishan-lin, I see sysbatch is being tracked here. Is it going to land with 1.0.0 GA or in a later minor version?

@josegonzalez
Contributor

Maybe related is #1944.

@shoenig removed their assignment Jun 15, 2021
notnoop pushed a commit that referenced this issue Jul 19, 2021
notnoop pushed a commit that referenced this issue Aug 2, 2021
notnoop pushed a commit that referenced this issue Aug 3, 2021
@ketzacoatl
Contributor

Yahoo! This is awesome, thank you!

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Oct 16, 2022