System Scheduler use new Update stanza and Deployments #4740

Open
aaroncline opened this issue Oct 1, 2018 · 10 comments
Labels
stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/deployments, theme/system-scheduler, type/enhancement

Comments

@aaroncline
aaroncline commented Oct 1, 2018

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

Nomad v0.8.4 (dbee1d7)

Operating system and Environment details

CentOS 7
Consul v1.0.7
fabiolb 1.5.6
3 nomad clients
3 nomad servers

Issue

When deploying Fabio using the system scheduler and the exec driver, Nomad does not seem to respect the Update section hierarchy between the job and group sections.

Also, it does not seem as though Nomad treats this as a "deployment". No deployment ID is available in the job submission evaluation.
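A minimal sketch of the check described here, assuming the evaluation returned for a job submission is available as a decoded JSON object and that it carries a `DeploymentID` field when a deployment exists. The sample evaluation values in the usage note are hypothetical.

```python
def has_deployment(evaluation):
    """Return True when the evaluation dict carries a non-empty DeploymentID.

    Assumption: `evaluation` is the decoded JSON of a Nomad evaluation and
    exposes a "DeploymentID" key; a missing or empty value means no
    deployment was created for this submission.
    """
    return bool(evaluation.get("DeploymentID"))
```

With a hypothetical evaluation such as `{"ID": "eval-1", "DeploymentID": ""}`, the function returns `False`, which matches what is reported for the system job in this issue.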

Reproduction steps

Use the job file below to launch Fabio. Change the epoch value in the force_job_restart environment variable and resubmit the job; all Fabio instances stop at essentially the same time. There is also no deployment ID, which is how we track successful deployments of our service-scheduled tasks. If you then change the job-level Update section to match the group-level Update section, the tasks are staggered appropriately.
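The "alter the epoch and redeploy" step can be sketched as a small helper, assuming the job is held as a decoded JSON payload with the same `Job` / `TaskGroups` / `Tasks` / `Env` layout as the job file below. The function name is hypothetical, and writing the variable on every task (rather than only on the Fabio task) is a simplification.

```python
import copy
import time


def bump_restart_marker(job_payload, epoch=None):
    """Return a copy of the payload with force_job_restart set on every task.

    Changing this environment variable alters the task definition, so
    resubmitting the job forces Nomad to replace the running allocations.
    """
    stamp = str(int(time.time()) if epoch is None else epoch)
    updated = copy.deepcopy(job_payload)  # leave the caller's payload untouched
    for group in updated["Job"].get("TaskGroups") or []:
        for task in group.get("Tasks") or []:
            env = task.get("Env") or {}  # Env may be null in the JSON
            env["force_job_restart"] = stamp
            task["Env"] = env
    return updated
```

The returned payload would then be resubmitted by whatever means the job was registered in the first place.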

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Job file (if appropriate)

{
    "Job": {
        "AllAtOnce": false,
        "Constraints": null,
        "CreateIndex": 635411,
        "Datacenters": [
            "us-east-1"
        ],
        "Dispatched": false,
        "ID": "fabio",
        "JobModifyIndex": 855141,
        "Meta": null,
        "Migrate": null,
        "ModifyIndex": 855141,
        "Name": "fabio",
        "Namespace": "default",
        "ParameterizedJob": null,
        "ParentID": "",
        "Payload": null,
        "Periodic": null,
        "Priority": 50,
        "Region": "aws",
        "Reschedule": null,
        "Stable": false,
        "Status": "running",
        "StatusDescription": "",
        "Stop": false,
        "SubmitTime": 1538421216716838569,
        "TaskGroups": [
            {
                "Constraints": null,
                "Count": 1,
                "EphemeralDisk": {
                    "Migrate": false,
                    "SizeMB": 300,
                    "Sticky": false
                },
                "Meta": null,
                "Migrate": null,
                "Name": "devops",
                "ReschedulePolicy": null,
                "RestartPolicy": {
                    "Attempts": 2,
                    "Delay": 15000000000,
                    "Interval": 1800000000000,
                    "Mode": "fail"
                },
                "Tasks": [
                    {
                        "Artifacts": [
                            {
                                "GetterMode": "any",
                                "GetterOptions": {
                                    "checksum": "sha256:2dfe26aaa74b659a0e595654eb8f9247d947cbf652cbebe03fd8133c2851cb4a"
                                },
                                "GetterSource": "https://github.com/fabiolb/fabio/releases/download/v1.5.6/fabio-1.5.6-go1.9.2-linux_amd64",
                                "RelativeDest": "local/"
                            }
                        ],
                        "Config": {
                            "command": "fabio-1.5.6-go1.9.2-linux_amd64"
                        },
                        "Constraints": null,
                        "DispatchPayload": null,
                        "Driver": "exec",
                        "Env": {
                            "force_job_restart": "1538421216"
                        },
                        "KillSignal": "",
                        "KillTimeout": 5000000000,
                        "Leader": false,
                        "LogConfig": {
                            "MaxFileSizeMB": 10,
                            "MaxFiles": 10
                        },
                        "Meta": null,
                        "Name": "devops_fabio_exec",
                        "Resources": {
                            "CPU": 200,
                            "DiskMB": 0,
                            "IOPS": 0,
                            "MemoryMB": 512,
                            "Networks": [
                                {
                                    "CIDR": "",
                                    "Device": "",
                                    "DynamicPorts": null,
                                    "IP": "",
                                    "MBits": 10,
                                    "ReservedPorts": [
                                        {
                                            "Label": "fabio_9999",
                                            "Value": 9999
                                        },
                                        {
                                            "Label": "fabio_9998",
                                            "Value": 9998
                                        }
                                    ]
                                }
                            ]
                        },
                        "Services": [
                            {
                                "AddressMode": "auto",
                                "CanaryTags": null,
                                "CheckRestart": null,
                                "Checks": [
                                    {
                                        "AddressMode": "",
                                        "Args": null,
                                        "CheckRestart": null,
                                        "Command": "",
                                        "GRPCService": "",
                                        "GRPCUseTLS": false,
                                        "Header": null,
                                        "Id": "",
                                        "InitialStatus": "",
                                        "Interval": 10000000000,
                                        "Method": "",
                                        "Name": "service: \"fabio\" check",
                                        "Path": "",
                                        "PortLabel": "fabio_9999",
                                        "Protocol": "",
                                        "TLSSkipVerify": true,
                                        "Timeout": 5000000000,
                                        "Type": "tcp"
                                    }
                                ],
                                "Id": "",
                                "Name": "fabio",
                                "PortLabel": "fabio_9999",
                                "Tags": null
                            }
                        ],
                        "ShutdownDelay": 0,
                        "Templates": null,
                        "User": "",
                        "Vault": null
                    }
                ],
                "Update": {
                    "MaxParallel": 1,
                    "Stagger": 30000000000
                }
            }
        ],
        "Type": "system",
        "Update": {
            "MaxParallel": 0,
            "Stagger": 0
        },
        "VaultToken": "",
        "Version": 7
    }
}

@aaroncline
Author

I misreported initially and have made some edits. This actually appears to be a bug in the precedence between the Group and Job Update stanzas. According to your docs, the Group stanza should take precedence over the Job stanza. https://www.nomadproject.io/docs/job-specification/update.html
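To make the expected precedence concrete, here is a minimal sketch of one reading of the docs, in which group-level Update fields override job-level ones. It is not Nomad's implementation; the function name is hypothetical, and the sample values are taken from the job file above.

```python
def effective_update(job, group_name):
    """Merge job-level and group-level Update settings, with the group winning.

    Assumption: `job` is the decoded "Job" object from the JSON job file.
    """
    job_update = job.get("Update") or {}
    for group in job.get("TaskGroups") or []:
        if group.get("Name") == group_name:
            group_update = group.get("Update") or {}
            # Keys present at the group level replace the job-level values.
            return {**job_update, **group_update}
    raise KeyError(group_name)


job = {
    "Type": "system",
    "Update": {"MaxParallel": 0, "Stagger": 0},
    "TaskGroups": [
        {"Name": "devops", "Update": {"MaxParallel": 1, "Stagger": 30000000000}}
    ],
}
```

Under this reading, `effective_update(job, "devops")` yields `MaxParallel` 1 and a 30-second `Stagger`, whereas the behaviour reported above is consistent with the job-level zeros being used instead.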

@aaroncline changed the title from "System Scheduler Does Not Rotate Deployments or provide Deployment ID" to "System Scheduler Does Not Respect Update Stanza Hierarchy" on Oct 1, 2018
@dadgar
Contributor

dadgar commented Oct 2, 2018

@aaroncline Hey Aaron,

The system job currently doesn't support the new update system using deployments. You can see the callout here: https://www.nomadproject.io/docs/job-specification/update.html

I am going to rename the issue to reflect this.

@dadgar changed the title from "System Scheduler Does Not Respect Update Stanza Hierarchy" to "System Scheduler use new Update stanza and Deployments" on Oct 2, 2018
@ricbartm

Hello @dadgar. We have a use case where we want to deploy a custom job on every node in a pool of nodes distributed around the globe, and we thought the system scheduler was the best fit. However, since deployments and the update stanza settings are not honoured for system jobs, we may need to work around this by using the service scheduler, job constraints that prevent multiple copies of the job from landing on the same node, and some automation that adjusts the job count to match our cluster size as it grows or shrinks. This is doable, but far from ideal.

That said, this issue was opened a long time ago and has seen very little activity. Is there any chance you could share what the plans are for it? I'd like to set some expectations (even if the answer is "we don't have plans for this") so that we can make an informed decision.

Finally, a shout-out to other folks, and especially to @aaroncline: how did you end up working around this issue for your use case?
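A rough sketch of the workaround described above, assuming the job is held as a decoded JSON "Job" object and that a `distinct_hosts` constraint in the JSON job format is written with `Operand` and `RTarget` keys. The function name is hypothetical, and `node_count` would have to come from separate automation that tracks the cluster size.

```python
import copy


def as_service_job(job, node_count):
    """Return a copy of a system job rewritten as a service job.

    Each group gets Count == node_count plus a distinct_hosts constraint,
    so that at most one copy is placed on any node.
    """
    converted = copy.deepcopy(job)
    converted["Type"] = "service"
    for group in converted.get("TaskGroups") or []:
        group["Count"] = node_count
        constraints = list(group.get("Constraints") or [])
        # Assumed JSON shape for the distinct_hosts constraint.
        if not any(c.get("Operand") == "distinct_hosts" for c in constraints):
            constraints.append(
                {"LTarget": "", "RTarget": "true", "Operand": "distinct_hosts"}
            )
        group["Constraints"] = constraints
    return converted
```

Unlike a system job, this needs to be resubmitted with a new `node_count` whenever nodes join or leave, which is the part described as far from ideal.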

@calavera

@dadgar we're investigating using Nomad at Netlify for a large heterogeneous deployment. Solving this issue would help us tremendously to decide whether to use Nomad. Is there anything we can do to help it move forward? The documentation says that this will be fixed in "future releases", but it'd be great to know whether you have more specific plans to address it.

@schmichael
Member

That's super exciting @calavera! nomadproject.io itself uses Netlify, so it would be exciting to be "self-hosted" in a way.

Unfortunately this feature is not planned for the upcoming 0.11.x or 0.12.0 releases. It is absolutely in our queue for prioritization after 0.12.0, but I don't want to make any promises at this time. Would it be possible to elaborate on your use case in case there's a workaround we could help provide?

I'll try to update this issue when it's prioritized on our roadmap.

@tgross added the stage/accepted label (Confirmed, and intend to work on. No timeline commitment though.) on Aug 24, 2020
shoenig added a commit that referenced this issue Oct 28, 2020
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.

Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.

As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).

Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is still
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.

Closes #2527
shoenig added a commit that referenced this issue Oct 30, 2020
shoenig added a commit that referenced this issue Nov 5, 2020
shoenig added a commit that referenced this issue Nov 9, 2020
notnoop pushed a commit that referenced this issue Jul 19, 2021
notnoop pushed a commit that referenced this issue Aug 2, 2021
notnoop pushed a commit that referenced this issue Aug 3, 2021
@apkrymov

apkrymov commented Jul 14, 2022

@schmichael Any updates? We really need this feature.
Our use case is the same as the one @ricbartm described: we need to deploy a service based on host constraints, not on a fixed count of replicas in the cluster. The system scheduler fits perfectly, but we cannot control the deployment process because of the current Update stanza limitations.

@ebarriosjr
Contributor

@schmichael any updates? We could also really use this feature.
Thanks!

@schmichael
Member

Unfortunately no updates. Sorry for letting this slip. Definitely still something we want to do, but I don't want to keep overpromising and underdelivering on timelines. 😬

@axsuul
Contributor

axsuul commented Oct 22, 2022

Same here, we make heavy use of system jobs and really need a way to do rolling updates for them.

@hyungjic

@schmichael Any updates on this issue?😄
