Plan returns an erroneous in-place update in diff when task group has a constraint #10836
Comments
I just tried moving the constraint up to the job instead of the task group, and it fixes this behavior. For me, that's a decent workaround since these jobs only have a single task group anyway.
Summary: Having the constraints on the task group causes Nomad to always think there's a diff on these jobs, even when nothing has changed. I filed hashicorp/nomad#10836 about that, but in the meantime, this works around the issue.
Test Plan: I installed these apps from Nomadic locally, then installed them again and saw that it did not try to submit the jobs again.
Reviewers: matt
Reviewed By: matt
Differential Revision: https://code.home.mattmoriarity.com/D20
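To make the workaround concrete, here is a hypothetical sketch using the Nomad Go API client (a later comment in the thread notes the affected jobs are built from Go code against the Nomad API). The helper function and the job/group names are illustrative, not the actual Nomadic code:

package main

import (
	"fmt"

	"github.com/hashicorp/nomad/api"
)

// moveGroupConstraintsToJob applies the workaround described above: lift any
// constraints defined on the task groups up to the job itself. Job-level
// constraints apply to every group, so for single-group jobs placement is
// unchanged, but the plan no longer reports a spurious in-place update.
func moveGroupConstraintsToJob(job *api.Job) {
	for _, group := range job.TaskGroups {
		job.Constraints = append(job.Constraints, group.Constraints...)
		group.Constraints = nil
	}
}

func main() {
	// Hypothetical single-group job with a node constraint on the group.
	group := api.NewTaskGroup("tripplite-exporter", 1).
		Constrain(api.NewConstraint("${node.unique.name}", "=", "raspberrypi"))
	job := api.NewServiceJob("tripplite-exporter", "tripplite-exporter", "global", 70).
		AddTaskGroup(group)

	moveGroupConstraintsToJob(job)
	fmt.Printf("job constraints: %d, group constraints: %d\n",
		len(job.Constraints), len(job.TaskGroups[0].Constraints))
}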
Hi @mjm, thanks for reporting! So far I haven't been able to reproduce what you're seeing, but it might be that the group constraint isn't actually the problem. It might be helpful if you could post the output from the CLI when running the plan. Here's the job I'm submitting:
job "example" {
  datacenters = ["dc1"]

  group "sleep" {
    constraint {
      operator  = "="
      attribute = "${node.unique.name}"
      value     = "laptop"
    }

    task "sleep" {
      driver = "exec"

      config {
        command = "/bin/sleep"
        args    = ["100"]
      }
    }
  }
}
Thanks for looking at this! Here's the JSON version of one of these jobs. I have some code that does some transformations on the original HCL, so this JSON version is what actually gets applied. I have to believe that the constraint is somehow relevant (maybe it's not the whole story) because moving it up to the job does fix the problem.
{
"Region": null,
"Namespace": null,
"ID": "tripplite-exporter",
"Name": "tripplite-exporter",
"Type": "system",
"Priority": 70,
"AllAtOnce": null,
"Datacenters": [
"dc1"
],
"Constraints": null,
"Affinities": null,
"TaskGroups": [
{
"Name": "tripplite-exporter",
"Count": 1,
"Constraints": [
{
"LTarget": "${node.unique.name}",
"RTarget": "raspberrypi",
"Operand": "="
}
],
"Affinities": null,
"Tasks": [
{
"Name": "tripplite-exporter",
"Driver": "docker",
"User": "",
"Lifecycle": null,
"Config": {
"command": "/tripplite_exporter",
"image": "index.docker.io/mmoriarity/tripplite-exporter@sha256:c955272aa83f9eccfe461a8b96ef8f299e13b3cb71a7a7bcad5db6376d27ace6",
"logging": {
"config": [
{
"tag": "tripplite-exporter"
}
],
"type": "journald"
},
"mount": [
{
"source": "/dev/bus/usb",
"target": "/dev/bus/usb",
"type": "bind"
}
],
"ports": [
"http"
],
"privileged": true
},
"Constraints": null,
"Affinities": null,
"Env": {
"HOSTNAME": "${attr.unique.hostname}",
"HOST_IP": "${attr.unique.network.ip-address}",
"NOMAD_CLIENT_ID": "${node.unique.id}"
},
"Services": null,
"Resources": {
"CPU": 30,
"MemoryMB": 30,
"DiskMB": null,
"Networks": null,
"Devices": null,
"IOPS": null
},
"RestartPolicy": null,
"Meta": null,
"KillTimeout": null,
"LogConfig": null,
"Artifacts": null,
"Vault": null,
"Templates": null,
"DispatchPayload": null,
"VolumeMounts": null,
"Leader": false,
"ShutdownDelay": 0,
"KillSignal": "",
"Kind": "",
"ScalingPolicies": null
}
],
"Spreads": null,
"Volumes": null,
"RestartPolicy": null,
"ReschedulePolicy": null,
"EphemeralDisk": null,
"Update": null,
"Migrate": null,
"Networks": [
{
"Mode": "",
"Device": "",
"CIDR": "",
"IP": "",
"DNS": {
"Servers": [
"10.0.2.101"
],
"Searches": null,
"Options": null
},
"ReservedPorts": null,
"DynamicPorts": [
{
"Label": "http",
"Value": 0,
"To": 8080,
"HostNetwork": ""
}
],
"MBits": null
}
],
"Meta": null,
"Services": [
{
"Id": "",
"Name": "tripplite-exporter",
"Tags": null,
"CanaryTags": null,
"EnableTagOverride": false,
"PortLabel": "http",
"AddressMode": "",
"Checks": [
{
"Id": "",
"Name": "",
"Type": "http",
"Command": "",
"Args": null,
"Path": "/healthz",
"Protocol": "",
"PortLabel": "",
"Expose": false,
"AddressMode": "",
"Interval": 30000000000,
"Timeout": 5000000000,
"InitialStatus": "",
"TLSSkipVerify": false,
"Header": null,
"Method": "",
"CheckRestart": null,
"GRPCService": "",
"GRPCUseTLS": false,
"TaskName": "",
"SuccessBeforePassing": 3,
"FailuresBeforeCritical": 0
}
],
"CheckRestart": null,
"Connect": null,
"Meta": {
"metrics_path": "/metrics"
},
"CanaryMeta": null,
"TaskName": ""
}
],
"ShutdownDelay": null,
"StopAfterClientDisconnect": null,
"Scaling": null
}
],
"Update": null,
"Multiregion": null,
"Spreads": null,
"Periodic": null,
"ParameterizedJob": null,
"Reschedule": null,
"Migrate": null,
"Meta": null,
"ConsulToken": null,
"VaultToken": null,
"Stop": null,
"ParentID": null,
"Dispatched": false,
"Payload": null,
"VaultNamespace": null,
"NomadTokenID": null,
"Status": null,
"StatusDescription": null,
"Stable": null,
"Version": null,
"SubmitTime": null,
"CreateIndex": null,
"ModifyIndex": null,
"JobModifyIndex": null
}
Hi @mjm, so far I still haven't reproduced what you're seeing; however, I did notice one interesting thing: in your plan output we see
but when I submit a similar job, get the JSON from inspect, and submit it for planning, I always get
I don't know if that's actually related, but it seems suspicious. Did a job of this name once exist as a periodic job?
I had 3 different jobs affected by this. One is a periodic batch job, one is a batch job I trigger manually with a dispatch payload when necessary, and the other is a system job (that's the one I included here). They've all been those types of jobs from the beginning, as far as I remember. The JSON I got there is coming from some Go code that interacts with the Nomad API to plan and submit jobs, rather than the CLI.
Thanks for pointing that out @luckymike; indeed, there does seem to be a problem mixing system jobs with constraints. I'm finally able to reproduce the symptom here. In fact, all I needed to do was run my same sample job above, but on a cluster with more than one client 😬
This PR causes Nomad to no longer memoize the String value of a Constraint. The private memoized variable may or may not be initialized at any given time, which means a reflect.DeepEqual comparison between two jobs (e.g. during Plan) may return incorrect results. Fixes #10836
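To illustrate the failure mode the PR describes, here is a toy, self-contained sketch (made-up types, not Nomad's actual Constraint struct): two semantically identical values stop comparing equal under reflect.DeepEqual once one of them has populated its memoized field as a side effect.

package main

import (
	"fmt"
	"reflect"
)

// constraint is a stand-in for a struct that caches its String() result.
type constraint struct {
	LTarget string
	RTarget string
	Operand string

	str string // memoized String() value; may or may not be set
}

func (c *constraint) String() string {
	if c.str == "" {
		c.str = fmt.Sprintf("%s %s %s", c.LTarget, c.Operand, c.RTarget)
	}
	return c.str
}

func main() {
	a := &constraint{LTarget: "${node.unique.name}", RTarget: "raspberrypi", Operand: "="}
	b := &constraint{LTarget: "${node.unique.name}", RTarget: "raspberrypi", Operand: "="}

	fmt.Println(reflect.DeepEqual(a, b)) // true: neither cache is populated yet

	// Calling String() on one copy (e.g. while logging or building a diff)
	// silently mutates it...
	_ = a.String()

	// ...so the "unchanged" values now compare as different.
	fmt.Println(reflect.DeepEqual(a, b)) // false
}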
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Operating system and Environment details
Ubuntu 20.04.2 LTS
Issue
I have some deploy automation for my Nomad cluster that, for each job, first runs a plan to see whether the job has any changes that need to be applied. I've noticed that for some of my jobs, the plan always has type "Edited" even when there are no changes. If I look in the "Versions" tab for the job in the UI, it lists that version as having "0 changes".
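For illustration, a plan-then-apply step like this can be sketched with the Nomad Go API client as follows. This is a simplified, hypothetical example, not the actual automation; the maintainer's example job from earlier in the thread stands in for the real jobs, which are built from transformed HCL.

package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Stand-in job: one task group with a node constraint, mirroring the
	// example HCL earlier in the thread (region and priority are defaults).
	task := api.NewTask("sleep", "exec")
	task.Config = map[string]interface{}{"command": "/bin/sleep", "args": []string{"100"}}
	group := api.NewTaskGroup("sleep", 1).
		Constrain(api.NewConstraint("${node.unique.name}", "=", "laptop")).
		AddTask(task)
	job := api.NewServiceJob("example", "example", "global", 50).AddTaskGroup(group)

	// Ask the scheduler for a plan, requesting a diff.
	resp, _, err := client.Jobs().Plan(job, true, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Only register the job when the plan reports a change. For the affected
	// jobs the diff type is always "Edited", so they get resubmitted every run.
	if resp.Diff != nil && resp.Diff.Type != "None" {
		fmt.Println("changes detected; registering job")
		if _, _, err := client.Jobs().Register(job, nil); err != nil {
			log.Fatal(err)
		}
	} else {
		fmt.Println("no changes; skipping")
	}
}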
Here's an example of the response from the plan endpoint:
This is happening with 3 of my jobs, and the one thing I've noticed they all have in common is a constraint to place them only on a particular node. My other jobs don't have this constraint on a task group and aren't exhibiting this behavior.
I tried digging through the code for planning jobs but I got a bit lost trying to figure out where this kind of decision was made.
Reproduction steps
Register a job with a single task group with a constraint like the one shown above.
Request a plan for the job without any changes.
Expected Result
The plan has diff type "None" because nothing has changed.
Actual Result
The plan has diff type "Edited" due to an in-place update.
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)