When a canary deployment is pending manual promotion, submitting a new version of the job can result in 3 simultaneous versions of the job running until the most recent deployment is promoted. The correct behavior here is a little complex to determine, but in any case we should make sure we understand it.
This has been reported on Nomad versions as early as 0.10.5 and 0.12.0, and I've just verified it on the current development head (1.0.4-dev). Possibly related to #6939 and #8439.
To reproduce on a current version of Nomad, consider the following job spec with canary deployments and manual promotion:
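The exact jobspec isn't reproduced here, but a minimal sketch along these lines matches the output below. The task body is illustrative (borrowed from the stock nomad job init example); the parts that matter are count = 10 in group "web", a canary = 1 update stanza with auto_promote = false (manual promotion) and auto_revert = true, and the meta block that gets edited below. It is assumed to be saved as ./example.nomad to match the commands later on:

job "example" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 10

    # Canary deployment with manual promotion: one canary is placed per
    # deployment, and the remaining allocations only roll (3 at a time)
    # once the deployment is promoted by hand.
    update {
      max_parallel     = 3
      canary           = 1
      auto_promote     = false
      auto_revert      = true
      min_healthy_time = "5s"
    }

    # Illustrative task; the original report's task details aren't shown.
    task "redis" {
      driver = "docker"

      config {
        image = "redis:6.0"
      }

      # This meta value is what gets bumped to force new job versions below.
      meta {
        key = "value0"
      }

      resources {
        cpu    = 256
        memory = 128
      }
    }
  }
}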
After running the job and waiting for its deployment to be marked successful, make the following modification to the jobspec:
 meta {
-  key = "value0"
+  key = "value1"
 }
Run the job and wait for the canary to become healthy:
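Re-submitting the modified spec uses the same run command that appears in the promote step further down (the ./example.nomad filename is assumed from the sketch above):

$ nomad job run ./example.nomad

Once the canary is placed and healthy, the job status looks like this: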
$ nomad job status ex
...
Summary
Task Group Queued Starting Running Failed Complete Lost
web 0 0 11 0 0 0
Latest Deployment
ID = 17fc028b
Status = running
Description = Deployment is running but requires manual promotion
Deployed
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy Progress Deadline
web true false 10 1 1 1 0 2021-02-26T19:45:49Z
Allocations
ID Node ID Task Group Version Desired Status Created Modified
083e2f83 2bca72d3 web 1 run running 21s ago 10s ago
0cc54602 2bca72d3 web 0 run running 1m7s ago 55s ago
3c6a746a 2bca72d3 web 0 run running 1m7s ago 55s ago
532759a9 2bca72d3 web 0 run running 1m7s ago 55s ago
5b1bf2c2 2bca72d3 web 0 run running 1m7s ago 55s ago
641bf9c1 2bca72d3 web 0 run running 1m7s ago 55s ago
64585431 2bca72d3 web 0 run running 1m7s ago 55s ago
aaee1e47 2bca72d3 web 0 run running 1m7s ago 55s ago
b69655d0 2bca72d3 web 0 run running 1m7s ago 55s ago
cb41f7ab 2bca72d3 web 0 run running 1m7s ago 55s ago
df518015 2bca72d3 web 0 run running 1m7s ago 55s ago
Then modify the jobspec again (but don't run it yet!):
 meta {
-  key = "value1"
+  key = "value2"
 }
Promote the deployment, and immediately run the job with the new jobspec:
$ nomad deployment promote 17f; nomad job run ./example.nomad
==> Monitoring evaluation "2b71a0e3"
    Evaluation triggered by job "example"
    Evaluation within deployment: "17fc028b"
    Allocation "08059832" created: node "2bca72d3", group "web"
    Allocation "bf0ebf00" created: node "2bca72d3", group "web"
    Allocation "db194b08" created: node "2bca72d3", group "web"
==> Monitoring evaluation "2b71a0e3"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "2b71a0e3" finished with status "complete"
==> Monitoring evaluation "75e263c0"
    Evaluation triggered by job "example"
    Allocation "01d8aad2" created: node "2bca72d3", group "web"
==> Monitoring evaluation "75e263c0"
    Evaluation within deployment: "582a89dd"
    Allocation "01d8aad2" status changed: "pending" -> "running" (Tasks are running)
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "75e263c0" finished with status "complete"
The job status at this point is a mix of version 0, version 1, and version 2 allocations:
$ nomad job status ex
...
Summary
Task Group Queued Starting Running Failed Complete Lost
web 0 0 11 0 4 0
Latest Deployment
ID = 582a89dd
Status = running
Description = Deployment is running but requires manual promotion
Deployed
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy Progress Deadline
web true false 10 1 1 1 0 2021-02-26T19:46:52Z
Allocations
ID Node ID Task Group Version Desired Status Created Modified
01d8aad2 2bca72d3 web 2 run running 11s ago 0s ago
08059832 2bca72d3 web 1 run running 12s ago 5s ago
db194b08 2bca72d3 web 1 run running 12s ago 5s ago
bf0ebf00 2bca72d3 web 1 run running 12s ago 5s ago
083e2f83 2bca72d3 web 1 run running 1m14s ago 1m3s ago
641bf9c1 2bca72d3 web 0 stop complete 2m ago 6s ago
5b1bf2c2 2bca72d3 web 0 run running 2m ago 1m48s ago
532759a9 2bca72d3 web 0 run running 2m ago 1m48s ago
64585431 2bca72d3 web 0 stop complete 2m ago 6s ago
aaee1e47 2bca72d3 web 0 run running 2m ago 1m48s ago
b69655d0 2bca72d3 web 0 stop complete 2m ago 6s ago
3c6a746a 2bca72d3 web 0 run running 2m ago 1m48s ago
cb41f7ab 2bca72d3 web 0 run running 2m ago 1m48s ago
0cc54602 2bca72d3 web 0 run running 2m ago 1m48s ago
df518015 2bca72d3 web 0 stop complete 2m ago 6s ago
This will gradually stabilize to a mix of version 1 and version 2 allocations, but fortunately, if at any point the latest deployment is promoted, the job appears to converge to version 2 with a successful deployment as expected.
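For example, promoting the latest deployment by the ID shown in the status output above (a short prefix works here too, just as it did for the earlier promote):

$ nomad deployment promote 582a89dd

After that promotion, the job status settles into all version 2 allocations: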
ID = example
Name = example
Submit Date = 2021-02-26T19:45:42Z
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Status = running
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost
web 0 0 10 0 14 0
Latest Deployment
ID = 582a89dd
Status = successful
Description = Deployment completed successfully
Deployed
Task Group Auto Revert Promoted Desired Canaries Placed Healthy Unhealthy Progress Deadline
web true true 10 1 10 10 0 2021-02-26T19:48:29Z
Allocations
ID Node ID Task Group Version Desired Status Created Modified
3c424f6a 2bca72d3 web 2 run running 19s ago 3s ago
896a1b5c 2bca72d3 web 2 run running 19s ago 3s ago
1b727979 2bca72d3 web 2 run running 19s ago 3s ago
393b4e30 2bca72d3 web 2 run running 37s ago 21s ago
6ecd4eff 2bca72d3 web 2 run running 37s ago 21s ago
3371412b 2bca72d3 web 2 run running 37s ago 21s ago
ff7a26c5 2bca72d3 web 2 run running 55s ago 38s ago
ed32a383 2bca72d3 web 2 run running 55s ago 38s ago
a10d49a6 2bca72d3 web 2 run running 55s ago 38s ago
01d8aad2 2bca72d3 web 2 run running 1m51s ago 1m40s ago
08059832 2bca72d3 web 1 stop complete 1m52s ago 49s ago
db194b08 2bca72d3 web 1 stop complete 1m52s ago 49s ago
bf0ebf00 2bca72d3 web 1 stop complete 1m52s ago 49s ago
083e2f83 2bca72d3 web 1 stop complete 2m54s ago 49s ago
5b1bf2c2 2bca72d3 web 0 stop complete 3m40s ago 14s ago
0cc54602 2bca72d3 web 0 stop complete 3m40s ago 31s ago
aaee1e47 2bca72d3 web 0 stop complete 3m40s ago 14s ago
b69655d0 2bca72d3 web 0 stop complete 3m40s ago 1m46s ago
3c6a746a 2bca72d3 web 0 stop complete 3m40s ago 31s ago
cb41f7ab 2bca72d3 web 0 stop complete 3m40s ago 31s ago
532759a9 2bca72d3 web 0 stop complete 3m40s ago 13s ago
df518015 2bca72d3 web 0 stop complete 3m40s ago 1m46s ago
641bf9c1 2bca72d3 web 0 stop complete 3m40s ago 1m46s ago
64585431 2bca72d3 web 0 stop complete 3m40s ago 1m46s ago