template stanza doesn't load env as expected #6112
Comments
It is kind of similar to this: |
@danaps could you please post the job definition? |
|
@dadgar Can you assist please? :) |
@elifish4, I'm looking at this right now. |
@danaps , we've found a race condition around this use case. The issue is that when Nomad registers the upstream service ( This is a bug, which will be addressed. I appreciate your help in tracking this one down. In the short term, increasing the I will leave this issue open and update it later before this fix is shipped. |
thanks @cgbaker |
@cgbaker, Thank you |
The workaround we went for to solve this is to add a check in the entrypoint of the container. If the envvar is not there, the container will exit. This way the container will be restarted until the template is loaded correctly in the environment.
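A minimal sketch of the kind of entrypoint check described above, not the commenter's actual script. The function name require_env is hypothetical, the variable names are borrowed from the nginx reproduction job further down this thread, and the wrapper assumes the image's real command is passed to it as arguments.

```shell
#!/bin/sh
# Hypothetical entrypoint wrapper: exit non-zero when the template-provided
# variables are absent, so the task fails and Nomad's restart policy retries it.

# require_env NAME...: return 1 if any named variable is unset or empty.
require_env() {
    for name in "$@"; do
        # Indirect lookup that works in POSIX sh (busybox ash, dash, bash).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required environment variable: $name" >&2
            return 1
        fi
    done
    return 0
}

# Only act when a command was passed (Docker hands CMD to the entrypoint as arguments).
if [ "$#" -gt 0 ]; then
    require_env UPSTREAM_REDIS_HOST UPSTREAM_REDIS_PORT \
                UPSTREAM_PSQL_HOST UPSTREAM_PSQL_PORT || exit 1
    exec "$@"
fi
```

Because the wrapper exits before starting the application, how many times the task is retried depends on the group's restart settings.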
|
When are you going to merge the fix? @schmichael @cgbaker |
@schmichael @cgbaker, any news about this issue? |
The bug still occurs on Nomad ver 0.11.4 |
How is this not a critical bug? A template being re-rendered without the env being updated is a serious problem for anyone running 12-factor-style apps |
@burdandrei we have investigated this one at length and it's decidedly non-trivial, and we have to trade off resources between any given bug, support for enterprise customers, and other feature development. The workaround that @jtrivino95 provided above will handle this case, and in general it's a good practice for applications to assert their dependencies. As you might imagine, bugs with a reasonable workaround tend to fall behind those without.

Just to dump the results of some of our investigations here... the bug is reproducible, albeit not 100% of the time, with the following set of jobs:

postgres.nomad

job "database" {
  datacenters = ["dc1"]
  group "database" {
    restart {
      attempts = 100
      delay = "3s"
    }
    task "postgres" {
      driver = "docker"
      config {
        image = "postgres"
        port_map {
          db = 5432
        }
      }
      resources {
        cpu = 500
        memory = 256
        network {
          mbits = 10
          port "db" {}
        }
      }
      service {
        name = "postgres"
        tags = ["global", "postgres"]
        port = "db"
        check {
          name = "alive"
          type = "tcp"
          interval = "10s"
          timeout = "2s"
        }
      }
    }
  }
}

nginx.nomad

job "web" {
  datacenters = ["dc1"]
  group "web" {
    task "nginx" {
      driver = "docker"
      config {
        image = "nginx:alpine"
        port_map {
          http = 80
        }
      }
      resources {
        cpu = 500
        memory = 256
        network {
          mbits = 10
          port "http" {}
        }
      }
      template {
        change_mode = "restart"
        destination = "local/services.env"
        splay = "0s"
        env = true
        data = <<EOH
{{ range service "redis" }}
UPSTREAM_REDIS_HOST="{{ .Address }}"
UPSTREAM_REDIS_PORT="{{ .Port }}"
{{ end }}
{{ range service "postgres" }}
UPSTREAM_PSQL_HOST="{{ .Address }}"
UPSTREAM_PSQL_PORT="{{ .Port }}"
{{ end }}
EOH
      }
    }
  }
}

redis.nomad

job "cache" {
  datacenters = ["dc1"]
  group "cache" {
    restart {
      attempts = 100
      delay = "3s"
    }
    task "redis" {
      driver = "docker"
      config {
        image = "redis:3.2"
        port_map {
          db = 6379
        }
      }
      resources {
        cpu = 500
        memory = 256
        network {
          mbits = 10
          port "db" {}
        }
      }
      service {
        name = "redis"
        tags = ["global", "cache"]
        port = "db"
        check {
          name = "alive"
          type = "tcp"
          interval = "10s"
          timeout = "2s"
        }
      }
    }
  }
}

At that point, the
The diagram below illustrates the window of the race: As you can see, it's not just a matter of atomically registering the Consul service and the health check at the same time (which would be #3935). And it's not quite as easy as merging some data structures as we did with #3498. But if this is an area that interests you, we'd love to hear your ideas! Thanks! |
Thanks for such a comprehensive answer, @tgross! P.S. The timeline graph is awesome! |
Any updates on this bug? Is it fixed in newer versions? We are facing this issue in Nomad version 1.8.0. |
Hi @jpatidar30 there are no updates currently. When an engineer is assigned to this and working on it, updates will be provided in the issue. |
Nomad version
Nomad v0.9.4 (a81aa846a45fb8248551b12616287cb57c418cd6)
(encountered the same issue on 0.9.3)
Operating system and Environment details
Ubuntu 16.04.6 LTS
4.4.0-1085-aws #96-Ubuntu SMP Tue Jun 11 09:08:32 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Issue
When using range service in a Nomad template stanza with env = true, Nomad writes the output to the destination file but does not always load the environment variables as expected. The issue happens sporadically.
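One way to confirm this symptom from inside a task is to compare the rendered file against the live environment. The following is a rough sketch only, assuming the rendered file contains plain KEY=value lines; missing_env_keys is a hypothetical helper and not part of Nomad.

```shell
#!/bin/sh
# missing_env_keys FILE: print every KEY found in FILE's KEY=value lines that is
# unset or empty in the current environment. Returns 1 if any key was missing.
missing_env_keys() {
    file=$1
    missing=0
    # The "|| [ -n "$key" ]" clause handles a final line without a trailing newline.
    while IFS='=' read -r key rest || [ -n "$key" ]; do
        case "$key" in
            # Skip blank lines, comments, and anything that is not a valid variable name.
            ''|'#'*|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
        esac
        eval "value=\${$key:-}"
        if [ -z "$value" ]; then
            echo "$key"
            missing=1
        fi
    done < "$file"
    return "$missing"
}
```

Inside the task this might be called as missing_env_keys "$NOMAD_TASK_DIR/services.env", assuming the template destination is local/services.env as in the reproduction job in this thread.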
Reproduction steps
I have a task that watches a service (redis) in Consul.
I killed the Docker container (docker kill) that runs the redis service. The redis task came back up, and then the task watching redis restarted as well (using the default change_mode, restart). The file was re-rendered but the env vars were not loaded. It happens sporadically: I had to kill redis 3 or 4 times before the issue reproduced.
Nomad Client logs (if appropriate)
The task that watches redis:
The task that runs the redis service:
Again the task that watches redis; you can see that env vars are now missing:
Nomad logs: