-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
environment template is not always working #3498
Comments
Thanks for reporting this issue. Do you have more information about when the environment template is and isn't ignored? If you have logs that are relevant to this issue, feel free to add them here. |
@mildred Could you please share DEBUG level logs of the client where this occurs. |
I have the following nomad client logs. This occurs even though the environment section of the job file is empty and all variables are set using the environment template instead. nomad logs2017/11/06 18:31:04.988262 [INFO] client: task "deploy" for alloc "bc4e33c0-aed2-bd19-1798-e06216f5876a" failed: Wait returned exit code 1, signal 0, and error Docker container exited with non-zero exit code: 1 2017/11/06 18:31:04.988317 [INFO] client: Restarting task "deploy" for alloc "bc4e33c0-aed2-bd19-1798-e06216f5876a" in 16.362625637s 2017/11/06 18:31:04.990046 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks 2017/11/06 18:31:05.138308 [DEBUG] client: updated allocations at index 4602 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:05.138530 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:07.154924 [DEBUG] http: Request /v1/agent/servers (45.587µs) 2017/11/06 18:31:07.156368 [DEBUG] http: Request /v1/status/peers (170.845µs) 2017/11/06 18:31:08.768912 [ERR] http: Request /v1/node/172857b9-3b5f-bd0a-b09d-545ac3ceb6dc?index=6&wait=1m, error: node not found 2017/11/06 18:31:08.769380 [DEBUG] http: Request /v1/node/172857b9-3b5f-bd0a-b09d-545ac3ceb6dc?index=6&wait=1m (1m2.208570212s) 2017/11/06 18:31:09.107824 [INFO] client: task "register" for alloc "6decdf71-8e82-438b-4365-c85ec05ab800" completed successfully 2017/11/06 18:31:09.107876 [INFO] client: Restarting task "register" for alloc "6decdf71-8e82-438b-4365-c85ec05ab800" in 5.69028103s 2017/11/06 18:31:09.107836 [INFO] client: task "forward" for alloc "6decdf71-8e82-438b-4365-c85ec05ab800" failed: Wait returned exit code 255, signal 0, and error 2017/11/06 18:31:09.107968 [INFO] client: Restarting task "forward" for alloc "6decdf71-8e82-438b-4365-c85ec05ab800" in 5.69028103s 2017/11/06 18:31:09.110549 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks 2017/11/06 18:31:09.112986 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks 2017/11/06 18:31:09.337888 [DEBUG] client: updated allocations at index 4603 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:09.337955 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:11.742120 [INFO] client: task "forward" for alloc "597a9be9-af7e-b8ea-eb14-41b66d98ec92" failed: Wait returned exit code 255, signal 0, and error 2017/11/06 18:31:11.742181 [INFO] client: Restarting task "forward" for alloc "597a9be9-af7e-b8ea-eb14-41b66d98ec92" in 5.346224508s 2017/11/06 18:31:11.742517 [INFO] client: task "register" for alloc "597a9be9-af7e-b8ea-eb14-41b66d98ec92" completed successfully 2017/11/06 18:31:11.742567 [INFO] client: Restarting task "register" for alloc "597a9be9-af7e-b8ea-eb14-41b66d98ec92" in 5.346224508s 2017/11/06 18:31:11.745454 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks 2017/11/06 18:31:11.747612 [DEBUG] consul.sync: registered 0 services, 0 checks; deregistered 0 services, 0 checks 2017/11/06 18:31:11.938454 [DEBUG] client: updated allocations at index 4604 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:11.938588 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:12 [DEBUG] memberlist: TCP connection from=127.0.0.1:36034 2017/11/06 18:31:14.819411 [DEBUG] driver.raw_exec: started process with pid: 7231 2017/11/06 18:31:14.830486 [DEBUG] driver.raw_exec: started process with pid: 7275 2017/11/06 18:31:14.937910 [DEBUG] client: updated allocations at index 4606 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:14.937994 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:15.137896 [DEBUG] client: updated allocations at index 4607 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:15.137968 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:15.551060 [DEBUG] http: Request /v1/job/svc-qa-noenv-20171002180522-squarescale-machine-sqsc-demo/allocations?index=4588&wait=1m (1m0.939825811s) 2017/11/06 18:31:16.552631 [DEBUG] http: Request /v1/job/svc-qa-noenv-20171002180522-squarescale-machine-sqsc-demo/allocations?index=4588&wait=1m (673.628µs) 2017/11/06 18:31:16.943638 [DEBUG] http: Request /v1/job/svc-qa-noenv-20171002172154-squarescale-machine-hello/allocations?index=4588&wait=1m (1m1.648431515s) 2017/11/06 18:31:17.114875 [DEBUG] driver.raw_exec: started process with pid: 7451 2017/11/06 18:31:17.120173 [DEBUG] driver.raw_exec: started process with pid: 7455 2017/11/06 18:31:17.156304 [DEBUG] http: Request /v1/agent/servers (34.34µs) 2017/11/06 18:31:17.157408 [DEBUG] http: Request /v1/status/peers (121.718µs) 2017/11/06 18:31:17.338751 [DEBUG] client: updated allocations at index 4608 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:17.338891 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:17.554116 [DEBUG] http: Request /v1/job/svc-qa-noenv-20171002180522-squarescale-machine-sqsc-demo/allocations?index=4607&wait=1m (566.876µs) 2017/11/06 18:31:17.945171 [DEBUG] http: Request /v1/job/svc-qa-noenv-20171002172154-squarescale-machine-hello/allocations?index=4588&wait=1m (601.115µs) 2017/11/06 18:31:21.382134 [DEBUG] driver.docker: Setting default logging options to syslog and unix:///tmp/plugin253912072 2017/11/06 18:31:21.382179 [DEBUG] driver.docker: Using config for logging: {Type:syslog ConfigRaw:[] Config:map[syslog-address:unix:///tmp/plugin253912072]} 2017/11/06 18:31:21.382192 [DEBUG] driver.docker: using 536870912 bytes memory for deploy 2017/11/06 18:31:21.382200 [DEBUG] driver.docker: using 100 cpu shares for deploy 2017/11/06 18:31:21.382237 [DEBUG] driver.docker: binding directories []string{"/home/mildred/Work/SquareScale/squarescale-infrastructure/dev/nomad-data/alloc/bc4e33c0-aed2-bd19-1798-e06216f5876a/alloc:/alloc", "/home/mildred/Work/SquareScale/squarescale-infrastructure/dev/nomad-data/alloc/bc4e33c0-aed2-bd19-1798-e06216f5876a/deploy/local:/local", "/home/mildred/Work/SquareScale/squarescale-infrastructure/dev/nomad-data/alloc/bc4e33c0-aed2-bd19-1798-e06216f5876a/deploy/secrets:/secrets", "/home/mildred/Work/SquareScale/squarescale-infrastructure/dev/squarescale-deploy/terraform:/terraform"} for deploy 2017/11/06 18:31:21.382261 [DEBUG] driver.docker: networking mode not specified; defaulting to bridge 2017/11/06 18:31:21.382288 [DEBUG] driver.docker: setting container name to: deploy-bc4e33c0-aed2-bd19-1798-e06216f5876a 2017/11/06 18:31:21.978340 [INFO] driver.docker: created container 8ce48f04f15f9c4c8b9015d6f0fc757f05ac1e21229d1422f0e54e6882004c6a 2017/11/06 18:31:22.502623 [INFO] driver.docker: started container 8ce48f04f15f9c4c8b9015d6f0fc757f05ac1e21229d1422f0e54e6882004c6a 2017/11/06 18:31:22.543870 [WARN] client: error fetching stats of task deploy: stats collection hasn't started yet 2017/11/06 18:31:22 [DEBUG] memberlist: TCP connection from=127.0.0.1:36054 2017/11/06 18:31:22.737883 [DEBUG] client: updated allocations at index 4609 (total 14) (pulled 0) (filtered 14) 2017/11/06 18:31:22.737986 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 14) 2017/11/06 18:31:23.090949 [INFO] client: task "deploy" for alloc "bc4e33c0-aed2-bd19-1798-e06216f5876a" failed: Wait returned exit code 1, signal 0, and error Docker container exited with non-zero exit code: 1 The task named nomad inspect deploy shows: json job (nomad inspect){ "Job": { "AllAtOnce": false, "Constraints": [ { "LTarget": "${meta.sqsc.environment}", "Operand": "=", "RTarget": "dev" }, { "LTarget": "${meta.sqsc.project}", "Operand": "=", "RTarget": "sqsc" } ], "CreateIndex": 42, "Datacenters": [ "dev-sqsc" ], "ID": "deploy", "JobModifyIndex": 42, "Meta": null, "ModifyIndex": 92, "Name": "deploy", "ParameterizedJob": null, "ParentID": "", "Payload": null, "Periodic": null, "Priority": 50, "Region": "global", "Stable": true, "Status": "running", "StatusDescription": "", "Stop": false, "SubmitTime": 1509728475312274239, "TaskGroups": [ { "Constraints": null, "Count": 1, "EphemeralDisk": { "Migrate": false, "SizeMB": 300, "Sticky": false }, "Meta": null, "Name": "deploy", "RestartPolicy": { "Attempts": 2, "Delay": 15000000000, "Interval": 60000000000, "Mode": "delay" }, "Tasks": [ { "Artifacts": null, "Config": { "force_pull": false, "image": "095348363195.dkr.ecr.eu-west-1.amazonaws.com/squarescale-deploy:dev-latest", "volumes": [ "/home/mildred/Work/SquareScale/squarescale-infrastructure/dev/squarescale-deploy/terraform:/terraform" ], "dns_servers": [ "172.17.0.1" ] }, "Constraints": null, "DispatchPayload": null, "Driver": "docker", "Env": null, "KillTimeout": 5000000000, "Leader": false, "LogConfig": { "MaxFileSizeMB": 10, "MaxFiles": 10 }, "Meta": null, "Name": "deploy", "Resources": { "CPU": 100, "DiskMB": 0, "IOPS": 0, "MemoryMB": 512, "Networks": [ { "CIDR": "", "Device": "", "DynamicPorts": null, "IP": "", "MBits": 10, "ReservedPorts": null } ] }, "Services": null, "ShutdownDelay": 0, "Templates": [ { "ChangeMode": "restart", "ChangeSignal": "", "DestPath": "environment", "EmbeddedTmpl": "\n\nSQSC_ENVIRONMENT=\"staging\"\nSQSC_TFSTATE_BUCKET=\"sqsc-tfstate-staging\"\n\n\nNOMAD_VERSION=\"{{ key \"squarescale/dev/nomad_version\" }}\"\nAWS_REGION=\"{{ key \"squarescale/dev/aws/region\" }}\"\nAWS_ACCESS_KEY_ID=\"{{ key \"squarescale/dev/aws/access_key_id\" }}\"\nAWS_SECRET_ACCESS_KEY=\"{{ key \"squarescale/dev/aws/secret_access_key\" }}\"\nSQSC_TERRAFORM_SQS_INPUT_URL=\"{{ key \"squarescale/dev/deploy_sqs\" }}\"\n", "Envvars": true, "LeftDelim": "{{", "Perms": "0644", "RightDelim": "}}", "SourcePath": "", "Splay": 5000000000, "VaultGrace": 300000000000 }, { "ChangeMode": "restart", "ChangeSignal": "", "DestPath": "restart", "EmbeddedTmpl": " {{keyOrDefault (printf \"restart/%s\" (env \"NOMAD_JOB_NAME\")) \"\"}}\n {{keyOrDefault (printf \"restart/%s/%s\" (env \"NOMAD_JOB_NAME\") (env \"NOMAD_TASK_NAME\")) \"\"}}\n {{keyOrDefault (printf \"restart/%s/%s/%s\" (env \"NOMAD_JOB_NAME\") (env \"NOMAD_TASK_NAME\") (env \"NOMAD_ALLOC_INDEX\")) \"\"}}\n {{keyOrDefault (printf \"restart/%s\" (env \"NOMAD_ALLOC_NAME\")) \"\"}}\n", "Envvars": false, "LeftDelim": "{{", "Perms": "0644", "RightDelim": "}}", "SourcePath": "", "Splay": 5000000000, "VaultGrace": 300000000000 } ], "User": "", "Vault": null } ], "Update": { "AutoRevert": false, "Canary": 0, "HealthCheck": "checks", "HealthyDeadline": 300000000000, "MaxParallel": 1, "MinHealthyTime": 10000000000, "Stagger": 10000000000 } } ], "Type": "service", "Update": { "AutoRevert": false, "Canary": 0, "HealthCheck": "", "HealthyDeadline": 0, "MaxParallel": 1, "MinHealthyTime": 0, "Stagger": 10000000000 }, "VaultToken": "", "Version": 0 } } consul keys are available:
environment template file is being written to disk correctly:
deploy/executor.out shows only repeated lines like this one:
|
I just ran in the same problem on 0.7. |
@dadgar I got the reproduce steps: Here is the job definition:
Now when you start the job you receive both test1 and test2 environment variables. |
if I understand this right this happening because of this: https://github.com/hashicorp/nomad/blob/master/client/consul_template.go#L397 Question is what is the right solution, not to give more than one template that is injecting environment, or build the logic of merging them? |
See the JSON Nomad job in #3498 (comment) it has two templates but only one of them is an environment template. Still it doesn't alays work. About the two environment templates issue, I don't see a problem to have nomad restrict to a single environment template. But if it is not difficult, implementing a merge mechanism might be a better solution (no edge case to check when accepting jobs) |
this is even more fun, and awful! |
@mildred I was not able to ever reproduce with either of the job files you gave. I was able to reproduce with the description @burdandrei gave. I have attached a Linux AMD64 build that has the fix. I would love if you could test and tell me if I have fixed your issue too. |
Thanks for super fast fix @dadgar ! |
I noticed the problem occurred mostly when I changed a consul key in I'll check this version and report back if I still have issues. |
@dadgar it seems to work well. The cases where this bug happened don't seem to trigger the bug any more now. Thank you. |
@burdandrei @mildred Thank you for testing and for the super fast feedback! |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Nomad v0.6.2
Operating system and Environment details
CoreOS
Issue
When a nomad job with a task that uses the docker driver has both an environment section
env { "KEY" = "value" }
and an environment template, the environment section is used to populate the docker container environment but the environment template is ignored. but sometimes it works too.Reproduction steps
Run the nomad job, sometimes you don't get the environment variables from template
Job file (if appropriate)
The job file is similar to this:
(You need to the the consul key
sample-consul-key
and have a nomad node that matches the constraints set)The text was updated successfully, but these errors were encountered: