Nomad version
Nomad v0.10.0 (25ee121)
(But this should happen in all versions after 0.9.)
Operating system and Environment details
Ubuntu 18.04
with Consul 1.6.1 and Vault 1.2.1
Issue
When the Nomad client agent is restarted, it renders all templates once for the existing tasks it manages, but it does not restart a task even if the rendered file content changed.
I believe the related code is here or here. The first-render events seem to be handled differently from regular render events: the change_mode is not checked, so tasks are not restarted. That makes sense for newly started tasks, but it is wrong for existing tasks that use Vault.
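For context, this is the kind of template stanza involved. A minimal illustrative example (role name and fields are placeholders, not from my actual job); change_mode = "restart" is what I expect Nomad to honor on a content change:

```hcl
template {
  data        = <<EOT
DB_USER_NAME='{{ with secret "database/creds/my-role" }}{{ .Data.username }}{{ end }}'
DB_PASSWORD='{{ with secret "database/creds/my-role" }}{{ .Data.password }}{{ end }}'
EOT
  destination = "secrets/config.env"
  env         = true
  change_mode = "restart"
}
```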
What happened to me was:
A few days ago, I restarted all our Nomad agents while upgrading to 0.10.0. Here are the related logs from the Nomad agent (sorry, I had to mask some info since this is our production environment):
....
Nov 01 16:02:29 ip-****** systemd[1]: Stopped Nomad Cluster Manager.
Nov 01 16:02:29 ip-****** systemd[1]: Started Nomad Cluster Manager.
Nov 01 16:02:29 ip-****** nomad[32274]: ==> Loaded configuration from
...
Nov 01 16:02:29 ip-****** nomad[32274]: ==> Starting Nomad agent...
Nov 01 16:02:29 ip-****** nomad[32274]: ==> Nomad agent configuration:
Nov 01 16:02:29 ip-****** nomad[32274]: Advertise Addrs: HTTP: ***.***.***.***:4646
Nov 01 16:02:29 ip-****** nomad[32274]: Bind Addrs: HTTP: 0.0.0.0:4646
Nov 01 16:02:29 ip-****** nomad[32274]: Client: true
Nov 01 16:02:29 ip-****** nomad[32274]: Log Level: INFO
Nov 01 16:02:29 ip-****** nomad[32274]: Region: ******(DC: ******)
Nov 01 16:02:29 ip-****** nomad[32274]: Server: false
Nov 01 16:02:29 ip-****** nomad[32274]: Version: 0.10.0
Nov 01 16:02:29 ip-****** nomad[32274]: ==> Nomad agent started! Log data will stream in below:
.....
Nov 01 16:02:30 ip-****** nomad[32274]: 2019/11/01 16:02:30.149003 [INFO] (runner) rendered "(dynamic)" => "/var/lib/nomad/alloc/ecef2c6b-e09b-9cbb-23e6-15d076939b8c/app/secrets/config.env"
Nov 01 16:02:30 ip-****** nomad[32274]: 2019/11/01 16:02:30.189143 [INFO] (runner) rendered "(dynamic)" => "/var/lib/nomad/alloc/99a9907c-bff3-d2dd-34e7-ef401a3d9aa4/app/secrets/config.env"
Nov 01 16:02:30 ip-****** nomad[32274]: 2019/11/01 16:02:30.209191 [INFO] (runner) rendered "(dynamic)" => "/var/lib/nomad/alloc/cd2731bc-5380-4ca5-3255-3af08125f9ee/app/secrets/config.env"
Nov 01 16:02:39 ip-****** nomad[32274]: 2019-11-01T16:02:39.283+0900 [INFO ] client: node registration complete
Nomad rendered the files, but it didn't restart the task even though some of the rendered file contents changed. The changed values were the database username and password from Vault, probably because the restart happened close to the expiration time of the old lease.
Since the rendered file is used as environment variables in my task, I could easily confirm this:
jingchen.liu@ip-******:/etc$ sudo docker exec -it 15304b79e832 /bin/bash
bash-4.4$ cd /secrets/
bash-4.4$ ls -l
total 8
-rw-r--r-- 1 root root 373 Nov 1 07:02 config.env
-rw-r--r-- 1 root root 26 Sep 26 10:26 vault_token
bash-4.4$ cat config.env
.....
DB_USER_NAME='v-token-*****-bhluS*********'
DB_PASSWORD='A1a-z21wl**************'
DB_HOST='************.ap-northeast-1.rds.amazonaws.com'
bash-4.4$ env | grep DB
DB_NAME=************
DB_PASSWORD=A1a-nCtwG********
DB_USER_NAME=v-token-********-OghC28**************
DB_HOST=************.ap-northeast-1.rds.amazonaws.com
You can see that the values rendered on disk differ from the actual env output. The application was still using the old username/password, while Nomad thought it had already delivered the latest credentials. When the old database account expired today, the application wasn't restarted either and failed to connect to the database.
I believe the fix is for the "handleFirstRender" code to distinguish between newly started tasks and existing (recovered) tasks.
Reproduction steps
You can do the same thing as described above: create a job with secrets from Vault; when the lease is close to expiration, restart the Nomad client agent and observe that the file content changes but the task is not restarted.
Alternatively, there may be an easier way (I haven't tried it): create a job with a template that reads a value from Consul. Then, very quickly: stop the Nomad client agent where the job is running, change the value in Consul, and start the Nomad client agent again. If the Nomad server didn't notice the short downtime of the client agent, the task should still be running while the file content on disk has changed.
And please let me know if there is any other info I can provide to help. Thank you!