Script health checks failing in v0.10.0 and newer #7185
Comments
I've seen that #6916 fixes some script-related health check issues, but my understanding is that the fix was released in 0.10.3. I've tested 0.10.3 and I'm still seeing the same issue.
Hi @far-blue! Thanks for opening this. That fix was intended for 0.10.3 but unfortunately we ended up having to do a security release. Sorry for the confusion! If you check the CHANGELOG you'll see that patch is slated for 0.10.4, which is currently available as a release candidate 0.10.4-rc1. Give that a try and if it's not fixed, let me know and I'll definitely take a second look at it.
Hi @tgross :) Thanks for the quick reply. Yes, now that you point it out in the changelog I can see it missed the 0.10.3 release. I've not been able to test 0.10.4-rc1 thoroughly because I've rolled my cluster back to 0.9.7, but I did a quick check on a single node by isolating it. I can confirm that the immediate symptoms (ID mismatch errors in the logs and failing health checks) are fixed in the release candidate. As I'm not keen to run an RC in production, do you know if the 0.10.4 release is likely in February or further into the future?
Looks like we just released Nomad 0.10.4, so you should be all set!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Overview of the Issue
When I update from v0.9.x to v0.10.x, script-based health checks start failing.
The last version of Nomad that works for me is 0.9.7 and the first failing version is 0.10.0.
I'm seeing the same problem regardless of the version of Consul in use. Most of my testing was with the latest version of Consul (1.7.0).
Reproduction Steps
I defined a Nomad job with a single group and a docker task. That task has a service section with an associated check. The service definition looks like this:
What I'm seeing is that the script check is running (I can see it in the logs for the container) but Consul is reporting the check as failing.
In the Nomad Client logs I'm seeing lines like this when the job is run:
Notice the check ID.
Similarly, I can see the following in the Consul logs every time the check is run:
Asking the local Consul agent about its health checks shows that it has a very different idea of the relevant check:
Notice the ID is different.
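For anyone wanting to repeat this comparison, here is a minimal sketch of how the agent's view could be pulled and compared against the ID from the Nomad client log. It assumes the local Consul agent is listening on the default HTTP address (http://127.0.0.1:8500) without ACLs, and it uses the agent's /v1/agent/checks endpoint, which returns a map keyed by check ID. The function names and the placeholder ID are mine, purely for illustration; this is not what Nomad itself does internally.

```python
import json
import urllib.request


def fetch_agent_checks(base_url="http://127.0.0.1:8500"):
    """Return the local Consul agent's checks as a dict keyed by check ID."""
    with urllib.request.urlopen(f"{base_url}/v1/agent/checks") as resp:
        return json.load(resp)


def find_missing_check_ids(expected_ids, agent_checks):
    """Return the expected check IDs that the agent does not know about."""
    return sorted(set(expected_ids) - set(agent_checks))


if __name__ == "__main__":
    # Hypothetical placeholder: paste the check ID from the Nomad client log here.
    expected = ["<check-id-from-nomad-client-log>"]
    checks = fetch_agent_checks()
    # Print what the agent has registered, one check per line.
    for check_id, check in checks.items():
        print(check_id, check.get("ServiceName"), check.get("Status"))
    print("not registered:", find_missing_check_ids(expected, checks))
```

If the ID that Nomad reports turns up under "not registered" while the agent lists a check for the same service under another ID, that would match the mismatch described here.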
I do have
"enable_local_script_checks": true,
in the Consul config.
Operating system and Environment details
Ubuntu 18.04