Service Registration and Health Checks flapping when task name starts with another task name #2294

Closed
ashald opened this issue Feb 8, 2017 · 6 comments

ashald commented Feb 8, 2017

If filing a bug please include the following:

Nomad version

Nomad v0.5.0

Operating system and Environment details

CentOS 7

Issue

When a job is created as described in the reproduction steps, the task whose name starts with the other task's name is "flaky" in Consul: its service definition is registered and deregistered every few seconds, and its health checks fail to execute (we tried both script and HTTP checks).

Reproduction steps

Create a job with a group and 2 tasks called foo and foobar.
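
The thread does not state the root cause, but the title suggests that names are being matched by prefix somewhere. As a minimal sketch of how that kind of collision could arise, assuming a hypothetical key scheme (`serviceKey`, `ownedByPrefix`, and `ownedByDelimited` are illustrative names, not Nomad's code):

```go
package main

import (
	"fmt"
	"strings"
)

// serviceKey builds a hypothetical per-task identifier. The scheme is an
// assumption made for illustration and is not taken from Nomad's source.
func serviceKey(alloc, task, service string) string {
	return alloc + "-" + task + "-" + service
}

// ownedByPrefix decides ownership with a bare prefix match on the task name,
// so task "foo" also matches keys that belong to task "foobar".
func ownedByPrefix(key, alloc, task string) bool {
	return strings.HasPrefix(key, alloc+"-"+task)
}

// ownedByDelimited also requires the delimiter after the task name, which
// separates "foo" from "foobar". It would still be ambiguous if task names
// could contain the delimiter themselves.
func ownedByDelimited(key, alloc, task string) bool {
	return strings.HasPrefix(key, alloc+"-"+task+"-")
}

func main() {
	key := serviceKey("alloc1", "foobar", "web")
	fmt.Println(ownedByPrefix(key, "alloc1", "foo"))    // true: foo wrongly claims foobar's service
	fmt.Println(ownedByDelimited(key, "alloc1", "foo")) // false
}
```

If two tasks each believe they own the same service entry, each sync could undo the other's registration, which would be consistent with the flapping described above.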

Nomad Client logs (if appropriate)

Executor log

2017/02/07 17:49:46 [DEBUG] consul.syncer: check "a48e3df662d3e656d9fd5de8d7202c27ff7ee150" failed, disabling Consul checks until until next successful sync: Unexpected response code: 500 (CheckID "a48e3df662d3e656d9fd5de8d7202c27ff7ee150" does not have associated TTL)
schmichael added a commit that referenced this issue Feb 9, 2017
Also switched from a build tag to an env var for enabling these tests so
at least we know if they stop building properly without enabling them.
schmichael self-assigned this Feb 9, 2017
schmichael (Member) commented

Pushed a branch with a new test reproducing the bug you reported. I'm working on a significant refactor of our Consul interaction code that will fix this, but it looks like that refactoring won't be done in time for 0.5.5. I'll try to get a patch in.

ashald commented Feb 9, 2017

Thanks for the quick reply! We've used a quick and obvious workaround of changing the task name for now (so we're not in a hurry here). :) But it would be nice to have that fixed, so looking forward to your refactoring. Would you mind mentioning/pinging me once you're going to merge those changes? Thanks

schmichael (Member) commented

@ashald I'll definitely close this ticket when the fix lands whether it's in a patch or as part of the larger refactoring!

ashald commented Feb 9, 2017

Thanks!

schmichael added a commit that referenced this issue Mar 30, 2017
Fixes #2478 #2474 #1995 #2294

The new client only handles agent and task service advertisement. Server
discovery is mostly unchanged.

The Nomad client agent now handles all Consul operations instead of the
executor handling task related operations. When upgrading from an
earlier version of Nomad existing executors will be told to deregister
from Consul so that the Nomad agent can re-register the task's services
and checks.

Drivers - other than qemu - now support an Exec method for executing
arbitrary commands in a task's environment. This is used to implement
script checks.

Interfaces are used extensively to avoid interacting with Consul in
tests that don't assert any Consul related behavior.
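
A minimal sketch of the interface approach described in the last paragraph of that commit message. `ServiceRegistrar`, `fakeRegistrar`, and `syncTask` are hypothetical names for illustration and are not the interfaces introduced by the commit.

```go
package main

import "fmt"

// ServiceRegistrar is a hypothetical interface covering only the Consul
// operations the caller needs.
type ServiceRegistrar interface {
	Register(id, name string) error
	Deregister(id string) error
}

// fakeRegistrar records registrations in memory, so tests that do not
// assert Consul behavior never need a running Consul agent.
type fakeRegistrar struct {
	services map[string]string
}

func newFakeRegistrar() *fakeRegistrar {
	return &fakeRegistrar{services: make(map[string]string)}
}

func (f *fakeRegistrar) Register(id, name string) error {
	f.services[id] = name
	return nil
}

func (f *fakeRegistrar) Deregister(id string) error {
	delete(f.services, id)
	return nil
}

// syncTask registers a task's service through the interface; the ID scheme
// is an assumption made for this sketch.
func syncTask(r ServiceRegistrar, task string) error {
	return r.Register("task-"+task, task)
}

func main() {
	fake := newFakeRegistrar()
	_ = syncTask(fake, "foo")
	_ = syncTask(fake, "foobar")
	fmt.Println(len(fake.services)) // 2
}
```

Code under test depends only on the interface, so a production implementation backed by the Consul API and an in-memory fake like this one are interchangeable.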
schmichael added a commit that referenced this issue Apr 19, 2017
schmichael (Member) commented

Fixed on master by #2467

github-actions bot locked as resolved and limited conversation to collaborators Dec 13, 2022