This repository has been archived by the owner on Oct 23, 2024. It is now read-only.

Health awareness #310

Open
tsenart opened this issue Oct 7, 2015 · 5 comments

Comments

@tsenart
Contributor

tsenart commented Oct 7, 2015

Mesos-DNS as a service discovery system should be health-aware. This doesn't mean that it can guarantee healthiness of the returned service instances, only that it does its best to direct clients to capable ones.

With that in mind, we should take into consideration the TaskStatus.healthy field and work with the Marathon and Mesos teams to promote the use of Mesos native health checks.

@imriz

imriz commented Mar 28, 2018

Is this still true? Will Mesos-DNS publish unhealthy instances even if they use Mesos native health checks in Marathon (MESOS_HTTP(S))?

@jdef
Contributor

jdef commented Mar 28, 2018 via email

@imriz

imriz commented Oct 21, 2018

This is a really needed feature. Currently, mesos-dns will happily announce unhealthy instances, which puts the burden of figuring out health on the client (which might need a few retries to get a healthy instance).

Looking at https://github.com/mesosphere/mesos-dns/blob/master/records/state/state.go#L193, this seems to be quite simple?
The statuses list in the state JSON (the same place where the task state is) will contain the healthy boolean if the task has a health check configured and running. If not, the field is omitted.

So it seems a really easy fix would be to omit the record if the healthy field is present and set to false.
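
To make the idea concrete, here is a minimal sketch in Go. The type and function names (taskStatus, parseStatus, shouldPublish) are made up for illustration and are not the actual mesos-dns types. The only things taken from the state JSON are the state and healthy field names. A pointer is used for healthy so that an omitted field can be told apart from an explicit false.

```go
package main

import "encoding/json"

// taskStatus is a hypothetical, trimmed-down view of one entry in a task's
// statuses list. It is NOT the actual mesos-dns type.
type taskStatus struct {
	State string `json:"state"`
	// Healthy is a pointer so that an omitted field (nil) can be
	// distinguished from an explicit false.
	Healthy *bool `json:"healthy,omitempty"`
}

// parseStatus decodes a single status object from its JSON form.
func parseStatus(raw string) (taskStatus, error) {
	var s taskStatus
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

// shouldPublish reports whether a record should be emitted for this status.
// The record is skipped only when the healthy field is present and false.
func shouldPublish(s taskStatus) bool {
	return s.Healthy == nil || *s.Healthy
}
```

With this sketch, a status of {"state":"TASK_RUNNING","healthy":false} would be skipped, while a status with healthy set to true or with no healthy field at all would still be published.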

Any thoughts about it?

@imriz

imriz commented Oct 21, 2018

I would also be glad to distinguish between "grace period has not passed yet" and "no health check defined", but for now, the complete lack of health awareness is worse than not distinguishing these two scenarios.
If we treat a missing healthy field as "healthy" (and publish the record), we keep backward compatibility by not affecting tasks without health checks, with the trade-off of publishing unhealthy instances during their grace period (which is already happening today anyway).

Bottom line is that this feature request has gone unanswered for years, and I bet a lot of the users of this project would like to see it implemented, even if it does not fully cover all scenarios today (maybe add a config flag to enable/disable this).
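
A rough sketch of how the backward-compatible behavior and the opt-in flag could fit together, in Go. Everything here is hypothetical: healthState, classify, config, the HealthAware field, and publish are illustrative names, and HealthAware is not an existing mesos-dns option. Note that a missing healthy field maps to a single "unknown" state, because the state JSON alone does not say whether the grace period is still running or no health check is defined.

```go
package main

// healthState is a hypothetical three-way view of the healthy field.
type healthState int

const (
	// healthUnknown: the healthy field was omitted. This covers both
	// "no health check defined" and "grace period not over yet".
	healthUnknown healthState = iota
	healthHealthy
	healthUnhealthy
)

// classify maps the optional healthy boolean to a healthState.
func classify(healthy *bool) healthState {
	switch {
	case healthy == nil:
		return healthUnknown
	case *healthy:
		return healthHealthy
	default:
		return healthUnhealthy
	}
}

// config is a hypothetical settings struct. HealthAware is an assumed
// flag name, not an existing mesos-dns option.
type config struct {
	HealthAware bool
}

// publish decides whether a record is emitted. With the flag off, every
// task is published as it is today. With the flag on, only tasks that
// are explicitly unhealthy are dropped.
func publish(cfg config, healthy *bool) bool {
	if !cfg.HealthAware {
		return true
	}
	return classify(healthy) != healthUnhealthy
}
```

Defaulting the flag to off would keep current behavior for everyone who does not opt in, and treating the unknown state as publishable keeps tasks without health checks unaffected when the flag is on.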


4 participants