Health awareness #310

tsenart · 2015-10-07T10:03:15Z

Mesos-DNS as a service discovery system should be health-aware. This doesn't mean that it can guarantee healthiness of the returned service instances, only that it does its best to direct clients to capable ones.

With that in mind, we should take into consideration the TaskStatus.healthy field and work with the Marathon and Mesos teams to promote the use of Mesos native health checks.

imriz · 2018-03-28T16:14:23Z

Is this still true? Mesos DNS will publish unhealthy instances, even if they use Mesos native health checks in Marathon (MESOS_HTTP(S))?

jdef · 2018-03-28T16:33:27Z

I don't think anyone is working on this.


imriz · 2018-10-21T14:13:30Z

This is a much-needed feature. Currently, mesos-dns will happily announce unhealthy instances, which puts the burden of figuring out instance health on the client (which might need a few retries to get a healthy instance).

Looking at https://github.com/mesosphere/mesos-dns/blob/master/records/state/state.go#L193 this seems to be quite simple?
The statuses entries in the state JSON (the same place where the task state is) will contain the healthy boolean if the task has a health check configured and running. If not, the field is omitted.

So it seems a really easy fix would be to omit the record if the healthy field is there, and is set to false.
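To make the idea concrete, here is a minimal sketch of that check in Go. It is not the actual mesos-dns code: the `status` and `task` types, `parseTask`, and `publishable` are hypothetical names, the structs are trimmed to the fields discussed here, and it assumes the last entry in `statuses` is the most recent one. `Healthy` is a pointer so that an omitted field can be told apart from an explicit `false`.

```go
package main

import "encoding/json"

// status is a hypothetical, trimmed-down view of one entry in a task's
// "statuses" list in the Mesos state JSON. Healthy is a pointer so that
// an omitted field (nil) can be distinguished from an explicit false.
type status struct {
	State   string `json:"state"`
	Healthy *bool  `json:"healthy,omitempty"`
}

// task is a hypothetical, trimmed-down view of a task in the state JSON.
type task struct {
	Name     string   `json:"name"`
	Statuses []status `json:"statuses"`
}

// parseTask decodes a single task object from raw JSON.
func parseTask(raw string) (task, error) {
	var t task
	err := json.Unmarshal([]byte(raw), &t)
	return t, err
}

// publishable reports whether a DNS record should be generated for t.
// A record is skipped only when the latest status explicitly carries
// healthy=false. A missing field is treated as healthy.
// Assumption: the last element of Statuses is the most recent status.
func publishable(t task) bool {
	if len(t.Statuses) == 0 {
		return true
	}
	last := t.Statuses[len(t.Statuses)-1]
	return last.Healthy == nil || *last.Healthy
}
```

With this, a task whose latest status has `"healthy": false` would be skipped, while a task with `"healthy": true` or with no `healthy` field at all would still get a record.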

Any thoughts about it?

jdef · 2018-10-21T14:55:21Z

Related: https://lists.apache.org/thread.html/f79dbb92a0a43c00548ee503a0abbe3e1dd983511747ee77f2fd7966@%3Cdev.mesos.apache.org%3E

imriz · 2018-10-21T15:07:49Z

I would also like to be able to distinguish between "grace period has not passed yet" and "no health check defined", but for now, having no health awareness at all is worse than not distinguishing these two scenarios.
If we treat a missing healthy field as "healthy" (and publish the record), we keep backward compatibility by not affecting tasks without health checks, with the trade-off of publishing unhealthy instances during their grace period (which already happens today anyway).

The bottom line is that this feature request has gone unanswered for years, and I bet many users of this project would like to see it implemented, even if it does not cover every scenario today (maybe add a config flag to enable/disable it).
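A rough sketch of how such an opt-in flag could gate the behavior, so that existing deployments are unaffected unless they enable it. Everything here is an assumption for illustration: `Config`, its `HealthAware` field, `Instance`, and `filterInstances` are hypothetical names, not existing mesos-dns configuration or code.

```go
package main

// Config is a hypothetical configuration struct. HealthAware is an
// assumed flag name; when false, behavior is unchanged from today.
type Config struct {
	HealthAware bool
}

// Instance is a hypothetical service instance. Healthy is nil when the
// task has no health check or no health result has been reported yet.
type Instance struct {
	Host    string
	Healthy *bool
}

// filterInstances returns the instances that should be published.
// With HealthAware disabled, every instance is kept. With it enabled,
// only instances explicitly reported as unhealthy are dropped, so
// instances with a missing health result are still published.
func filterInstances(cfg Config, in []Instance) []Instance {
	out := make([]Instance, 0, len(in))
	for _, inst := range in {
		if cfg.HealthAware && inst.Healthy != nil && !*inst.Healthy {
			continue
		}
		out = append(out, inst)
	}
	return out
}
```

Because a nil `Healthy` is always kept, this sketch still publishes instances during their grace period, which is the trade-off described above.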

tsenart added enhancement epic stability labels Oct 7, 2015

tsenart added this to the v1.0.0 milestone Oct 7, 2015

tsenart mentioned this issue Oct 7, 2015

Proposal: Generate records for staging tasks #308

Closed

sargun removed this from the v1.0.0 milestone Nov 16, 2015
