Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance /debug/health to be usable as a liveness probe #3139

Merged
merged 4 commits into from
Sep 6, 2017

Conversation

bbeaudreault
Copy link
Contributor

Only report health for serving types, namely master, replica, batch. Fix api to return 500 if in error

This should be useful for those in kubernetes or who otherwise want to use a probe to determine whether the vttablet process is healthy. This is distinct from /healthz, which is used for load balancers and returns unhealthy in more cases.

Open question: should we also whitelist the StateTransitioning state? Maybe not, as this should probably be called with a heuristic that allows for a certain number of failures, like many probe implementations do.

@sougou @adkhare

@bbeaudreault bbeaudreault changed the title Enhance /debug/health to be usable as a readiness probe Enhance /debug/health to be usable as a liveness probe Sep 6, 2017
@bbeaudreault
Copy link
Contributor Author

I've tested this out and it works as expected in our environment. As I anticipated in my original description, there is a period of 5s or so where the state is transitioning and it returns an error. We could whitelist that state, but I'm just tuning my probes to be able to ride over it.

tabletType := tsv.target.TabletType
tsv.mu.Unlock()
switch tabletType {
case topodatapb.TabletType_MASTER, topodatapb.TabletType_REPLICA, topodatapb.TabletType_BATCH:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be future-proof, let's add experimental, which is considered a serving type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. And rdonly also.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BATCH and RDONLY are the same value -- the switch complains if i add that. I'll do experimintal tho

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course :). Just experimental then.

@bbeaudreault
Copy link
Contributor Author

added experimental. i think shard 1 is never going to go green, it's been failing all morning for multiple PRs

@sougou
Copy link
Contributor

sougou commented Sep 6, 2017

LGTM

Approved with PullApprove

@sougou sougou merged commit b75a83e into vitessio:master Sep 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants