-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
healthcheck: update healthy tablets correctly when a stream returns an error or times out #7654
Conversation
…n error or times out Signed-off-by: deepthi <[email protected]>
…r timeout from healthcheck stream Signed-off-by: deepthi <[email protected]>
go/vt/discovery/healthcheck.go
Outdated
@@ -404,17 +404,17 @@ func (hc *HealthCheckImpl) deleteTablet(tablet *topodata.Tablet) { | |||
} | |||
} | |||
|
|||
func (hc *HealthCheckImpl) updateHealth(th *TabletHealth, shr *query.StreamHealthResponse, currentTarget *query.Target, trivialNonMasterUpdate bool, isMasterUpdate bool, isMasterChange bool) { | |||
func (hc *HealthCheckImpl) updateHealth(th *TabletHealth, currentTarget *query.Target, trivialUpdate bool, isPrimaryUp bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to isPrimaryUpdate
? Otherwise, I was reading it as "is primary up".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is supposed to be "is primary up". The old parameters (isMasterChange and isMasterUpdate) were actually unnecessary because all the information they pass in is available within the scope of this func. However, we do need to know whether the primary should be marked unhealthy (if there is an error/timeout on the healthcheck connection), so I introduced this parameter.
go/vt/discovery/healthcheck.go
Outdated
topoproto.TabletAliasString(hc.healthy[targetKey][0].Tablet.Alias), | ||
shr.TabletExternallyReparentedTimestamp, | ||
hc.healthy[targetKey][0].MasterTermStartTime) | ||
if th.Target.TabletType == topodata.TabletType_MASTER { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would presume that isPrimaryUp
would be true only if the tablet type was MASTER. Is there a case where this is not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. I suppose this is more of a consistency check - to make sure that we still behave correctly if a caller passes in isPrimaryUp
true for a non-master tablet type.
go/vt/discovery/healthcheck.go
Outdated
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this section was the reason you added the MASTER check, it may read better if you explicitly checked for it here. Something like th.Target.TabletType == topodata.TabletType_MASTER && !isPrimaryUp
. But I'm not sure if that means the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MASTER check was already there but in the form of a boolean (isMasterUpdate).
I can break this into two separate if blocks instead of an if .. if .. else
if that is easier to read/understand. What you are suggesting will be equivalent to how it is written today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern was the deep nesting which made it non-obvious. A cascading if (or switch) may read better. Try it. If it doesn't improve it, we can keep this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think the switch made it better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving, in case no further improvements are possible.
go/vt/discovery/healthcheck.go
Outdated
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern was the deep nesting which made it non-obvious. A cascading if (or switch) may read better. Try it. If it doesn't improve it, we can keep this as is.
Signed-off-by: deepthi <[email protected]>
hc.healthy[targetKey][0].MasterTermStartTime) | ||
} else { | ||
// Just replace it. | ||
hc.healthy[targetKey][0] = th | ||
} | ||
} | ||
case isPrimary && !isPrimaryUp: | ||
// No healthy master tablet | ||
hc.healthy[targetKey] = []*TabletHealth{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense. This should be fixed.
Description
There are error conditions from a tablet health check that require the healthy tablet list to be updated.
Related Issue(s)
Fixes #7472
Fixes #7177
Checklist
Impacted Areas in Vitess
Components that this PR will affect:
Testing
Unit tests were added for each case. For #7472 I also followed the provided testing procedure to reproduce the problem and confirmed that the fix works.