-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server,storage,sql: record node {de,re}commissioning in the event log #18178
Conversation
464cf70
to
0d72faa
Compare
Reviewed 11 of 11 files at r1. Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice solution! LGTM, just a few small comments.
pkg/server/server.go
Outdated
ctx, txn, eventType, int32(nodeID), int32(nodeID), struct{}{}, | ||
) | ||
}); err != nil { | ||
log.Errorf(ctx, "unable to record commissioning event for node %d: %s", nodeID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but it'd be nice to include the event type here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
pkg/ui/src/util/eventTypes.ts
Outdated
@@ -33,11 +33,16 @@ export const FINISH_SCHEMA_CHANGE = "finish_schema_change"; | |||
export const NODE_JOIN = "node_join"; | |||
// Recorded when an existing node rejoins the cluster after being offline. | |||
export const NODE_RESTART = "node_restart"; | |||
// Recorded when a node is marked as decommissioning. | |||
export const NODE_DECOMMISSIONED = "node_decommissioned"; | |||
// EventLogNodeRecommissioned is recorded when a decommissioned node is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/EventLogNodeRecommissioned is recorded/Recorded/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done.
pkg/server/server.go
Outdated
// than to slow down future node liveness transactions. | ||
if err := s.db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error { | ||
return eventLogger.InsertEventRecord( | ||
ctx, txn, eventType, int32(nodeID), int32(nodeID), struct{}{}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reporting
argument shouldn't be the nodeID
being commissioned, it should be s.NodeID()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch! Done.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed. pkg/server/server.go, line 1172 at r1 (raw file):
There's also the problem that intents on the node liveness range are not allowed. Not 100% sure that's true, though I think I remember it. pkg/ui/src/views/cluster/containers/events/index.tsx, line 110 at r1 (raw file):
The message seems oddly repetitive, but apparently that's true for its neighbors as well. Comments from Reviewable |
0d72faa
to
ecc8dd2
Compare
Review status: 8 of 10 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending. pkg/server/server.go, line 1172 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Is that enforced somewhere other than this block of code? cockroach/pkg/storage/node_liveness.go Lines 647 to 660 in d65ed58
pkg/ui/src/views/cluster/containers/events/index.tsx, line 110 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Yeah, I find the event log in its current form to be rather useless, but ¯_(ツ)_/¯ Comments from Reviewable |
Review status: 8 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. pkg/server/server.go, line 1172 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
Yep, that's the location. Comments from Reviewable |
ecc8dd2
to
2da77e9
Compare
Ok, merging before someone touches embedded.go! Review status: 8 of 11 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. pkg/server/server.go, line 1172 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Let me know if you have a rewording for this comment, @tschottdorf. It seems correct to me, even with the above comment in mind. Comments from Reviewable |
Review status: 8 of 11 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful. pkg/server/server.go, line 1172 at r1 (raw file): Previously, benesch (Nikhil Benesch) wrote…
It's incorrect in that it suggests that it would slow down node liveness when in fact it would break it. But don't think that matters here. Comments from Reviewable |
The new bits of the test introduced in cockroachdb#18178 were flaky since RPCs may be sent to the down nodes.
Fixes #17677.
/cc @a-robinson @tschottdorf can one or both of you review? This wasn't quite as simple as I'd hoped. The naive solution spams the event log with duplicate entries because
./node decommission FOO
fires off many decommissioning requests in quick succession.Also, note that
TestDecommission
is currentlyt.Skip
ped, so the test won't run until #17995 is fixed. (I did manually verify my addition by replacing thedecommission --wait all
that hangs forever with adecommission --wait none
and verifying that theCheckQueryResults
succeeds.)