server,storage,sql: record node {de,re}commissioning in the event log #18178

Merged
benesch merged 1 commit into cockroachdb:master from the commissioning-events branch on Sep 5, 2017


benesch
Contributor

@benesch benesch commented Sep 2, 2017

Fixes #17677.

/cc @a-robinson @tschottdorf can one or both of you review? This wasn't quite as simple as I'd hoped. The naive solution spams the event log with duplicate entries because ./node decommission FOO fires off many decommissioning requests in quick succession.

Also, note that TestDecommission is currently t.Skipped, so the test won't run until #17995 is fixed. (I did verify my addition manually: I replaced the decommission --wait all call, which hangs forever, with decommission --wait none and checked that CheckQueryResults succeeds.)
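The duplicate-entry problem described above can be avoided by writing an event only when a request actually changes the node's decommissioning flag. The following is a minimal sketch under that assumption: livenessStore, setDecommissioning, decommission, and the "node_recommissioned" string are hypothetical stand-ins, not CockroachDB's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// livenessStore is a hypothetical stand-in for the node liveness records;
// it tracks only the decommissioning flag for each node.
type livenessStore struct {
	mu              sync.Mutex
	decommissioning map[int32]bool
}

// setDecommissioning stores the flag and reports whether the stored value
// actually changed. A request that asks for the current value is a no-op.
func (l *livenessStore) setDecommissioning(nodeID int32, setTo bool) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.decommissioning[nodeID] == setTo {
		return false
	}
	l.decommissioning[nodeID] = setTo
	return true
}

// event is a hypothetical, trimmed-down event log entry.
type event struct {
	Type     string
	TargetID int32
}

// decommission appends an event only when the liveness change took effect,
// so a burst of identical requests produces a single entry.
func decommission(l *livenessStore, eventLog *[]event, nodeID int32, setTo bool) {
	eventType := "node_decommissioned"
	if !setTo {
		eventType = "node_recommissioned" // illustrative name (assumption)
	}
	if l.setDecommissioning(nodeID, setTo) {
		*eventLog = append(*eventLog, event{Type: eventType, TargetID: nodeID})
	}
}

func main() {
	l := &livenessStore{decommissioning: map[int32]bool{}}
	var events []event
	// Simulate the CLI firing the same request several times in a row.
	for i := 0; i < 5; i++ {
		decommission(l, &events, 4, true)
	}
	decommission(l, &events, 4, false)
	fmt.Println(len(events), events) // 2 [{node_decommissioned 4} {node_recommissioned 4}]
}
```

Keying the log write off "did the state change" rather than "was a request received" keeps the event log idempotent no matter how many times the CLI retries.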

@benesch benesch requested review from a team September 2, 2017 04:42
@cockroach-teamcity
Member

This change is Reviewable

@benesch benesch force-pushed the commissioning-events branch from 464cf70 to 0d72faa Compare September 2, 2017 05:10
@bdarnell
Contributor

bdarnell commented Sep 3, 2017

:lgtm:


Reviewed 11 of 11 files at r1.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

Contributor

@a-robinson a-robinson left a comment


Nice solution! LGTM, just a few small comments.

ctx, txn, eventType, int32(nodeID), int32(nodeID), struct{}{},
)
}); err != nil {
log.Errorf(ctx, "unable to record commissioning event for node %d: %s", nodeID, err)
Contributor


nit, but it'd be nice to include the event type here as well.

Contributor Author


Done.

@@ -33,11 +33,16 @@ export const FINISH_SCHEMA_CHANGE = "finish_schema_change";
export const NODE_JOIN = "node_join";
// Recorded when an existing node rejoins the cluster after being offline.
export const NODE_RESTART = "node_restart";
// Recorded when a node is marked as decommissioning.
export const NODE_DECOMMISSIONED = "node_decommissioned";
// EventLogNodeRecommissioned is recorded when a decommissioned node is
Contributor


s/EventLogNodeRecommissioned is recorded/Recorded/

Contributor Author


Thanks. Done.

// than to slow down future node liveness transactions.
if err := s.db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
return eventLogger.InsertEventRecord(
ctx, txn, eventType, int32(nodeID), int32(nodeID), struct{}{},
Contributor


The reporting argument shouldn't be the nodeID being commissioned, it should be s.NodeID().

Contributor Author


Nice catch! Done.
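The distinction in this thread is between the node an event is about (the target) and the node that handled the request and wrote the entry (the reporter). A minimal sketch, assuming a trimmed-down record with just those two IDs; eventRecord, server, and commissioningEvent are hypothetical names, and only the s.NodeID() idea comes from the review comment.

```go
package main

import "fmt"

// eventRecord keeps only the two node IDs discussed in the review
// (hypothetical struct; a real event log row carries more columns).
type eventRecord struct {
	EventType   string
	TargetID    int32 // the node being decommissioned or recommissioned
	ReportingID int32 // the node that served the request and wrote the entry
}

// server is a hypothetical stand-in that knows its own node ID.
type server struct{ nodeID int32 }

func (s *server) NodeID() int32 { return s.nodeID }

// commissioningEvent fills ReportingID from the serving node, not from the
// target: decommissioning node 4 via node 1 is reported by node 1.
func (s *server) commissioningEvent(eventType string, target int32) eventRecord {
	return eventRecord{
		EventType:   eventType,
		TargetID:    target,
		ReportingID: s.NodeID(),
	}
}

func main() {
	s := &server{nodeID: 1}
	fmt.Printf("%+v\n", s.commissioningEvent("node_decommissioned", 4))
	// {EventType:node_decommissioned TargetID:4 ReportingID:1}
}
```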

@tbg
Member

tbg commented Sep 5, 2017

:lgtm:


Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/server/server.go, line 1172 at r1 (raw file):

			// If we die right now or if this transaction fails to commit, the
			// commissioning event will not be recorded to the event log. While we
			// could insert the event record in the same transaction as the liveness

There's also the problem that intents on the node liveness range are not allowed. Not 100% sure that's true, though I think I remember it.


pkg/ui/src/views/cluster/containers/events/index.tsx, line 110 at r1 (raw file):

      break;
    case eventTypes.NODE_DECOMMISSIONED:
      content = <span>Node Decommissioned: Node {targetId} was decommissioned</span>;

The message seems oddly repetitive, but apparently that's true for its neighbors as well.



@benesch benesch force-pushed the commissioning-events branch from 0d72faa to ecc8dd2 Compare September 5, 2017 13:41
@benesch
Contributor Author

benesch commented Sep 5, 2017

Review status: 8 of 10 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


pkg/server/server.go, line 1172 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

There's also the problem that intents on the node liveness range are not allowed. Not 100% sure that's true, though I think I remember it.

Is that enforced somewhere other than this block of code?

// Use a trigger on EndTransaction to indicate that node liveness should
// be re-gossiped. Further, require that this transaction complete as a
// one phase commit to eliminate the possibility of leaving write intents.
b.AddRawRequest(&roachpb.EndTransactionRequest{
	Commit:     true,
	Require1PC: true,
	InternalCommitTrigger: &roachpb.InternalCommitTrigger{
		ModifiedSpanTrigger: &roachpb.ModifiedSpanTrigger{
			NodeLivenessSpan: &roachpb.Span{
				Key:    key,
				EndKey: key.Next(),
			},
		},
	},
})


pkg/ui/src/views/cluster/containers/events/index.tsx, line 110 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

The message seems oddly repetitive, but apparently that's true for its neighbors as well.

Yeah, I find the event log in its current form to be rather useless, but ¯\_(ツ)_/¯



@tbg
Member

tbg commented Sep 5, 2017

Review status: 8 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


pkg/server/server.go, line 1172 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Is that enforced somewhere other than this block of code?

// Use a trigger on EndTransaction to indicate that node liveness should
// be re-gossiped. Further, require that this transaction complete as a
// one phase commit to eliminate the possibility of leaving write intents.
b.AddRawRequest(&roachpb.EndTransactionRequest{
	Commit:     true,
	Require1PC: true,
	InternalCommitTrigger: &roachpb.InternalCommitTrigger{
		ModifiedSpanTrigger: &roachpb.ModifiedSpanTrigger{
			NodeLivenessSpan: &roachpb.Span{
				Key:    key,
				EndKey: key.Next(),
			},
		},
	},
})

Yep, that's the location.
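Why Require1PC rules out writing the event record in the liveness transaction can be modeled roughly as follows. This is a simplified sketch, assuming the only relevant rule is that a one-phase commit needs every write to land in one range; write, commit, errNot1PC, and the range IDs are hypothetical and do not reflect CockroachDB's real commit path.

```go
package main

import (
	"errors"
	"fmt"
)

// write is a hypothetical key write tagged with the ID of the range it lands in.
type write struct {
	Key     string
	RangeID int
}

var errNot1PC = errors.New("transaction cannot be committed in one phase")

// commit models only one rule: a transaction can commit in one phase only if
// all of its writes land in a single range. With require1PC set, anything
// else is rejected instead of falling back to a two-phase commit that would
// leave write intents behind.
func commit(writes []write, require1PC bool) error {
	ranges := map[int]bool{}
	for _, w := range writes {
		ranges[w.RangeID] = true
	}
	if require1PC && len(ranges) > 1 {
		return errNot1PC
	}
	return nil
}

func main() {
	liveness := write{Key: "liveness/4", RangeID: 2} // hypothetical range IDs
	eventLog := write{Key: "eventlog/1", RangeID: 7}

	fmt.Println(commit([]write{liveness}, true))           // <nil>
	fmt.Println(commit([]write{liveness, eventLog}, true)) // transaction cannot be committed in one phase
}
```

Under this model, adding the event log write to the liveness transaction does not merely slow it down; the transaction is rejected outright.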



@benesch benesch force-pushed the commissioning-events branch from ecc8dd2 to 2da77e9 Compare September 5, 2017 14:27
@benesch
Contributor Author

benesch commented Sep 5, 2017

Ok, merging before someone touches embedded.go!


Review status: 8 of 11 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 1172 at r1 (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

Yep, that's the location.

Let me know if you have a rewording for this comment, @tschottdorf. It seems correct to me, even with the above comment in mind.



@benesch benesch merged commit 2f476ff into cockroachdb:master Sep 5, 2017
@benesch benesch deleted the commissioning-events branch September 5, 2017 15:40
@tbg
Member

tbg commented Sep 5, 2017

Review status: 8 of 11 files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.


pkg/server/server.go, line 1172 at r1 (raw file):

Previously, benesch (Nikhil Benesch) wrote…

Let me know if you have a rewording for this comment, @tschottdorf. It seems correct to me, even with the above comment in mind.

It's incorrect in that it suggests that it would slow down node liveness when in fact it would break it. But I don't think that matters here.
See #14250 (comment) for archaeology.



tbg added a commit to tbg/cockroach that referenced this pull request Sep 6, 2017
The new bits of the test introduced in
cockroachdb#18178 were flaky since RPCs may be
sent to the down nodes.
Development

Successfully merging this pull request may close these issues.

ui: add de / re commissioning nodes to event log
5 participants