nomad: ensure a unique ClusterID exists when leader (gh-6702) #6707

Merged
merged 1 commit into from
Dec 9, 2019

Conversation

shoenig
Member

@shoenig shoenig commented Nov 14, 2019

As leader, check to see if a ClusterID has already been generated
and replicated through the raft log. If one has not been generated,
generate a UUID and push it through. If a ClusterID has been generated,
its value is applied into the state store. The value is not actually
used anywhere in this changeset, but is a prerequisite for gh-6701.

Similar prior art in consul.
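The leader-side check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this changeset: the `store` type, `applyClusterID`, and `ensureClusterID` are hypothetical names, and a plain struct field stands in for the raft log and state store.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a hypothetical stand-in for the replicated state store.
type store struct {
	clusterID string
}

// applyClusterID mimics applying a replicated entry: the first ID written
// wins, and an attempt to write a different ID afterwards is rejected.
func (s *store) applyClusterID(id string) error {
	if id == "" {
		return errors.New("empty cluster ID")
	}
	if s.clusterID != "" && s.clusterID != id {
		return fmt.Errorf("cluster ID already set to %s", s.clusterID)
	}
	s.clusterID = id
	return nil
}

// ensureClusterID is what a leader might run: reuse the existing ID if one
// has been recorded, otherwise generate one and push it through the store.
func ensureClusterID(s *store, generate func() string) (string, error) {
	if s.clusterID != "" {
		return s.clusterID, nil
	}
	if err := s.applyClusterID(generate()); err != nil {
		return "", err
	}
	return s.clusterID, nil
}

func main() {
	s := &store{}
	first, _ := ensureClusterID(s, func() string { return "id-from-first-leader" })
	second, _ := ensureClusterID(s, func() string { return "id-from-second-leader" })
	// A later leader sees the ID that was already recorded.
	fmt.Println(first == second)
}
```

The point of the sketch is that the check is idempotent: a new leader that runs it after a failover gets the ID the previous leader recorded rather than minting a new one.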

Contributor

@drewbailey drewbailey left a comment

There is a helper func to generate test UUIDs here if you want: https://github.com/hashicorp/nomad/blob/master/helper/uuid/uuid.go#L9
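For readers without the repository at hand, a UUID helper of this kind might look something like the sketch below. It is a self-contained stand-in that assumes random bytes from `crypto/rand`; `generateUUID` is a hypothetical name and this is not a copy of the linked Nomad helper.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// generateUUID is a hypothetical helper: it formats 16 random bytes in the
// familiar 8-4-4-4-12 hexadecimal layout.
func generateUUID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("failed to read random bytes: %w", err)
	}
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		buf[0:4], buf[4:6], buf[6:8], buf[8:10], buf[10:16]), nil
}

func main() {
	id, err := generateUUID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

In tests inside the Nomad tree, the existing helper from the link above would be the natural choice over a local copy like this one.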

@shoenig shoenig changed the base branch from master to dev-connect-acls November 18, 2019 14:25
Member

@schmichael schmichael left a comment

No blockers.

We should also expose this somewhere. We've had at least one prod incident where someone accidentally joined two distinct clusters that had the same region name. Even if we don't immediately prevent that issue with this ID, it could still make debugging easier.

Mind filing an issue if you think that makes sense?

nomad/fsm_test.go (outdated, resolved)
nomad/leader.go (outdated, resolved)
nomad/state/state_store.go (resolved)
nomad/structs/structs.go (outdated, resolved)
Member

@nickethier nickethier left a comment

LGTM

I agree exposing this somewhere might help some engineer someday debug a cluster.

Contributor

@notnoop notnoop left a comment

Overall looks good, but I would love to handle the mixed-cluster case better. Thanks!

nomad/leader.go (outdated, resolved)
nomad/state/schema_test.go (outdated, resolved)
nomad/state/state_store.go (outdated, resolved)
@shoenig shoenig force-pushed the f-cluster-id branch 4 times, most recently from 1ce3712 to 8a1f715 Compare December 3, 2019 15:09
@shoenig shoenig force-pushed the f-cluster-id branch 6 times, most recently from e67e13a to 533354c Compare December 4, 2019 22:19
@shoenig
Member Author

shoenig commented Dec 5, 2019

Hey all, thanks for looking so closely at this! I've made a bunch of changes, hopefully addressing everything pointed out before. In particular, we should now be resilient against leadership changes during upgrades. The intended invariant is that any Server can call ClusterID and get back either the agreed-upon UUID or a "not ready yet" error. Since the implementation is quite different now, please have another look if you can!

nomad/leader.go (outdated, resolved)
nomad/leader_test.go (outdated, resolved)
nomad/server.go (outdated, resolved)
nomad/state/schema.go (resolved)
nomad/state/schema_test.go (outdated, resolved)
nomad/leader.go (outdated, resolved)
Enable any Server to look up the unique ClusterID. If one has not been
generated, and this node is the leader, generate a UUID and attempt to
apply it through raft.

The value is not yet used anywhere in this changeset, but is a prerequisite
for gh-6701.
@shoenig shoenig merged commit 13b46bc into dev-connect-acls Dec 9, 2019
@shoenig shoenig deleted the f-cluster-id branch December 9, 2019 16:43
@github-actions

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 24, 2023
5 participants