*: support raft learner in etcd #10645
Codecov Report
@@ Coverage Diff @@
## master #10645 +/- ##
==========================================
- Coverage 71.8% 71.66% -0.15%
==========================================
Files 393 393
Lines 36628 36875 +247
==========================================
+ Hits 26302 26425 +123
- Misses 8502 8599 +97
- Partials 1824 1851 +27
Continue to review full report at Codecov.
 )

 type Cluster interface {
 	// MemberList lists the current cluster membership.
 	MemberList(ctx context.Context) (*MemberListResponse, error)

 	// MemberAdd adds a new member into the cluster.
-	MemberAdd(ctx context.Context, peerAddrs []string) (*MemberAddResponse, error)
+	MemberAdd(ctx context.Context, peerAddrs []string, isLearner bool) (*MemberAddResponse, error)
Removing |
@jingyih Can you improve the commit history a little bit? Right now there are 60 commits. For example, could you separate them into 3-5 commits that can be reviewed independently? Thanks!
I agree there are a lot of commits. Half of these commits are merge commits, which can be ignored. What happened was that we first created a feature branch, and all the code changes were first merged into the feature branch. There are about 30 PRs opened against the feature branch; they closely match the items in task list #10537. Reviewing 30 commits is not easy, and rebasing and grouping them into 3 to 5 commits is not easy either. I can pick the most important commits (maybe 15 to 20 commits) and list them in groups. Is this acceptable?
@jingyih OK. Let us get rid of the merge commits first.
- Added isLearner flag to MemberAddRequest in Cluster API.
- Added isLearner field to StatusResponse in Maintenance API.
- Added MemberPromote rpc to Cluster API.
Fixed compilation errors after API change for learner.
Added IsLearner field to etcdserver internal Member type. Routed learner MemberAdd request from server API to raft. Apply learner MemberAdd result to server after the request is passed through Raft.
Added IsLearner flag to clientv3 MemberAdd API.
Added support for "etcdctl member add --learner".
Added an e2e test to exercise "etcdctl member add --learner".
Added learner field to endpoint status API.
Hardcoded allowed rpc for learner node. Added filtering in grpc interceptor to check if rpc is allowed for learner node.
Adding TestKVForLearner. Also adding test utility functions for clientv3 integration tests.
1. Maintenance API MoveLeader() returns ErrBadLeaderTransferee if the transferee does not exist or is a raft learner.
2. etcdserver TransferLeadership() only chooses a voting member as transferee.
Adding integration test TestMoveLeaderToLearnerError, which ensures that leader transfer to learner member will fail.
Fixes TestMemberAddForLearner and TestMemberPromoteForLearner.
Adding delay in the test for the newly started learner member to catch up applying config change entries in raft log.
Adjust StrictReconfigCheck logic to accommodate learner members in the cluster.
Make learner return code.Unavailable when the request is not supported by learner. Client balancer will retry a different endpoint.
If member does not exist in cluster, IsLearner will panic.
Hard-coded the maximum number of learners to 1.
Use ReadyNotify instead of time.Sleep to wait for server ready.
If learner is not ready to be promoted, use etcdserver.ErrLearnerNotReady instead of using membership.ErrLearnerNotReady.
Check http StatusCode. Only Unmarshal body if StatusCode is statusOK.
Update TestMemberPromote to include both learner not-ready and learner ready test cases. Removed unit test TestPromoteMember, it requires underlying raft node to be started and running. The member promote is covered by the integration test.
@@ -51,10 +53,13 @@ type Member struct {

 // NewMember creates a Member without an ID and generates one based on the
 // cluster name, peer URLs, and time. This is used for bootstrapping/adding new member.
-func NewMember(name string, peerURLs types.URLs, clusterName string, now *time.Time) *Member {
+func NewMember(name string, peerURLs types.URLs, clusterName string, now *time.Time, isLearner bool) *Member {
probably add a func called NewLearnerMember. Boolean arg on public method is not good in general.
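The suggestion above can be sketched as two exported constructors that share one unexported helper, so callers never pass a bare boolean. This is only an illustration under simplifying assumptions: the `Member` type here is a toy stand-in, peer URLs are plain strings, and the cluster name, timestamp, and ID generation from the real signature are left out.

```go
package main

import "fmt"

// Member is a simplified stand-in for the membership type discussed above.
type Member struct {
	Name      string
	PeerURLs  []string
	IsLearner bool
}

// newMember is the shared, unexported helper that still takes the flag.
func newMember(name string, peerURLs []string, isLearner bool) *Member {
	return &Member{Name: name, PeerURLs: peerURLs, IsLearner: isLearner}
}

// NewMember creates a voting member.
func NewMember(name string, peerURLs []string) *Member {
	return newMember(name, peerURLs, false)
}

// NewLearnerMember creates a non-voting (learner) member.
func NewLearnerMember(name string, peerURLs []string) *Member {
	return newMember(name, peerURLs, true)
}

func main() {
	voter := NewMember("infra1", []string{"http://10.0.0.1:2380"})
	learner := NewLearnerMember("infra2", []string{"http://10.0.0.2:2380"})
	fmt.Println(voter.Name, voter.IsLearner)
	fmt.Println(learner.Name, learner.IsLearner)
}
```

The call sites then read as `NewLearnerMember(...)` instead of `NewMember(..., true)`, which is the readability concern raised in the comment.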
@jingyih I reviewed the first 7 commits (up to etcdctl: add learner field in member list output). Can you make a separate PR for these 7 commits? I think they can be merged separately first.
@xiang90 But do we want to merge only part of the learner commits into master? I thought it might cause confusion?
clientv3/cluster.go
Outdated
@@ -17,10 +17,11 @@ package clientv3
 import (
 	"context"

the import format needs to be fixed.
Sure. FYI, it is not in the final version. "error" is imported here temporarily because the MemberPromote API returns error "not implemented". The "error" import is removed when the actual implementation is done.
etcdserver/api/membership/cluster.go
Outdated
@@ -252,6 +252,16 @@ func (c *RaftCluster) Recover(onSet func(*zap.Logger, *semver.Version)) {
 	}
 }

+// IsPromoteChange checks if m is a promoteChange
+func (c *RaftCluster) IsPromoteChange(m *Member) bool {
IsMemberPromoted
This function checks if the config change is for promoting a member (as opposed to adding a member). Maybe use IsMemberPromoteConfChange or IsConfChangeForMemberPromote?
etcdserver/api/membership/cluster.go
Outdated
@@ -252,6 +252,16 @@ func (c *RaftCluster) Recover(onSet func(*zap.Logger, *semver.Version)) {
 	}
 }

+// IsPromoteChange checks if m is a promoteChange
+func (c *RaftCluster) IsPromoteChange(m *Member) bool {
+	members, _ := membersFromStore(c.lg, c.v2store)
Why do we need to load members from c.v2store rather than from c.members directly?
Members info from the store is more "accurate" than c.members. In most cases they are the same, except during server bootstrapping.
During server bootstrapping, we construct members information from "--initial-cluster". The resulting members info does not carry any information about a member's learner flag, because we do not provide a learner flag when starting the etcd binary.
Also check here: in the current code we load members from the store when validating the conf change in the apply stage, for very similar reasons.
etcd/etcdserver/api/membership/cluster.go
Lines 255 to 256 in a44a281
func (c *RaftCluster) ValidateConfigurationChange(cc raftpb.ConfChange) error {
	members, removed := membersFromStore(c.lg, c.v2store)
Maybe I should be more specific on this topic. Consider the case where we add a new member to an existing cluster.
1. Call the MemberAdd API to add the new member to the existing members in the cluster.
   1.1 Each existing member appends the confchange raft entry for adding the new member.
   1.2 Each existing member applies the raft entry, which adds the new member to its c.members and backend storage.
2. When starting the new member (during bootstrapping), the new member's c.members is initially constructed from the input parameter "--initial-cluster", whereas initially the backend storage has no members information. At this point, the new member's c.members is almost accurate, except that it does not have info on learner.
3. As the new member gets raft entries from the leader and applies them, it will eventually apply all the confchange raft entries, including the entry that adds itself to the cluster. At this point, its c.members is accurate.
During bootstrapping, the apply stage should always refer to member info from backend storage. Otherwise the server gets confused when applying the confchange raft entries: it would be adding a member which already exists in its c.members.
BTW, I just realized this function is not part of the final version. We had to come up with a better way of deciding (in the apply stage) whether a confchange is for adding a new member or for promoting an existing member.
https://github.com/jingyih/etcd/blob/b433162c0b9ea4d13e9a0dd49b1f51e20896cbcc/etcdserver/server.go#L2232
ok. then probably replace this part of the PR with the final version :)
It will be removed in later commits. You will not see this function in part2 PR.
thanks!
@jingyih Can you create a part2 PR? I feel the next 10 commits are good candidates.
I can do it tomorrow.
@jingyih Probably the remaining commits can be easily reviewed in one PR?
Yeah I am creating the last part - part 3.
Edit: This PR is replaced by #10725, #10727, #10730.
This PR adds support for learner (non-voting) members in etcd. Currently the basic functionality is done, so this feels like a good time to get reviews before the changes grow bigger. Please refer to the pull requests against the learner branch for incremental changes and their context.
Background
Status
The implementation status is tracked in task list #10537.
Main contributors to learner branch:
@jingyih @WIZARD-CXY @gyuho @jpbetz