Migrate cluster attributes to use v3 backend #11427
Added ClusterVersionSetRequest for setting cluster version via v3 apply. Added ClusterMemberAttrSetRequest for setting cluster member attributes via v3 apply.
Codecov Report
@@ Coverage Diff @@
## master #11427 +/- ##
==========================================
- Coverage 64.63% 64.6% -0.03%
==========================================
Files 403 403
Lines 37998 38070 +72
==========================================
+ Hits 24560 24596 +36
- Misses 11802 11843 +41
+ Partials 1636 1631 -5
tests/e2e/ctl_v3_migrate_test.go
Outdated
if resp.Kvs[0].CreateRevision != 7 {
	t.Fatalf("resp.Kvs[0].CreateRevision expected 7, got %d", resp.Kvs[0].CreateRevision)
if resp.Kvs[0].CreateRevision <= rev {
Is CreateRevision == rev + 1?
Changed.
tests/e2e/ctl_v3_migrate_test.go
Outdated
@@ -85,11 +83,22 @@ func TestCtlV3Migrate(t *testing.T) {
	if err != nil {
		t.Fatal(err)
	}
	rev := resp.Header.Revision
Rename to revAfterMigrate for better readability?
Done
// information is the JSON representation of this server's member struct, updated
// with the static clientURLs of the server.
// The function keeps attempting to register until it succeeds,
// or its server is stopped.
Add a TODO to replace publish() in etcd 3.6?
Added
To clarify, the intention is to add new internal raft type
@gyuho Could you help take a look? The main motivation of this PR is that the cluster version is currently recovered from the v2 store during server restart. Because the v2 store is in memory, the cluster version information after a server restart can be very stale. During cluster downgrade, when a server starts and checks the compatibility of its own server version against the cluster version, the check can fail if the cluster version info is very stale. This can be very confusing to users. As an example: #11362 (comment). So the plan is to use v3 requests and the v3 backend for all cluster information updates (except members).
@@ -893,7 +893,7 @@ func TestKVLargeRequests(t *testing.T) {
	expectError error
}{
	{
		maxRequestBytesServer: 1,
		maxRequestBytesServer: 256,
Why do we change this?
Basically, if we keep the previous test data, the test cluster will fail to start, because the cluster makes several v3 raft requests (such as cluster version set) during startup. The old code used v2 requests when starting the server, which bypassed this check:
Lines 633 to 635 in 1f8764b
if len(data) > int(s.Cfg.MaxRequestBytes) {
	return nil, ErrRequestTooLarge
}
Changing to a larger maxRequestBytesServer will not affect the test behavior as long as maxRequestBytesServer is smaller than valueSize.
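The reasoning above can be sketched in a few lines of Go. This is only an illustration, not the etcd code: checkRequestSize is a hypothetical stand-in for the quoted check, and the 32-byte internal request size is an assumption chosen to represent a small startup request such as cluster version set.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrRequestTooLarge reuses the error name from the quoted snippet.
var ErrRequestTooLarge = errors.New("request is too large")

// checkRequestSize is a hypothetical stand-in for the size check quoted above.
func checkRequestSize(data []byte, maxRequestBytes uint) error {
	if len(data) > int(maxRequestBytes) {
		return ErrRequestTooLarge
	}
	return nil
}

func main() {
	internalReq := make([]byte, 32) // assumed size of a small internal v3 request
	userValue := make([]byte, 1024) // valueSize used by the test table

	// With a limit of 1 even the internal request is rejected, so the
	// cluster cannot start. With 256 it passes, while the 1024-byte
	// value is still rejected, which is what the test asserts.
	for _, limit := range []uint{1, 256} {
		fmt.Printf("limit=%d internal=%v user=%v\n", limit,
			checkRequestSize(internalReq, limit),
			checkRequestSize(userValue, limit))
	}
}
```

Any limit between the size of the largest startup request and valueSize would keep the test meaningful; 256 is simply the value picked in this PR.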
etcdserver/server.go
Outdated
for {
	select {
	case <-s.stopping:
		if lg := s.getLogger(); lg != nil {
Can we update getLogger to return zap.NewNop(), so we don't need all these manual nil checks?
Done. Will fix the other getLogger call sites in a separate PR.
etcdserver/server.go
Outdated
	return
default:
	if lg := s.getLogger(); lg != nil {
Same as above. Can we store the variable once at the top (before the for loop), and use the same logger?
@@ -58,6 +59,9 @@ message InternalRaftRequest {
	AuthRoleGetRequest auth_role_get = 1202;
	AuthRoleGrantPermissionRequest auth_role_grant_permission = 1203;
	AuthRoleRevokePermissionRequest auth_role_revoke_permission = 1204;

	membershippb.ClusterVersionSetRequest cluster_version_set = 1300;
	membershippb.ClusterMemberAttrSetRequest cluster_member_attr_set = 1301;
Are these unknown fields for older etcd? What happens to an older etcd if it receives this request?
Ok, now I see. These will be ignored in old etcd, and it instead uses v2 API.
Looks like these two fields made the v3.5 to v3.4 downgrade fail.
Can we make it an experimental API in v3.4 and add these two fields there, to make a downgrade from v3.5 to v3.4 possible?
panic: not implemented
goroutine 214 [running]:
go.etcd.io/etcd/etcdserver.(*applierV3backend).Apply(0xc000238468, 0xc0002b2460, 0x0)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/apply.go:173 +0xf85
go.etcd.io/etcd/etcdserver.(*authApplierV3).Apply(0xc000242000, 0xc0002b2460, 0x0)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/apply_auth.go:60 +0xd2
go.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntryNormal(0xc000290600, 0xc0002d14d8)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/server.go:2230 +0x1fb
go.etcd.io/etcd/etcdserver.(*EtcdServer).apply(0xc000290600, 0xc000302a80, 0xa, 0xc, 0xc0001c8000, 0x0, 0xc000046000, 0xc00033f640)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/server.go:2144 +0x579
go.etcd.io/etcd/etcdserver.(*EtcdServer).applyEntries(0xc000290600, 0xc0001c8000, 0xc00002c000)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/server.go:1396 +0xe5
go.etcd.io/etcd/etcdserver.(*EtcdServer).applyAll(0xc000290600, 0xc0001c8000, 0xc00002c000)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/server.go:1120 +0x88
go.etcd.io/etcd/etcdserver.(*EtcdServer).run.func8(0x11b6e30, 0xc0005da040)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/etcdserver/server.go:1065 +0x3c
go.etcd.io/etcd/pkg/schedule.(*fifo).run(0xc0003c0180)
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/pkg/schedule/schedule.go:157 +0xf3
created by go.etcd.io/etcd/pkg/schedule.NewFIFOScheduler
/home/chaochn/workplace/EKS-etcd/src/EKS-etcd/pkg/schedule/schedule.go:70 +0x13b
Do you know which one failed: ClusterVersionSetRequest or ClusterMemberAttrSetRequest?
ClusterMemberAttrSetRequest is only called by publishV3, which is not yet called in v3.5, so I assume it is not this one.
ClusterVersionSetRequest seems to be called from:
etcd/server/etcdserver/server.go
Line 2431 in f82b5cb
func (s *EtcdServer) monitorVersions() {
@hexfusion @wenjiaswe @ychen11 shall we postpone enabling monitorVersions() till etcd-3.6?
Yeah, I just confirmed that v3.4 has not yet implemented the ClusterVersionSet API. And the raftRequest MustUnmarshal just let this slip through, which is surprising to me.
case r.ClusterVersionSet != nil:
	a.s.applyV3Internal.ClusterVersionSet(r.ClusterVersionSet)
default:
	panic("not implemented")
So do we want to make an equivalent monitorVersionsV3() in etcd-3.5? @ptabor
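A minimal Go sketch of the failure mode discussed in this thread, using simplified hypothetical types rather than the real etcdserverpb definitions. It assumes that an older binary, which does not know field 1300, decodes the entry with none of its known fields set and therefore falls through to the default branch of the apply switch.

```go
package main

import "fmt"

// Hypothetical, simplified request types; the real messages have more fields.
type PutRequest struct{ Key, Value string }
type ClusterVersionSetRequest struct{ Ver string }

// InternalRaftRequest models a oneof-like wrapper: at most one field is set.
type InternalRaftRequest struct {
	Put               *PutRequest
	ClusterVersionSet *ClusterVersionSetRequest
}

// applyOld models an applier that predates ClusterVersionSet. It has no case
// for that field, so such a request reaches the default branch and panics.
func applyOld(r *InternalRaftRequest) string {
	switch {
	case r.Put != nil:
		return "put " + r.Put.Key
	default:
		panic("not implemented")
	}
}

// applyNew models an applier that knows about ClusterVersionSet.
func applyNew(r *InternalRaftRequest) string {
	switch {
	case r.Put != nil:
		return "put " + r.Put.Key
	case r.ClusterVersionSet != nil:
		return "cluster version set to " + r.ClusterVersionSet.Ver
	default:
		panic("not implemented")
	}
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	req := &InternalRaftRequest{ClusterVersionSet: &ClusterVersionSetRequest{Ver: "3.5.0"}}
	fmt.Println("new applier:", applyNew(req))
	fmt.Println("old applier panics:", panics(func() { applyOld(req) }))
}
```

Unmarshalling succeeds in both cases, which is why the problem only surfaces at apply time.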
shall we postpone enabling monitorVersions() till etcd-3.6?
Yes, this breaks the upgrade case, which is not acceptable for 3.5 + 3.5 + 3.4.
Thanks for the catch @chaochn47.
@chaochn47 Can you help make this optional?
FTR: #12987
tests/e2e/cluster_test.go
Outdated
@@ -290,6 +290,7 @@ func (cfg *etcdProcessClusterConfig) etcdServerProcessConfigs() []*etcdServerPro
	for i := range etcdCfgs {
		etcdCfgs[i].initialCluster = strings.Join(initialCluster, ",")
		etcdCfgs[i].args = append(etcdCfgs[i].args, initialClusterArgs...)
		etcdCfgs[i].args = append(etcdCfgs[i].args, "--logger=zap")
@jingyih Should we make zap the default in a separate PR, so we don't make this unrelated change?
tests/e2e/ctl_v2_test.go
Outdated
@@ -289,6 +289,7 @@ func testCtlV2Backup(t *testing.T, snapCount int, v3 bool) {

	if v3 {
		// v3 must lock the db to backup, so stop process
		time.Sleep(time.Second)
This needs a comment explaining why.
Removed in the current PR since we keep using v2 publish(). It seems the cluster needs more time to publish using publishV3, which makes TestCtlV2BackupV3 a flaky test. The sleep was added to address this.
tests/e2e/etcd_config_test.go
Outdated
@@ -148,6 +150,7 @@ func TestEtcdPeerCNAuth(t *testing.T) {
	"--listen-peer-urls", fmt.Sprintf("https://127.0.0.1:%d,https://127.0.0.1:%d", etcdProcessBasePort+i, etcdProcessBasePort+len(peers)+i),
	"--initial-advertise-peer-urls", fmt.Sprintf("https://127.0.0.1:%d", etcdProcessBasePort+i),
	"--initial-cluster", ic,
	"--logger=zap",
Same as above, let's make zap the default in a separate PR.
etcdserver/api/membership/cluster.go
Outdated
if lg != nil {
	lg.Panic(
		"unexpected number of keys when getting cluster version from backend",
		zap.Int("number fo keys", len(keys)),
Can we name the zap field the way we would name a json tag in Go? Since it's structured logging, it should be logged as something like "number-of-keys" or "key".
Fixed
@@ -1898,7 +1898,7 @@ func TestV3LargeRequests(t *testing.T) {
	expectError error
}{
	// don't set to 0. use 0 as the default.
	{1, 1024, rpctypes.ErrGRPCRequestTooLarge},
	{256, 1024, rpctypes.ErrGRPCRequestTooLarge},
can we explain why we need this change?
explained above.
#11427 (comment)
Sounds good. Overall approach looks good.
if c.be != nil {
	c.version = clusterVersionFromBackend(c.lg, c.be)
} else {
	c.version = clusterVersionFromStore(c.lg, c.v2store)
Do we need to fall back on v2 version data if the v3 version is nil (upgraded etcd process)?
c.version = clusterVersionFromStore(c.lg, c.v2store)
if c.be != nil {
v3Ver := clusterVersionFromBackend(c.lg, c.be)
if v3Ver != nil { c.version = v3Ver }
}
Not sure about it. When would the v3 version data be nil while v2 is not? If the v3 backend is available, the v3 version will not be nil except when starting a new server. Setting the version stores the new version info into both the v2 store and the v3 backend:
etcd/etcdserver/api/membership/cluster.go
Lines 571 to 576 in 1f8764b
if c.v2store != nil {
	mustSaveClusterVersionToStore(c.v2store, ver)
}
if c.be != nil {
	mustSaveClusterVersionToBackend(c.be, ver)
}
Ok, that answers my question. Thanks.
5b86d3d to b657b78
etcdserver/server.go
Outdated
		ClientUrls: s.attributes.ClientURLs,
	},
}
lg := zap.NewNop()
I meant that we still use the zap logger, but a separate function, getLogger, should handle the previous if-statements. Can we use the logger from EtcdServer in this PR, and iterate more to clean up the getLogger method in a separate PR?
Got you. Will update getLogger() in another PR.
b657b78 to 7784ca8
lgtm, great job!
Please update CHANGELOG accordingly.
CHANGELOG-3.5: update for #11427
Related #11380
Things done in this PR:
ClusterVersionSet
ClusterMemberAttrSet
cc @jingyih