Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver: TestLeaseExpirationBasedDrainTransferWithExtension failed #104185

Closed
cockroach-teamcity opened this issue Jun 1, 2023 · 8 comments
Closed
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team X-unactionable This was closed because it was unactionable.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Jun 1, 2023

kv/kvserver.TestLeaseExpirationBasedDrainTransferWithExtension failed with artifacts on release-23.1 @ 9a32b64ebd8aeb68364bb3d85a2fa9220b87032f:

=== RUN   TestLeaseExpirationBasedDrainTransferWithExtension
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestLeaseExpirationBasedDrainTransferWithExtension2981786828
    test_log_scope.go:79: use -show-logs to present logs inline
    client_replica_test.go:1805: condition failed to evaluate within 3m45s: [NotLeaseHolderError] lease held by different store; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616892.475920713,0 pro=1685616886.475920713,0
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestLeaseExpirationBasedDrainTransferWithExtension2981786828
--- FAIL: TestLeaseExpirationBasedDrainTransferWithExtension (232.47s)

Parameters: TAGS=bazel,gss,race

Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-28395

@cockroach-teamcity cockroach-teamcity added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team labels Jun 1, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Jun 1, 2023
@irfansharif
Copy link
Contributor

Same as #100420 which we lost artifacts for.

@irfansharif
Copy link
Contributor

I230601 10:51:02.738438 846479 kv/kvserver/store.go:1818 â‹® [drain] 237  waiting for 1 replicas to transfer their lease away
W230601 10:51:03.582432 843151 kv/kvserver_test/client_replica_test.go:1501 ⋮ [-] 238  [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616670.475890687,0 pro=1685616664.475890687,0
I230601 10:51:03.810090 846479 kv/kvserver/store.go:1818 â‹® [drain] 239  waiting for 1 replicas to transfer their lease away
I230601 10:51:04.481072 845659 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 240  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616670.475890687,0 pro=1685616664.475890687,0
W230601 10:51:04.592107 843151 kv/kvserver_test/client_replica_test.go:1501 ⋮ [-] 241  [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:04.592323 843151 testutils/soon.go:41 ⋮ [-] 242  SucceedsSoon: [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:04.892840 846479 kv/kvserver/store.go:1818 â‹® [drain] 243  waiting for 1 replicas to transfer their lease away
W230601 10:51:05.343921 843163 server/node.go:1075 ⋮ [T1,n1,summaries] 244  health alerts detected: {Alerts:[{StoreID:1 Category:‹METRICS› Description:‹ranges.underreplicated› Value:63}]}
W230601 10:51:05.593399 843151 kv/kvserver_test/client_replica_test.go:1501 ⋮ [-] 245  [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:05.593601 843151 testutils/soon.go:41 ⋮ [-] 246  SucceedsSoon: [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:05.929797 846479 kv/kvserver/store.go:1818 â‹® [drain] 247  waiting for 1 replicas to transfer their lease away
W230601 10:51:06.482761 846479 kv/kvserver/store.go:1834 ⋮ [drain] 248  unable to drain cleanly within 5s (cluster setting ‹server.shutdown.lease_transfer_wait›), service might briefly deteriorate if the node is terminated: waiting for 1 replicas to transfer their lease away

I wonder if this is just #74691.

@cockroach-teamcity
Copy link
Member Author

kv/kvserver.TestLeaseExpirationBasedDrainTransferWithExtension failed with artifacts on release-23.1 @ e84e2cabef32a59d88041132f059a6768dfe56bb:

=== RUN   TestLeaseExpirationBasedDrainTransferWithExtension
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestLeaseExpirationBasedDrainTransferWithExtension4150766469
    test_log_scope.go:79: use -show-logs to present logs inline
    client_replica_test.go:1805: condition failed to evaluate within 3m45s: [NotLeaseHolderError] lease held by different store; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685699043.936400748,0 exp=1685699275.639560687,0 pro=1685699269.639560687,0
    panic.go:540: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/33e1d369c27b9c01b2b6009c561815a3/logTestLeaseExpirationBasedDrainTransferWithExtension4150766469
--- FAIL: TestLeaseExpirationBasedDrainTransferWithExtension (232.86s)

Parameters: TAGS=bazel,gss,race

Help

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

@irfansharif
Copy link
Contributor

How come n2 keeps extending its lease despite being drained?

I230601 10:51:01.483021 846479 kv/kvserver/store.go:1818 â‹® [drain] 209  waiting for 1 replicas to transfer their lease away
I230601 10:51:02.738438 846479 kv/kvserver/store.go:1818 â‹® [drain] 237  waiting for 1 replicas to transfer their lease away
W230601 10:51:03.582432 843151 kv/kvserver_test/client_replica_test.go:1501 ⋮ [-] 238  [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616670.475890687,0 pro=1685616664.475890687,0
I230601 10:51:03.810090 846479 kv/kvserver/store.go:1818 â‹® [drain] 239  waiting for 1 replicas to transfer their lease away
I230601 10:51:04.481072 845659 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 240  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616670.475890687,0 pro=1685616664.475890687,0
W230601 10:51:04.592107 843151 kv/kvserver_test/client_replica_test.go:1501 ⋮ [-] 241  [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:04.592323 843151 testutils/soon.go:41 ⋮ [-] 242  SucceedsSoon: [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:04.892840 846479 kv/kvserver/store.go:1818 â‹® [drain] 243  waiting for 1 replicas to transfer their lease away
W230601 10:51:06.482761 846479 kv/kvserver/store.go:1834 ⋮ [drain] 248  unable to drain cleanly within 5s (cluster setting ‹server.shutdown.lease_transfer_wait›), service might briefly deteriorate if the node is terminated: waiting for 1 replicas to transfer their lease away
I230601 10:51:07.981754 845657 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 253  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616676.976193499,0 pro=1685616670.976193499,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616673.476596909,0 pro=1685616667.476596909,0
I230601 10:51:11.481384 845656 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 261  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616680.476488020,0 pro=1685616674.476488020,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616676.976193499,0 pro=1685616670.976193499,0
I230601 10:51:14.982889 845666 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 270  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616683.976764456,0 pro=1685616677.976764456,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616680.476488020,0 pro=1685616674.476488020,0
I230601 10:51:18.481338 845664 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 279  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616687.476501307,0 pro=1685616681.476501307,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616683.976764456,0 pro=1685616677.976764456,0
I230601 10:51:21.981224 845659 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 298  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616690.975889210,0 pro=1685616684.975889210,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616687.476501307,0 pro=1685616681.476501307,0
I230601 10:51:24.625420 843151 testutils/soon.go:41 ⋮ [-] 304  SucceedsSoon: [NotLeaseHolderError] ‹lease held by different store›; r64: replica (n1,s1):1 not lease holder; current lease is repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616690.975889210,0 pro=1685616684.975889210,0
I230601 10:51:24.981039 845656 kv/kvserver/replica_proposal.go:276 ⋮ [T1,n2,s2,r64/2:‹/System{\x00aaa-test…-/NodeLive…}›,raft] 305  new range lease repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616693.976232924,0 pro=1685616687.976232924,0 following repl=(n2,s2):2 seq=2 start=1685616661.226220253,0 exp=1685616690.975889210,0 pro=1685616684.975889210,0

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jun 5, 2023

With #100430, the Raft scheduler will eagerly keep extending any local expiration-based leases. That shouldn't prevent the allocator from transferring the lease away though. It is similar to how we keep extending all epoch leases by heartbeating even though we're draining, and is necessary to keep ranges available during drains.

@pav-kv
Copy link
Collaborator

pav-kv commented Jun 13, 2023

14012 runs so far, 0 failures, over 47m50s

@pav-kv
Copy link
Collaborator

pav-kv commented Jun 16, 2023

@irfansharif Do you happen to still have the artifacts?

@kvoli kvoli added the X-unactionable This was closed because it was unactionable. label Jun 21, 2023
@kvoli
Copy link
Collaborator

kvoli commented Jun 21, 2023

No artifacts, doesn't reproduce under --stress --race. Closing.

@kvoli kvoli closed this as completed Jun 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-kv KV Team X-unactionable This was closed because it was unactionable.
Projects
None yet
Development

No branches or pull requests

5 participants