roachtest: transfer-leases/signal failed #101624
Comments
The lease was transferred away, but it wasn't applied on the new nodes before the check completed:
This test seems inherently racy: it has to wait for one of the other nodes to apply the lease, but not so long that the old leases expire and the remaining nodes reacquire them on their own. One way to fix this would be to have the test set a very long lease expiration time (e.g. 10 minutes), and then poll the other nodes for e.g. 1 minute to see if they applied the lease.
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 9145535e8f6e57d09e3688d6b95fbdceecc47194:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 737f3bb5dbadc17ed5e924afaef3ec06e45bc017:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ afe2d46600257f6c022ce89168bebdc0023c8215:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ cfc200070bbf9fc4b6bc6c6556b5fd56f48db381:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ eb2447deef5f6c1807bd51c84943cb47341e9046:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 8a2050b00bac7e247df26c63aa0c8c1ca4fd7996:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 8a2050b00bac7e247df26c63aa0c8c1ca4fd7996:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 8a2050b00bac7e247df26c63aa0c8c1ca4fd7996:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 33b991317b3f3590581424d85aca1c939d21a797:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 8055184fc991424d1d0eefd7ccf703948b8de3ee:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 2124c6b9adc61030bedc4a6a67c025566f989201:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ e36a88e4bd22dae1108e1276d6a4cdc680bc6a1b:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 20d9e47509e8136096d133031bbd5f110962ba6a:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ 9a48060e50978c78e264fe96bab563ec6e20e3ae:
Parameters:
We see that we're not transferring leases because both followers are in
Looking at the Raft logs for r32 on n3, we see n3 sending a Raft heartbeat at 13:35:30. The send fails to n2 because the node was in fact down, transitioning it to
Well, it turns out etcd/raft doesn't even try if it knows the follower is caught up:
As usual, @tbg is way ahead of me, and fixed this upstream in etcd-io/raft#52. That change wasn't backported to 22.2. Going to verify with a local cherry-pick.
cc @cockroachdb/replication
Can confirm that 30/30 runs passed with etcd-io/raft#52. Before, it would typically fail on the first or second run. Will prepare a backport.
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ f0bda42400b5bcd7a2ccad3611144793a00f18a8:
Parameters:
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ e6e90f623bef2f35a04f1404bb8b4483ecf5729f:
Parameters:
Resolved by #106805.
roachtest.transfer-leases/signal failed with artifacts on release-22.2 @ c2975c776afe8ed8b2571f0820cf9f3cb5ef22d0:
Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-27069
Epic CRDB-27234