-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vault: recover from standby losing etcd lease (#3031) #3511
Conversation
Fixes #3031 |
physical/etcd/etcd3.go
Outdated
} | ||
|
||
if needNewSession { | ||
session, err := concurrency.NewSession(c.etcd, concurrency.WithTTL(etcd3LockTimeoutInSeconds)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move the code to line 273 directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@bhiggins the fix looks right. can you simplify the code? have you tested it manually? if yes, then it looks good to me. |
@bhiggins thanks! |
This change makes these errors transient instead of permanent: [ERROR] core: failed to acquire lock: error=etcdserver: requested lease not found After this change, there can still be one of these errors when a standby vault that lost its lease tries to become leader, but on the next lock acquisition attempt a new session will be created. With this new session, the standby will be able to become the leader.
bb34891
to
68b1abc
Compare
Simplified code as requested. Yes, I tested this patch against 0.7.3 by having three vaults running in HA config. I manually deleted the standby's leases out of etcd, and then killed the leader. After that, as described in the commit message, the "lease not found" error occurs once but then on the next lock attempt a new session is created and the standby becomes the leader. |
lgtm. defer to @jefferai |
Hah, it's the other way, when it comes to etcd3 I defer to @xiang90 :-D Merging. |
This change makes these errors transient instead of permanent:
[ERROR] core: failed to acquire lock: error=etcdserver: requested lease not found
After this change, there can still be one of these errors when a
standby vault that lost its lease tries to become leader, but on the
next lock acquisition attempt a new session will be created. With this
new session, the standby will be able to become the leader.