-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new flag to persist lease expiry #9526
Conversation
i would rather make this per lease basis instead of a global flag of the server. |
Our current use case for this change is one where we would not want a mixture of expiry modes. We currently do not envision that we would have one in the future, and hesitate to speculate on what that use case might be. Working on a per lease basis would also require a client side change, which is a larger commitment than we're looking to make at this point. Additionally we would need to find an alternative way to address the performance issue in the second bullet point. If you all think that a configuration flag isn't a reasonable way to do this, we're certainly willing to think on it and decide if we want to come back to it later. |
@jcalvert Persisting leases has a huge cost when you have a lot of keep-alived leases. For short leases, I doubt there is a need to persist them. For long leases, maybe there is a use case. |
@xiang90 Our use case is a very large number(10 million) of long lived leases that never receive a keep alive, as we use this as a TTL. This helps us avoid needing to embed timestamps and do checking in our client side code. We would be happy to explain more about this usecase if necessary! |
…leases Add a persist expiry flag that allows the expiry field of leases to be persisted along with the ID and ttl. Defaults to false. Addresses etcd-io#9496
e7e3588
to
4f56765
Compare
this is not really what current lease designs for. it designs for using short term timeout to indicate liveness. the best way to handle your use case is to add persistency at per lease basis. |
Hi @xiang90 - I wanted to let you know that on our team here, we've been discussing #9699 and still actively pursuing on getting that pull request merged eventually. What we ultimately decide to do with this particular PR (#9526) may depend on the outcome of #9699 and so if it's okay with you, we wanted to leave this PR opened. Can you please let us know if you'd like us to take an action on this PR soon? Thank you. |
this is already fixed by #9924 |
This pull request provides an opt-in configuration flag to persist lease expiry. This is set up as a flag as the extra overhead would be undesirable for a cluster where leases are frequently renewed, as this does add persist overhead there. However, in some cases where the extra network/disk traffic is acceptable, this solves two problems:
In the case of using Etcd for distributed locking utilizing leases, a leader election currently resets the expiry since it is only held in memory. If a crashed resource held a lock while a leader election occurs (eg during a cluster upgrade) that resource could be held for up to twice as long as the specified TTL. This creates unexpected behavior and necessitates implementation of staleness checking in application code that uses etcd.
Addressing the issue in Leader promotions exceed default timeout values when etcd is holding many leases #9496 - when there are large numbers of leases (~10 million), the algorithm to prevent lease expiration pile-up upon
Promote
has significant time cost that can lead to timeouts, as it has to iterate over all leases. By persisting this expiry value, the current distribution of time remaining is maintained and pile-up will not occur.This is achieved by adding the existing
expiry
on thelease
type to the protobuf file for leases, and then when the flag is true, persisting the lease when that value updates(Grant
,Renew
), if and only if the server is configured to do so.coauthored with @cosgroveb