fsm: one-time token expiration should be deterministic #13737

Merged Jul 18, 2022 (2 commits)

@tgross commented Jul 13, 2022

Fixes #13725 for one-time tokens.

When applying a raft log to expire ACL tokens, we need to use a
timestamp provided by the leader so that the result is deterministic
across servers.

@tgross force-pushed the b-ott-expiration-determinism branch from 4bf50f5 to 3efb883 on July 13, 2022 13:34
@tgross marked this pull request as ready for review on July 13, 2022 13:34
@tgross requested review from lgfa29 and schmichael on July 13, 2022 13:34
@tgross added this to the 1.3.x milestone on Jul 13, 2022
@@ -12077,6 +12077,7 @@ type OneTimeTokenDeleteRequest struct {

 // OneTimeTokenExpireRequest is a request to delete all expired one-time tokens
 type OneTimeTokenExpireRequest struct {
+	Timestamp time.Time

Doesn't this need to get set in CoreScheduler.expiredOneTimeTokenGC? I don't see where this gets set, and core_sched.go is where the other GC operations compute their threshold based on the timetable/local clock before submitting the deletions via raft.

@tgross commented Jul 14, 2022

It should be set on the ACL.ExpireOneTimeTokens RPC, which I missed b/c the tests passed.

Most of the core jobs use the timetable block specifically because they don't submit the deletions via raft, but via the RPC. While the leader creates the GC eval, it can be processed on any worker (ref worker.go#L565-L569).

So when I wrote the expire OTT job originally, I hit the same "quiet clusters will have wonky timestamps" problem @jrasell discovered (and I wanted to keep the transaction logic simple). I wanted to use the wall clock on the leader to get a more accurate expiration, but I messed that up by setting the timestamp in the FSM instead of in the leader's RPC. So we should be setting that on the leader's handler for the RPC and we'll be all set.
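
For readers following along, here is a minimal sketch of the idea in this thread. It is not the actual Nomad code. `OneTimeTokenExpireRequest` and its `Timestamp` field come from the diff above; the `replica` type, `leaderExpireRPC`, `expireBefore`, `applyExpire`, and `simulate` are hypothetical names made up for illustration, and the token struct is simplified. The sketch assumes the leader reads its wall clock once in the RPC handler and every server then applies the same logged request.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// OneTimeToken is a simplified stand-in for a one-time token (assumption).
type OneTimeToken struct {
	ID        string
	ExpiresAt time.Time
}

// OneTimeTokenExpireRequest mirrors the struct in the diff: the timestamp
// travels inside the raft log entry.
type OneTimeTokenExpireRequest struct {
	Timestamp time.Time
}

// replica is a hypothetical stand-in for one server's FSM state.
type replica struct {
	tokens map[string]OneTimeToken
}

func newReplica(tokens []OneTimeToken) *replica {
	r := &replica{tokens: make(map[string]OneTimeToken, len(tokens))}
	for _, t := range tokens {
		r.tokens[t.ID] = t
	}
	return r
}

// leaderExpireRPC models the leader's RPC handler (hypothetical name):
// the wall clock is read exactly once, before the request is logged.
func leaderExpireRPC(leaderNow time.Time) OneTimeTokenExpireRequest {
	return OneTimeTokenExpireRequest{Timestamp: leaderNow}
}

// expireBefore deletes every token that expired before cutoff and returns
// the deleted IDs in sorted order.
func (r *replica) expireBefore(cutoff time.Time) []string {
	var deleted []string
	for id, t := range r.tokens {
		if t.ExpiresAt.Before(cutoff) {
			delete(r.tokens, id)
			deleted = append(deleted, id)
		}
	}
	sort.Strings(deleted)
	return deleted
}

// applyExpire models the FSM apply step: it only uses the timestamp from
// the request and never consults the local clock.
func (r *replica) applyExpire(req OneTimeTokenExpireRequest) []string {
	return r.expireBefore(req.Timestamp)
}

// simulate applies one logged request on two replicas, then shows what a
// replica would delete if it used its own, later clock at apply time.
func simulate() (first, second, localClock []string) {
	base := time.Date(2022, 7, 13, 12, 0, 0, 0, time.UTC)
	tokens := []OneTimeToken{
		{ID: "a", ExpiresAt: base.Add(-time.Minute)},
		{ID: "b", ExpiresAt: base.Add(time.Minute)},
	}
	req := leaderExpireRPC(base)
	first = newReplica(tokens).applyExpire(req)
	second = newReplica(tokens).applyExpire(req)
	// Non-deterministic variant: a replica replaying the log 5 minutes later.
	localClock = newReplica(tokens).expireBefore(base.Add(5 * time.Minute))
	return first, second, localClock
}

func main() {
	first, second, localClock := simulate()
	fmt.Println(first, second, localClock) // prints "[a] [a] [a b]"
}
```

In the sketch, both replicas that use the logged timestamp delete the same token, while the replica that uses its own clock at apply time also deletes a token that had not yet expired when the leader issued the request.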


Fixed in 9d8138c

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 24, 2022
Labels
backport/1.1.x, backport/1.2.x, backport/1.3.x, theme/auth, type/bug