Fix lin reads timeouts and AssignUid recursion in Zero #3203
… forwards back to first and so on...
… processing the request much more smoothly because we are done as soon as we find the first activeRctx come back, instead of flooding Raft with unique requests which created a traffic jam.
… if nothing has been sent for a while.
@@ -217,6 +223,7 @@ func (w *RaftServer) RaftMessage(server pb.Raft_RaftMessageServer) error {
}
data := batch.Payload.Data

ctx, cancel := context.WithTimeout(ctx, time.Minute)
the cancel function is not used on all paths (possible context leak) (from govet)
Reviewed 2 of 6 files at r1, 5 of 5 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @golangcibot and @manishrjain)
conn/raft_server.go, line 226 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
the cancel function is not used on all paths (possible context leak) (from govet)
Can you check if this warning is actionable?
dgraph/cmd/zero/assign.go, line 187 at r2 (raw file):
Forward
Maybe "Forwarded" is a better name?
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @golangcibot and @manishrjain)
conn/raft_server.go, line 226 at r1 (raw file):
Previously, golangcibot (Bot from GolangCI) wrote…
the cancel function is not used on all paths (possible context leak) (from govet)
Done.
Reviewable status: 4 of 7 files reviewed, 2 unresolved discussions (waiting on @golangcibot and @martinmr)
dgraph/cmd/zero/assign.go, line 187 at r2 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Forward
Maybe "Forwarded" is a better name?
Done.
- The way we were doing linearizable reads created a new request context for every try: if a request timed out, the retry used a new one. When things got slower, this caused a traffic jam of new requests, compounding because earlier ones timed out and retried.
- This PR fixes that by reusing the same `requestCtx`, so on a retry it is still OK to receive a response from one of the previous tries. This allows wrapper requests to be considered done and things to move forward.
- AssignUids was causing a recursion where a Zero follower (A) forwarded the request to whichever node it thought was the leader (B). But if B was not (or was no longer) the leader, it would try to forward the request as well, maybe to A or to others. This caused a lot of recursive requests. This PR fixes that so a forwarded request is not forwarded any further, but outright rejected if received by a Zero follower.

Changes:

* Avoid a recursive AssignUids call where one Zero forwards to another, which forwards back to the first, and so on.
* Reuse the activeRctx when asking for ReadIndex. This handles delays in processing the request much more smoothly, because we are done as soon as we see the first activeRctx come back, instead of flooding Raft with unique requests, which created a traffic jam.
* Move requestCtx to verbosity 3. Add a way to close the RaftMessage stream if nothing has been sent for a while.
* Add a timeout so raft.Step does not block indefinitely.
* Add raft.Ready warning in Alpha as well. Make raft.Step error obvious.
* Add warnings in both Zero and Alpha about slow disk.
* No need to do select case on pushing to readStateCh. It is typically empty.