Integrate Opencensus and fix Jepsen bank test #2764
Merged
… just cancel a txn proposal. Need some refactoring of how Zero gets membership updates, now that we need Zero to not forward proposals to the leader.
…oposal forwarding.
dna2github pushed a commit to dna2fork/dgraph that referenced this pull request Jul 19, 2019
Got it. Fixed it. Squashed it. Done with it. Tamed it. Time to bask in the glory of victory!

Fixed Jepsen bank test violation during a network partition. The violation was happening because:

1. Current Zero leader receives a commit request and assigns a commit timestamp.
2. Gets partitioned out, so the proposal gets blocked.
3. Another Zero becomes the leader and renews the txn timestamp lease, starting at a much higher number (previous lease + lease bandwidth of 10K).
4. The new leader services new txn requests, which advances all Alphas to a much higher MaxApplied timestamp.
5. The previous leader, who is now the follower, retries the old commit proposal and succeeds.

This causes 2 issues:

a) A later txn read doesn't see this commit, and advances.
b) Alpha rejects a write on a posting list key, which already has a new commit at higher ts.

Both scenarios caused the bank txn test violation.

OpenCensus and Dgraph debug dissect were instrumental in determining the cause of this violation. The tell-tale sign was a /Commit timeout on one of the commits immediately preceding the violating commit.

Fixes:

0. Use OpenCensus as a torch to light every path that a txn took, to determine the cause.
1. Do not allow a context deadline when proposing a txn commit.
2. Do not allow Zero follower to propagate proposals to Zero leader.

Tested this PR to fix issue dgraph-io#2391. Tested with partition-ring and clock-skew nemesis in a 10-node cluster. More testing is needed around predicate moves.

Changelog:

* Trial: Do not purge anything (needs to be refactored and reintroduced before a release).
* More debugging, more tracing for Jepsen.
* OpenCensus in Zero.
* One fix for Jepsen test, which ensures that a context deadline cannot just cancel a txn proposal. Need some refactoring of how Zero gets membership updates, now that we need Zero to not forward proposals to the leader.
* Update Raft lib, so we have access to the feature to disallow Raft proposal forwarding.
* Update raftpb proto package as well.
* Dirty changes to ensure that Zero followers don't forward proposals to Zero leader.
* Various OpenCensus integration changes.
Got it. Fixed it. Squashed it. Done with it. Tamed it. Time to bask in the glory of victory!