Add PR=quorum in HEADs at GC and PW=1 at writing blocks [RCS-282] #1242
Merged
Not much time right now, so sorry for not digging into the details. GC seems to hang when executing a batch on a 4-node cluster that has two stopped nodes.
Quick diff:
b040e56 to 1bf2890
This commit tries to address the following potential issues:

* False notfound: in a faulty cluster with several nodes down, the block server wrongly concludes that a block was already deleted, which may lead to a block leak (the PR default is 0)
* Make sure at least one replica of a block is written to one of the primary nodes (the PW default is 0)
* Prevent the block server from crashing when a HEAD on a block returns an error other than notfound. This lets GC correctly skip unreachable blocks and move on to collect as many of the remaining blocks and manifests as possible

The price of this commit: with the PR and PW values increased for blocks, the availability of PUTs and the throughput of garbage collection may be lower than in previous releases of Riak CS. A few unreachable Riak nodes may prevent PUT requests from returning successfully, or prevent garbage collection from collecting all blocks until the unreachable nodes come back.

We could have used PW=2 for putting blocks, but we didn't because it would decrease availability further, and we expect such blocks to be repaired during the leeway period.
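To make the trade-off in the commit message concrete, here is a toy Python model of the behaviour it describes. It is only a sketch under loud assumptions: `ToyCluster`, `Node`, `QuorumUnsatisfied`, and `collect_blocks` are hypothetical names, not Riak or Riak CS APIs, and the model assumes N=3, one shared set of primaries for every key, and a single empty fallback node that answers on behalf of stopped primaries.

```python
from dataclasses import dataclass, field

N_VAL = 3                  # assumed replication factor
QUORUM = N_VAL // 2 + 1    # 2 when N=3


class QuorumUnsatisfied(Exception):
    """Hypothetical error: too few primaries reachable for the requested PR/PW."""


@dataclass
class Node:
    up: bool = True
    data: dict = field(default_factory=dict)


@dataclass
class ToyCluster:
    primaries: list                                # same primaries for every key (simplification)
    fallback: Node = field(default_factory=Node)   # stands in for stopped primaries

    def _up(self):
        return [n for n in self.primaries if n.up]

    def put(self, key, value, pw=0):
        up = self._up()
        if len(up) < pw:
            raise QuorumUnsatisfied(f"pw={pw}, reachable primaries={len(up)}")
        for node in up:
            node.data[key] = value
        if len(up) < len(self.primaries):
            self.fallback.data[key] = value        # sloppy write to the fallback

    def head(self, key, pr=0):
        up = self._up()
        if len(up) < pr:
            raise QuorumUnsatisfied(f"pr={pr}, reachable primaries={len(up)}")
        replies = [key in n.data for n in up]
        # Each stopped primary is answered for by the fallback node.
        replies += [key in self.fallback.data] * (len(self.primaries) - len(up))
        notfounds = replies.count(False)
        # In this model, notfound replies count toward the read quorum.
        return "notfound" if notfounds >= QUORUM else "ok"


def collect_blocks(cluster, keys, pr=QUORUM):
    """GC pass: skip blocks whose HEAD fails instead of crashing."""
    done, skipped = [], []
    for key in keys:
        try:
            cluster.head(key, pr=pr)
        except QuorumUnsatisfied:
            skipped.append(key)    # retry once the nodes come back
            continue
        done.append(key)           # found (would be deleted) or truly gone
    return done, skipped


cluster = ToyCluster([Node(), Node(), Node()])
cluster.put("block-1", b"data", pw=1)
cluster.primaries[0].up = False
cluster.primaries[1].up = False

print(cluster.head("block-1", pr=0))         # false notfound with PR=0
print(collect_blocks(cluster, ["block-1"]))  # block is skipped, not leaked
```

With PR=0 the two empty fallback replies outvote the one surviving replica, so the block looks deleted even though it still exists. With PR=quorum the same HEAD fails fast, and the GC loop records the block as skipped so it can be retried later, which is the availability and throughput cost the commit message mentions.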
1bf2890 to 2def697
Much safer (less chance of a false notfound) than before 🔒
borshop added a commit that referenced this pull request on Sep 11, 2015:
Add PR=quorum in HEADs at GC and PW=1 at writing blocks [RCS-282]
Reviewed-by: shino
@borshop merge
For the release notes: the GitHub issue has a full description of this fix. _[posted via JIRA by Kota Uenishi]_