-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DBNode] - Make repairs actually repair data #1849
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1849 +/- ##
=========================================
- Coverage 72.2% 64.2% -8%
=========================================
Files 987 725 -262
Lines 83823 66579 -17244
=========================================
- Hits 60559 42779 -17780
- Misses 19236 20582 +1346
+ Partials 4028 3218 -810
Continue to review full report at Codecov.
|
01eeea9
to
4f85bdd
Compare
d020423
to
d33b0bc
Compare
blockStates BootstrappedBlockStateSnapshot, | ||
) { | ||
for _, block := range bootstrappedBlocks.AllBlocks() { | ||
for _, block := range blocksToLoad.AllBlocks() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned offline, this is fine now, but in the future we should measure the performance of cold flushes when there is no cold data. The ColdFlushEnabled
flag should probably just control whether cold writes are accepted or not - other functions like bootstrapping and flushes should be free to use the cold flush mechanism regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, maybe lets add it as a cold writes follow up issue or track it as part of plan to combine warm flush and cold flush into one process?
@@ -310,9 +310,26 @@ func (enc *encoder) LastEncoded() (ts.Datapoint, error) { | |||
return result, nil | |||
} | |||
|
|||
// Len returns the length of the data stream. | |||
// Len returns the length of the final data stream that would be generated | |||
// by a call to Stream(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is slightly redefining what Len()
is returning, right? Are all other uses of this function okay with this new definition? Can you also change the comment for this function in its interface (in dbnode/encoding/types.go
) if appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this isn't used for much and all I've done is make it more accurate. Updated the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
echo "Wait for the data to become available (via repairs) from dbnode03" | ||
ATTEMPTS=10 MAX_TIMEOUT=4 TIMEOUT=1 retry_with_backoff \ | ||
read_all "coldWritesRepairAndNoIndex" "foo" 1 9022 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be port 9032 for the third DB node?
What this PR does / why we need it:
This P.R changes the background repairs feature from simply emitting metrics/logs about data mismatches between node to actually repairing the mismatches.
It also re-writes the repair scheduling logic to a "repairing all the time" model that is more congruent with the fact that M3DB now supports out of order writes.
Includes a combination of unit tests, several integration tests, and a docker integration test.
Special notes for your reviewer:
Does this PR introduce a user-facing and/or backwards incompatible change?:
No.
Does this PR require updating code package or user-facing documentation?:
No.