Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dbnode] Fix index flush recovery when previous flush attempts have failed #1574

Merged

Conversation

robskillington
Copy link
Collaborator

@robskillington robskillington commented Apr 19, 2019

Prior to this change if an index flush failed to complete and write out a checkpoint file, it would leave dirty files that would never be able to be overwritten.

This changes the behavior to only check that the checkpoint file does not exist and is valid, rather than check if a segment file exists or not before writing it.

It also updates all call sites to correctly always validate a checkpoint file rather than just check for existence in other parts of the persist/fs package.

@codecov
Copy link

codecov bot commented Apr 19, 2019

Codecov Report

Merging #1574 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #1574   +/-   ##
======================================
  Coverage    71.8%   71.8%           
======================================
  Files         946     946           
  Lines       78002   78002           
======================================
  Hits        56015   56015           
  Misses      18343   18343           
  Partials     3644    3644
Flag Coverage Δ
#aggregator 82.3% <0%> (ø) ⬆️
#cluster 85.7% <0%> (ø) ⬆️
#collector 63.7% <0%> (ø) ⬆️
#dbnode 80% <0%> (ø) ⬆️
#m3em 73.2% <0%> (ø) ⬆️
#m3ninx 73.9% <0%> (ø) ⬆️
#m3nsch 51.1% <0%> (ø) ⬆️
#metrics 17.6% <0%> (ø) ⬆️
#msg 74.7% <0%> (ø) ⬆️
#query 66.5% <0%> (ø) ⬆️
#x 86.6% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c24a48c...7c73166. Read the comment docs.

@m3db m3db deleted a comment from codecov bot Apr 19, 2019
// fileset files has a checkpoint file.
func (f FileSetFile) HasCheckpointFile() bool {
func (f FileSetFile) HasCompleteCheckpointFile() bool {
for _, fileName := range f.AbsoluteFilepaths {
if strings.Contains(fileName, checkpointFileSuffix) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need this line anymore since CompleteCheckpointFileExists does this already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True but that will cause an "Invariant error" to be allocated which I'd like to avoid in the case that we're scanning a lot of files.

Rob Skillington and others added 3 commits April 19, 2019 17:14
@robskillington robskillington changed the title [dbnode] Fix index flush not able to recover when writing over half-persisted volumes [dbnode] Fix index flush recovery when previous flush attempts have failed Apr 19, 2019
Rob Skillington added 2 commits April 19, 2019 17:32
… of github.com:m3db/m3 into r/fix-index-flush-not-overwriting-half-persisted-files
@robskillington robskillington added the area:db All issues pertaining to dbnode label Apr 19, 2019
Copy link
Collaborator

@justinjc justinjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits

type LazyEvalBool uint8

const (
// EvalNone indicated the boolean has not been evaluated.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicates

@@ -73,14 +73,27 @@ var (

type fileOpener func(filePath string) (*os.File, error)

// LazyEvalBool is a boolean is lazily evaluated.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean that is

@robskillington robskillington merged commit 7dad7ee into master Apr 19, 2019
@robskillington robskillington deleted the r/fix-index-flush-not-overwriting-half-persisted-files branch April 19, 2019 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:db All issues pertaining to dbnode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants