Fix corruption with intra-L0 on ingested files #5958
Changes from 17 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -39,32 +39,49 @@ bool FindIntraL0Compaction(const std::vector<FileMetaData*>& level_files, | |
size_t min_files_to_compact, | ||
uint64_t max_compact_bytes_per_del_file, | ||
uint64_t max_compaction_bytes, | ||
CompactionInputFiles* comp_inputs) { | ||
size_t compact_bytes = static_cast<size_t>(level_files[0]->fd.file_size); | ||
uint64_t compensated_compact_bytes = level_files[0]->compensated_file_size; | ||
CompactionInputFiles* comp_inputs, | ||
SequenceNumber earliest_mem_seqno) { | ||
|
||
// Do not pick ingested file when there is at least one memtable not flushed which of seqno is overlap with the sst. | ||
size_t start = 0; | ||
for (; start < level_files.size(); start++) { | ||
if (level_files[start]->being_compacted) { | ||
return false; | ||
} | ||
// If there is no data in memtable, the earliest sequence number would the largest sequence number in last memtable. | ||
if (level_files[start]->fd.largest_seqno <= earliest_mem_seqno) { | ||
break; | ||
} | ||
} | ||
if (start >= level_files.size()) { | ||
return false; | ||
} | ||
size_t compact_bytes = static_cast<size_t>(level_files[start]->fd.file_size); | ||
uint64_t compensated_compact_bytes = level_files[start]->compensated_file_size; | ||
size_t compact_bytes_per_del_file = port::kMaxSizet; | ||
// Compaction range will be [0, span_len). | ||
This comment seems to be outdated because we are compacting [start, limit). |
||
size_t span_len; | ||
size_t limit; | ||
// Pull in files until the amount of compaction work per deleted file begins | ||
// increasing or maximum total compaction size is reached. | ||
size_t new_compact_bytes_per_del_file = 0; | ||
for (span_len = 1; span_len < level_files.size(); ++span_len) { | ||
compact_bytes += static_cast<size_t>(level_files[span_len]->fd.file_size); | ||
compensated_compact_bytes += level_files[span_len]->compensated_file_size; | ||
new_compact_bytes_per_del_file = compact_bytes / span_len; | ||
if (level_files[span_len]->being_compacted || | ||
for (limit = start + 1; limit < level_files.size(); ++limit) { | ||
compact_bytes += static_cast<size_t>(level_files[limit]->fd.file_size); | ||
compensated_compact_bytes += level_files[limit]->compensated_file_size; | ||
new_compact_bytes_per_del_file = compact_bytes / (limit - start); | ||
if (level_files[limit]->fd.largest_seqno >= earliest_mem_seqno || | ||
I don't know how I don't think Regarding the
Can we treat the equality case consistently? Should we include the file whose largest_seqno equals earliest_mem_seqno or not? We may have already included level_files[start], which satisfies the equality condition.
Sorry, I made a mistake. Files in L0 are sorted by largest seqno in descending order, so I only need to check the position at which to start picking. |
||
level_files[limit]->being_compacted || | ||
new_compact_bytes_per_del_file > compact_bytes_per_del_file || | ||
compensated_compact_bytes > max_compaction_bytes) { | ||
break; | ||
} | ||
compact_bytes_per_del_file = new_compact_bytes_per_del_file; | ||
} | ||
|
||
if (span_len >= min_files_to_compact && | ||
if ((limit - start) >= min_files_to_compact && | ||
compact_bytes_per_del_file < max_compact_bytes_per_del_file) { | ||
assert(comp_inputs != nullptr); | ||
comp_inputs->level = 0; | ||
for (size_t i = 0; i < span_len; ++i) { | ||
for (size_t i = start; i < limit; ++i) { | ||
comp_inputs->files.push_back(level_files[i]); | ||
} | ||
return true; | ||
|
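For readers following the hunk above, here is a minimal standalone sketch of the selection logic as it reads after the change. `FileSketch` and `FindIntraL0Span` are hypothetical stand-ins, not RocksDB types: the real function takes `FileMetaData*` entries and fills a `CompactionInputFiles`. The sketch assumes L0 files are ordered newest first, omits the in-loop seqno check in line with the author's reply that only the start position needs checking, and assumes the function returns false when no span qualifies (the diff does not show that tail).

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for the FileMetaData fields the picker reads.
struct FileSketch {
  uint64_t file_size;
  uint64_t compensated_file_size;
  uint64_t largest_seqno;
  bool being_compacted;
};

// Returns true and sets [*start_out, *limit_out) when a span qualifies.
bool FindIntraL0Span(const std::vector<FileSketch>& level_files,
                     size_t min_files_to_compact,
                     uint64_t max_compact_bytes_per_del_file,
                     uint64_t max_compaction_bytes,
                     uint64_t earliest_mem_seqno,
                     size_t* start_out, size_t* limit_out) {
  // Skip the newest files whose seqnos may overlap an unflushed memtable.
  size_t start = 0;
  for (; start < level_files.size(); ++start) {
    if (level_files[start].being_compacted) {
      return false;
    }
    if (level_files[start].largest_seqno <= earliest_mem_seqno) {
      break;
    }
  }
  if (start >= level_files.size()) {
    return false;
  }
  uint64_t compact_bytes = level_files[start].file_size;
  uint64_t compensated_compact_bytes =
      level_files[start].compensated_file_size;
  uint64_t compact_bytes_per_del_file = std::numeric_limits<uint64_t>::max();
  // Compaction range will be [start, limit).
  size_t limit;
  // Pull in files until the work per deleted file begins increasing or the
  // maximum total compaction size is reached.
  for (limit = start + 1; limit < level_files.size(); ++limit) {
    compact_bytes += level_files[limit].file_size;
    compensated_compact_bytes += level_files[limit].compensated_file_size;
    const uint64_t new_per_del_file = compact_bytes / (limit - start);
    if (level_files[limit].being_compacted ||
        new_per_del_file > compact_bytes_per_del_file ||
        compensated_compact_bytes > max_compaction_bytes) {
      break;
    }
    compact_bytes_per_del_file = new_per_del_file;
  }
  if ((limit - start) >= min_files_to_compact &&
      compact_bytes_per_del_file < max_compact_bytes_per_del_file) {
    *start_out = start;
    *limit_out = limit;
    return true;
  }
  return false;
}
```

With five equally sized files whose largest seqnos are 109, 107, 105, 103, 101 and `earliest_mem_seqno` of 105, the sketch skips the two newest files and selects indices [2, 5), provided the minimum file count allows a three-file span.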
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1486,12 +1486,12 @@ TEST_F(CompactionPickerTest, IntraL0MaxCompactionBytesNotHit) { | |
// All 5 L0 files will be picked for intra L0 compaction. The one L1 file | ||
// spans entire L0 key range and is marked as being compacted to avoid | ||
// L0->L1 compaction. | ||
Add(0, 1U, "100", "150", 200000U); | ||
Add(0, 2U, "151", "200", 200000U); | ||
Add(0, 3U, "201", "250", 200000U); | ||
Add(0, 4U, "251", "300", 200000U); | ||
Add(0, 5U, "301", "350", 200000U); | ||
Add(1, 6U, "100", "350", 200000U); | ||
Add(0, 1U, "100", "150", 200000U, 0, 100, 101); | ||
Add(0, 2U, "151", "200", 200000U, 0, 102, 103); | ||
Add(0, 3U, "201", "250", 200000U, 0, 104, 105); | ||
Add(0, 4U, "251", "300", 200000U, 0, 106, 107); | ||
Add(0, 5U, "301", "350", 200000U, 0, 108, 109); | ||
Add(1, 6U, "100", "350", 200000U, 0, 110, 111); | ||
vstorage_->LevelFiles(1)[0]->being_compacted = true; | ||
UpdateVersionStorageInfo(); | ||
|
||
|
@@ -1516,12 +1516,12 @@ TEST_F(CompactionPickerTest, IntraL0MaxCompactionBytesHit) { | |
// max_compaction_bytes limit (the minimum number of files for triggering | ||
// intra L0 compaction is 4). The one L1 file spans entire L0 key range and | ||
// is marked as being compacted to avoid L0->L1 compaction. | ||
Add(0, 1U, "100", "150", 200000U); | ||
Add(0, 2U, "151", "200", 200000U); | ||
Add(0, 3U, "201", "250", 200000U); | ||
Add(0, 4U, "251", "300", 200000U); | ||
Add(0, 5U, "301", "350", 200000U); | ||
Add(1, 6U, "100", "350", 200000U); | ||
Add(0, 1U, "100", "150", 200000U, 0, 100, 101); | ||
Add(0, 2U, "151", "200", 200000U, 0, 102, 103); | ||
Add(0, 3U, "201", "250", 200000U, 0, 104, 105); | ||
Add(0, 4U, "251", "300", 200000U, 0, 106, 107); | ||
Add(0, 5U, "301", "350", 200000U, 0, 108, 109); | ||
Add(1, 6U, "100", "350", 200000U, 0, 109, 110); | ||
Can you add several tests to cover more scenarios in FindIntraL0Compaction()? For example, when being_compacted shows up in L0? Also, it would be nice to directly cover the earliest_mem_seqno scenarios in tests here.
Yes, I have added tests for |
||
vstorage_->LevelFiles(1)[0]->being_compacted = true; | ||
UpdateVersionStorageInfo(); | ||
|
||
|
Do you think it would work to pick the span of files for intra-L0 starting at the smallest index i for which level_files[i]->fd.largest_seqno < oldest_mem_seqno?
That's a nicer phrasing of a suggestion I had in a comment. I don't see anything wrong with this right now, but I'd like to think harder about it.
Sorry, I loaded the page before you reviewed and didn't notice the repetition. I haven't thought of any problems with it. I also plan to experiment with more generally relaxing the restriction that intra-L0 picking starts from the newest file, as it might be related to our imports getting stuck.
I would try it.
This approach looks good to me.
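A small sketch of the start-index rule discussed in this thread, to make the equality case concrete. `FirstEligibleIndex` is a hypothetical helper, not part of RocksDB; it assumes the input holds each L0 file's largest seqno in newest-first order. With `include_equal` false it uses the strict `<` from the suggestion above, and with `include_equal` true it uses the `<=` that appears in the diff.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the smallest index i whose largest seqno is below oldest_mem_seqno
// (or equal to it, when include_equal is true). Returns the vector size when
// no file is eligible.
size_t FirstEligibleIndex(const std::vector<uint64_t>& largest_seqnos,
                          uint64_t oldest_mem_seqno, bool include_equal) {
  for (size_t i = 0; i < largest_seqnos.size(); ++i) {
    const bool eligible = include_equal
                              ? largest_seqnos[i] <= oldest_mem_seqno
                              : largest_seqnos[i] < oldest_mem_seqno;
    if (eligible) {
      return i;
    }
  }
  return largest_seqnos.size();
}
```

For largest seqnos 109, 107, 105, 103, 101 and an oldest memtable seqno of 105, the strict form starts at index 3 while the inclusive form starts at index 2, so the two comparisons differ by exactly the file that sits on the boundary.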
Please wrap the line to follow the 80-character rule. If possible, please run "make format".
@ajkr It seems to me it could still order the output to be newer than L0_(i-1) if L0_i is an ingested file and L0_i->largest_seqno < oldest_mem_seqno but L0_i->largest_seqno > L0_(i-1)->largest_seqno.