
[BREAKING] opt(compactions): Improve compaction performance #1574

Merged — 24 commits merged into master on Oct 26, 2020

Conversation

@manishrjain (Contributor) commented Oct 23, 2020

Implement multiple ideas for speeding up compactions:

  1. Dynamic Level Sizes: https://rocksdb.org/blog/2015/07/23/dynamic-level.html
  2. L0 to L0 compactions: https://rocksdb.org/blog/2017/06/26/17-level-based-changes.html
  3. Sub Compactions: Split up one compaction into multiple sub-compactions using key ranges, which can be run concurrently.
  4. If a table being generated at Li overlaps with 10 or more tables at Li+1, finish the table early. This avoids large overlaps and expensive compactions later.
  5. Update compaction priority based on the priority of the next level, prioritizing compactions of lower levels over upper levels. This keeps the LSM tree structure healthy at all times.

With these changes, we can load 1B entries (160GB of data) into Badger (without the Stream framework) in 1h25m at 31 MB/s. This is a significant improvement over current master.
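To illustrate idea 3, here is a minimal, self-contained sketch of how a compaction can be split into concurrent sub-compactions by key range. This is not Badger's actual code: `keyRange`, `splitRange`, and `runSubcompactions` are simplified stand-ins for the real `keyRange`, iterator, and `subcompact` machinery.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// keyRange is a simplified stand-in for Badger's keyRange type.
type keyRange struct{ left, right string }

// splitRange divides kr into contiguous sub-ranges at the given split keys,
// so each sub-compaction iterates over a disjoint slice of the keyspace.
func splitRange(kr keyRange, splits []string) []keyRange {
	sort.Strings(splits)
	out := []keyRange{}
	left := kr.left
	for _, s := range splits {
		out = append(out, keyRange{left, s})
		left = s
	}
	return append(out, keyRange{left, kr.right})
}

// runSubcompactions runs one goroutine per sub-range. A real implementation
// would iterate the keys in each range and build tables; here each worker
// just records that its range was processed.
func runSubcompactions(ranges []keyRange) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	done := 0
	for _, r := range ranges {
		wg.Add(1)
		go func(r keyRange) {
			defer wg.Done()
			// Real code: iterate keys in [r.left, r.right) and emit tables.
			mu.Lock()
			done++
			mu.Unlock()
		}(r)
	}
	wg.Wait()
	return done
}

func main() {
	ranges := splitRange(keyRange{"a", "z"}, []string{"h", "p"})
	fmt.Println(len(ranges), runSubcompactions(ranges)) // 3 3
}
```

Because the sub-ranges are disjoint, the workers never contend on keys, and the output tables can simply be collected on a channel, as the real `subcompact` does with its `res` channel.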


@manishrjain manishrjain changed the title opt(compactions): Improve compaction performance [BREAKING] opt(compactions): Improve compaction performance Oct 23, 2020

@codelingo codelingo bot left a comment


2 issues found. 1 rule errored during the review.

levels.go (outdated)
// concurrently, only iterating over the provided key range, generating tables.
// This speeds up the compaction significantly.
func (s *levelsController) subcompact(it y.Iterator, kr keyRange, cd compactDef,
inflightBuilders *y.Throttle, res chan *table.Table) {

Returned channels or channel arguments should generally have a direction.

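The bot's point about channel direction can be shown with a small sketch (the `Table` type and `produce` function here are hypothetical, not Badger's): declaring the parameter as send-only (`chan<-`) lets the compiler reject any accidental receive inside the producer, which is what the rule asks for on `subcompact`'s `res` argument.

```go
package main

import "fmt"

// Table is a hypothetical stand-in for *table.Table.
type Table struct{ id int }

// produce only ever sends on res, so the parameter is declared send-only
// (chan<- *Table); receiving from res inside this function would not compile.
func produce(res chan<- *Table, n int) {
	for i := 0; i < n; i++ {
		res <- &Table{id: i}
	}
	close(res)
}

func main() {
	res := make(chan *Table)
	go produce(res, 3)
	count := 0
	for range res { // the caller keeps the bidirectional channel and receives
		count++
	}
	fmt.Println(count) // 3
}
```

A bidirectional channel converts implicitly to a directional one at the call site, so adopting the suggestion only requires changing the parameter type, not the callers.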

@@ -66,7 +66,7 @@ var (
vlogMaxEntries uint32
loadBloomsOnOpen bool
detectConflicts bool
compression bool
zstdComp bool

Avoid global variables to improve readability and reduce complexity

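The usual remedy for the flagged globals (`zstdComp` and friends) is to gather them into an options struct that is passed explicitly. A minimal sketch under assumed names (`options`, `open`, `compressionName` are illustrative, not Badger's API):

```go
package main

import "fmt"

// options groups what were package-level flag variables into one value
// that is passed explicitly, so each component sees only the config
// it was given rather than mutable global state.
type options struct {
	detectConflicts bool
	zstdComp        bool
}

type db struct{ opt options }

// open threads the options through instead of reading globals.
func open(opt options) *db { return &db{opt: opt} }

func (d *db) compressionName() string {
	if d.opt.zstdComp {
		return "zstd"
	}
	return "snappy"
}

func main() {
	d := open(options{zstdComp: true})
	fmt.Println(d.compressionName()) // zstd
}
```

Badger in fact follows this pattern for its main configuration via `badger.Options`; the bot is flagging test-harness globals that bypass it.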

@manishrjain manishrjain merged commit 45bca18 into master Oct 26, 2020
@manishrjain manishrjain deleted the mrjn/compactions branch October 26, 2020 21:13
NamanJain8 pushed a commit that referenced this pull request Nov 5, 2020

Co-authored-by: Ibrahim Jarif <[email protected]>

fix(tests): Writebatch, Stream, Vlog tests (#1577)

This PR fixes the following issues/tests:
 - Deadlock in write batch - use an atomic to set the value of `writebatch.error`.
 - Vlog truncate test - fix issues with empty memtables.
 - Test options - set the memtable size.
 - Compaction tests - acquire the lock before updating level tables.
 - Vlog write - truncate the file size if the transaction cannot fit in the vlog size.
 - TestPersistLFDiscardStats - set numLevelZeroTables=1 to force compaction.

This PR also fixes the failing bank test by adding an index cache.