Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add separate cache for bloomfilters #1260

Merged
merged 5 commits into from
Mar 16, 2020
Merged

Conversation

jarifibrahim
Copy link
Contributor

@jarifibrahim jarifibrahim commented Mar 16, 2020

#1256 showed that a single cache might not be enough to store the
data blocks and the bloom filters.
This commit adds a separate cache for the bloom filters. This
commit also adds a new flag LoadBloomOnOpen which determines
if the bloom filters should be loaded when the table is opened on
or not.

The default value of MaxBfCacheSize is zero and
LoadBloomOnOpen is true.

This change has significant performance improvement on read speeds
because a single cache would lead to bloom filter eviction and we
would read the bloomfilter from the disk.


This change is Reviewable

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Approved.

Reviewed 5 of 5 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)


options.go, line 131 at r1 (raw file):

		MaxCacheSize:            1 << 30, // 1 GB
		MaxBfCacheSize:          0,
		LoadBfLazily:            false,

LoadBloomsOnOpen


table/table.go, line 514 at r1 (raw file):

		if t.bf == nil {
			y.AssertTrue(t.opt.LoadBfLazily)
			t.bf, _ = t.readBloomFilter()

This might be introducing a race condition. Multiple goroutines could be reading and setting t.bf.

Copy link
Contributor Author

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 5 files reviewed, 2 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


options.go, line 131 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

LoadBloomsOnOpen

Done.


table/table.go, line 514 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This might be introducing a race condition. Multiple goroutines could be reading and setting t.bf.

The race condition could be possible but none of our tests picked up this race condition. I'll keep this lock-free for now.

@jarifibrahim jarifibrahim merged commit eaf64c0 into master Mar 16, 2020
@jarifibrahim jarifibrahim deleted the ibrahim/add-bfcache branch March 16, 2020 17:26
jarifibrahim pushed a commit that referenced this pull request Mar 24, 2020
#1256 showed that a single
cache might not be enough to store the data blocks and the bloom
filters.
This commit adds a separate cache for the bloom filters. This
commit also adds a new flag `LoadBloomsOnOpen` which determines
if the bloom filters should be loaded when the table is opened on
or not.

The default value of `MaxBfCacheSize` is `zero` and 
`LoadBloomsOnOpen` is true.

This change has significant performance improvement on read speeds
because a single cache would lead to bloom filter eviction and we
would read the bloom filter from the disk.

(cherry picked from commit eaf64c0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants