Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blob source to retrieve blobs in RocksDB #10198

Closed
wants to merge 11 commits into from

Conversation

gangliao
Copy link
Contributor

Summary:

There is currently no caching mechanism for blobs, which is not ideal especially when the database resides on remote storage (where we cannot rely on the OS page cache). As part of this task, we would like to make it possible for the application to configure a blob cache.

Tasks:

In this task, we formally introduced the blob source to RocksDB. BlobSource is a new abstraction layer that provides universal access to blobs, regardless of whether they are in the blob cache, secondary cache, or (remote) storage. Depending on user settings, it always fetch blobs from multi-tier cache and storage with minimal cost.

Note: The new MultiGetBlob() implementation is not included in the current PR. To go faster, we aim to create a separate PR for it in parallel!

This PR is a part of #10156

Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @gangliao, it looks great 👍

Could you please add some functional/end-to-end tests to db_blob_basic_test to check blobs can be served from the cache in Get and iterator?

db/column_family.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

db/version_set.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@ltamasi ltamasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @gangliao ! Looks good, just some minor comments (pls see below)

db/blob/db_blob_basic_test.cc Outdated Show resolved Hide resolved
db/blob/db_blob_basic_test.cc Outdated Show resolved Hide resolved
db/blob/db_blob_basic_test.cc Outdated Show resolved Hide resolved
db/blob/db_blob_basic_test.cc Show resolved Hide resolved
db/blob/db_blob_basic_test.cc Outdated Show resolved Hide resolved
db/column_family.h Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ltamasi
Copy link
Contributor

ltamasi commented Jun 21, 2022

Please also look into the "status checked" test failure. Otherwise, LGTM!

@facebook-github-bot
Copy link
Contributor

@gangliao has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

column_family_set_.reset(new ColumnFamilySet(
dbname_, db_options_, file_options_, table_cache_, wbm, wc,
block_cache_tracer_, io_tracer_, db_session_id_));
block_cache_tracer_, io_tracer_, db_id_, db_session_id_));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gangliao I've just realized that this call here creates a ColumnFamilySet whose db_id_ refers to VersionSet::db_id_ as opposed to DBImpl::db_id_. This might work but would justify a closer look / cleanup if applicable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I saw that. Can I replace VersionSet::db_id_'s type with const std::string&. When Reset(), we pass it into ColumnFamilySet. And we remove db_id_.clear(); on line 4165.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me that the db id currently takes a somewhat labyrinthine path from the MANIFEST to DBImpl and back to VersionSet? Please confirm; if yes, then using a reference there as well might work.

P.S. Apparently there is already a TODO to clean this up: https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl.h#L1243-L1244

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, added this to my to-do list.

facebook-github-bot pushed a commit that referenced this pull request Jul 9, 2022
Summary:
Update HISTORY.md for blob cache.  Implementation can be found from Github issue #10156 (or Github PRs #10155, #10178, #10225, #10198, and #10272).

Pull Request resolved: #10328

Reviewed By: riversand963

Differential Revision: D37732514

Pulled By: gangliao

fbshipit-source-id: 4c942a41c07914bfc8db56a0d3cf4d3e53d5963f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants