-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add blob source to retrieve blobs in RocksDB #10198
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @gangliao, it looks great 👍
Could you please add some functional/end-to-end tests to db_blob_basic_test
to check blobs can be served from the cache in Get
and iterator?
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@gangliao has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @gangliao ! Looks good, just some minor comments (pls see below)
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Please also look into the "status checked" test failure. Otherwise, LGTM! |
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
column_family_set_.reset(new ColumnFamilySet( | ||
dbname_, db_options_, file_options_, table_cache_, wbm, wc, | ||
block_cache_tracer_, io_tracer_, db_session_id_)); | ||
block_cache_tracer_, io_tracer_, db_id_, db_session_id_)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gangliao I've just realized that this call here creates a ColumnFamilySet
whose db_id_
refers to VersionSet::db_id_
as opposed to DBImpl::db_id_
. This might work but would justify a closer look / cleanup if applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I saw that. Can I replace VersionSet::db_id_
's type with const std::string&
. When Reset()
, we pass it into ColumnFamilySet
. And we remove db_id_.clear();
on line 4165.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that the db id currently takes a somewhat labyrinthine path from the MANIFEST to DBImpl
and back to VersionSet
? Please confirm; if yes, then using a reference there as well might work.
P.S. Apparently there is already a TODO to clean this up: https://github.com/facebook/rocksdb/blob/main/db/db_impl/db_impl.h#L1243-L1244
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, added this to my to-do list.
Summary: Update HISTORY.md for blob cache. Implementation can be found from Github issue #10156 (or Github PRs #10155, #10178, #10225, #10198, and #10272). Pull Request resolved: #10328 Reviewed By: riversand963 Differential Revision: D37732514 Pulled By: gangliao fbshipit-source-id: 4c942a41c07914bfc8db56a0d3cf4d3e53d5963f
Summary:
There is currently no caching mechanism for blobs, which is not ideal especially when the database resides on remote storage (where we cannot rely on the OS page cache). As part of this task, we would like to make it possible for the application to configure a blob cache.
Tasks:
In this task, we formally introduced the blob source to RocksDB. BlobSource is a new abstraction layer that provides universal access to blobs, regardless of whether they are in the blob cache, secondary cache, or (remote) storage. Depending on user settings, it always fetch blobs from multi-tier cache and storage with minimal cost.
Note: The new
MultiGetBlob()
implementation is not included in the current PR. To go faster, we aim to create a separate PR for it in parallel!This PR is a part of #10156