-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eliminate the copying of blobs when serving reads from the cache #10297
Conversation
Summary: The blob cache enables an optimization on the read path: when a blob is found in the cache, we can avoid copying it into the buffer provided by the application. Instead, we can simply transfer ownership of the cache handle to the target `PinnableSlice`. (Note: this relies on the `Cleanable` interface, which is implemented by `PinnableSlice`.) This has the potential to save a lot of CPU, especially with large blob values.
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
db/blob/blob_source.cc
Outdated
// application, we can simply transfer ownership of the cache handle to | ||
// the target PinnableSlice. This has the potential to save a lot of | ||
// CPU, especially with large blob values. | ||
blob_entry.TransferTo(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if there's a better way to do this. We could consider adding a PinnableSlice::PinSlice
overload that takes a Slice
and a CachableEntry
; however, this would involve exposing CachableEntry
(which is currently an internal/implementation class) in the public API. So we probably wouldn't want to do that.
What I'm thinking is using the existing PinSlice(const Slice& s, CleanupFunction f, void* arg1, void* arg2)
overload (with ReleaseCacheHandle
as the fn, and the cache and cache handle as args, like below; note that we're guaranteed to be dealing with a cached value here, so there is no need for an "own value" branch), and then make the CachableEntry
relinquish ownership of the handle by calling blob_entry.TransferTo(nullptr);
. What do you guys think? Bit of a hack I suppose (and we would definitely want to explain what's happening in a comment) but we wouldn't have to add a TransferTo
overload to CachableEntry
for just this specific use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, since there is no "own value" case here, we could also consider moving from CachableEntry
to CacheHandleGuard
. We would have to add a TransferTo(Cleanable*)
method to CacheHandleGuard
though b/c it doesn't have one rn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semi-related to this: we could also think about eliminating the copy that happens when we insert a blob into the cache (i.e. the string::assign
that happens in BlobSource::PutBlobIntoCache
). This would most likely involve a change to BlobFileReader
's interface but that shouldn't be a big deal now that we always read blobs via the BlobSource
layer. But this is definitely for later (could be a separate follow-up task)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I'm thinking is using the existing
PinSlice(const Slice& s, CleanupFunction f, void* arg1, void* arg2)
overload (withReleaseCacheHandle
as the fn, and the cache and cache handle as args, like below;
I've thought about it this way before. But ReleaseCacheHandle
is private. So we need to expose it, right?
Or, we plan to use a lambda function in here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can use a lambda; I think it's cleaner than taking a dependency on CachableEntry
just for the purposes of reusing ReleaseCacheHandle
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM. When I first thought about this, the ReleaseCacheHandle
being private was my concern. I am fine with either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also think about eliminating the copy that happens when we insert a blob into the cache (i.e. the string::assign that happens in BlobSource::PutBlobIntoCache). This would most likely involve a change to BlobFileReader's interface but that shouldn't be a big deal now that we always read blobs via the BlobSource layer. But this is definitely for later (could be a separate follow-up task)
If so, who will take ownership? Oh I see. That's really similar to eliminate the copying of blobs when serving reads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo: an additional stretch task for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, who will take ownership? Oh I see. That's really similar to eliminate the copying of blobs when serving reads
We could look at e.g. how BlockContents
or UncompressionDict
are used for inspiration. BTW, as part of this additional task, we could also address this TODO: https://github.com/facebook/rocksdb/blob/main/db/blob/blob_source.cc#L68-L69
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@gangliao has updated the pull request. You must reimport the pull request before landing. |
@gangliao has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary:
The blob cache enables an optimization on the read path: when a blob is found in the cache, we can avoid copying it into the buffer provided by the application. Instead, we can simply transfer ownership of the cache handle to the target
PinnableSlice
. (Note: this relies on theCleanable
interface, which is implemented byPinnableSlice
.)This has the potential to save a lot of CPU, especially with large blob values.
This task is a part of #10156