Support an improved/global limit on BlobDB's space amp #10399
Conversation
Summary: BlobDB currently supports limiting space amplification via the configuration option `blob_garbage_collection_force_threshold` (https://github.com/facebook/rocksdb/blob/main/include/rocksdb/advanced_options.h#L958-L969). It works by computing the ratio of garbage (i.e. garbage bytes divided by total bytes) over the oldest batch of blob files; if the ratio exceeds the specified threshold, it triggers a special type of compaction targeting the SST files that point to the blob files in question. (There is a coarse mapping between SSTs and blob files, which we track in the MANIFEST.)

This existing option can be difficult to use or tune. There are (at least) two challenges:

- The occupancy of blob files is not uniform: older blob files tend to have more garbage, so if a service owner has a specific space amp goal, it is far from obvious what value they should set for `blob_garbage_collection_force_threshold`.
- BlobDB keeps track of the exact amount of garbage in blob files, which enables us to compute the blob files' "space amp" precisely. Even though it's an exact value, there is a disconnect between this metric and people's expectations regarding space amp. While people tend to think of LSM tree space amp as the ratio between the total size of the DB and the total size of the live/current KVs, for the purposes of blob space amp, a blob is only considered garbage once the corresponding blob reference has been compacted out of the LSM tree. (One could say the LSM tree space amp notion described above is "logical", while the blob one is "physical".)

To make users' lives easier and solve (1), we would want to add a new configuration option (working title: `blob_garbage_collection_space_amp_limit`) that would enable customers to directly set a space amp target (as opposed to a per-blob-file-batch garbage threshold).

To bridge the gap between the above notion of LSM tree space amp and the blob space amp (2), we would want this limit to apply to the entire data structure/database (the LSM tree plus the blob files). Note that this will necessarily be an estimate, since we don't know exactly how much space the obsolete KVs take up in the LSM tree. One simple idea would be to take the reciprocal of the LSM tree space amp estimated using the method of `VersionStorageInfo::EstimateLiveDataSize`, and scale the number of live blob bytes by the same factor. Example: say the LSM tree space amp is 1.5, which means the live KVs take up two thirds of the LSM. Then we can multiply the value of (total blob bytes - garbage blob bytes) by the same 2/3 factor to get an estimate of the live blob bytes from the user's perspective.

Note: if the above limit is breached, we would still want to do the same thing as in the case of `blob_garbage_collection_force_threshold`, i.e. force-compact the SSTs pointing to the oldest blob files (potentially repeatedly, until the limit is satisfied).
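The estimation idea above can be sketched as a small standalone function. This is not the actual RocksDB implementation; the function name and parameters are illustrative, and the inputs mirror what `VersionStorageInfo::EstimateLiveDataSize` and the blob file metadata would provide.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Sketch: estimate the overall "logical" space amp of the LSM tree plus the
// blob files, scaling the physically live blob bytes by the reciprocal of
// the estimated LSM tree space amp.
double EstimateOverallSpaceAmp(uint64_t total_lsm_bytes,
                               uint64_t live_lsm_bytes,  // estimated
                               uint64_t total_blob_bytes,
                               uint64_t garbage_blob_bytes) {
  if (live_lsm_bytes == 0) {
    return 0.0;  // no basis for an estimate
  }
  // LSM tree space amp; e.g. 1.5 means live KVs take up 2/3 of the tree.
  const double lsm_space_amp =
      static_cast<double>(total_lsm_bytes) / live_lsm_bytes;
  const double live_factor = 1.0 / lsm_space_amp;  // e.g. 2/3
  // Scale the physically live blob bytes by the same factor to approximate
  // the logically live blob bytes from the user's perspective.
  const double live_blob_bytes =
      static_cast<double>(total_blob_bytes - garbage_blob_bytes) * live_factor;
  const double total_bytes =
      static_cast<double>(total_lsm_bytes + total_blob_bytes);
  return total_bytes / (static_cast<double>(live_lsm_bytes) + live_blob_bytes);
}
```

With the example from the text (LSM space amp 1.5, say 150 total vs. 100 live LSM bytes, 300 total blob bytes of which 60 are garbage), the live blob estimate is 240 * 2/3 = 160, giving an overall space amp of 450 / 260, roughly 1.73.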
@@ -4,6 +4,7 @@
* Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, prepopulate warm/hot blobs which are already in memory into the blob cache at the time of flush. On a flush, the blob that is in memory (in the memtables) gets flushed to the device. If using Direct IO, additional IO is incurred to read this blob back into memory again, which is avoided by enabling this option. This further helps if the workload exhibits high temporal locality, where most of the reads go to recently written data. This also helps in the case of a remote file system, since it involves network traffic and higher latencies.
* Support using a secondary cache with the blob cache. When creating a blob cache, the user can set a secondary blob cache by configuring `secondary_cache` in LRUCacheOptions.
* Charge memory usage of the blob cache when the backing cache of the blob cache and the block cache are different. If an operation reserving memory for the blob cache exceeds the available space left in the block cache at some point (i.e., causing a cache full under `LRUCacheOptions::strict_capacity_limit` = true), creation will fail with `Status::MemoryLimit()`. To opt in to this feature, enable charging `CacheEntryRole::kBlobCache` in `BlockBasedTableOptions::cache_usage_options`.
* Added a new blob garbage collection option `blob_garbage_collection_space_amp_limit` to enable customers to directly set a space amplification target (as opposed to a per-blob-file-batch garbage threshold), supporting an improved/global limit on BlobDB's space amplification. `blob_garbage_collection_space_amp_limit` is set to 0.0 (disabled) by default. To enable this feature, set `blob_garbage_collection_space_amp_limit` to a value between 1.0 and 50.0. The lower the value, the more aggressive the garbage collection. This option is only available when `blob_garbage_collection` is enabled, and it replaces the option `blob_garbage_collection_force_threshold` if it is set properly.
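The semantics described in the changelog entry (0.0 disables the limit, a positive value between 1.0 and 50.0 enables it, and GC is forced while the estimated space amp exceeds it) can be sketched as a trigger check. This is a hypothetical helper, not merged RocksDB code; the function name is made up for illustration.

```cpp
#include <cassert>

// Sketch of the proposed trigger: force a blob GC compaction while the
// estimated overall space amp exceeds blob_garbage_collection_space_amp_limit.
// A limit of 0.0 means the feature is disabled (the proposed default).
bool ShouldForceBlobGC(double space_amp_limit, double estimated_space_amp) {
  if (space_amp_limit <= 0.0) {
    return false;  // 0.0 (default) disables the limit entirely
  }
  return estimated_space_amp > space_amp_limit;
}
```

The caller would presumably invoke this repeatedly, force-compacting the SSTs pointing to the oldest blob files each time, until the check returns false.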
Not sure about the most suitable upper limit.
Hi @gangliao! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Looking forward to this PR.

No more plans to solve the CLA checks?

@riversand963 @akankshamahajan15 @ltamasi looking forward to merging this commit.
This task is a part of #10156