-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(inverted_index): inverted index cache #4309
Conversation
Update dependencies and add caching for inverted index reader - Updated `atomic` to 0.6.0 and `uuid` to 1.9.1 in `Cargo.lock`. - Added `moka` and `uuid` dependencies in `Cargo.toml`. - Introduced `seek_read` method in `InvertedIndexBlobReader` for common seek and read operations. - Added `cache.rs` module to implement caching for inverted index reader using `moka`. - Updated `async-compression` to 0.4.11 in `puffin/Cargo.toml`.
Refactor InvertedIndexReader and Add Index Cache Support - Refactored `InvertedIndexReader` to include `seek_read` method and default implementations for `fst` and `bitmap`. - Implemented `seek_read` in `InvertedIndexBlobReader` and `CachedInvertedIndexBlobReader`. - Introduced `InvertedIndexCache` in `CacheManager` and `SstIndexApplier`. - Updated `SstIndexApplierBuilder` to accept and utilize `InvertedIndexCache`. - Added `From<FileId> for Uuid` implementation.
Update Cargo.toml and refactor SstIndexApplier - Moved `uuid.workspace` entry in Cargo.toml for better organization.
Important Review skippedReview was skipped due to path filters Files ignored due to path filters (1)
You can disable this status message by setting the WalkthroughThe updates include the addition and enhancement of methods to the Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
Refactor InvertedIndexCache to use type alias for Arc - Replaced `Arc<InvertedIndexCache>` with `InvertedIndexCacheRef` type alias.
Add Prometheus metrics and caching improvements for inverted index - Introduced `prometheus` and `puffin` dependencies for metrics.
Refactor InvertedIndexReader and Cache handling - Simplified `InvertedIndexReader` trait by removing seek-related comments.
Add configurable cache sizes for inverted index metadata and content - Introduced `index_metadata_size` and `index_content_size` in `CacheManagerBuilder`.
Refactor and optimize inverted index caching - Removed `metrics.rs` and integrated cache metrics into `index.rs`.
Remove unused dependencies from Cargo.lock and Cargo.toml - Removed `moka`, `prometheus`, and `puffin` dependencies from both Cargo.lock and Cargo.toml.
5abcaef
to
611109a
Compare
Do we have any benchmark program and results for this PR? @v0y4g3r |
I updated the PR description. This PR improves those queries which are already fast, but not fast enough, namely those 5~6ms queries. An improvement of 2 to 3ms means cutting the total cost time by half. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
Files selected for processing (19)
- src/index/Cargo.toml (1 hunks)
- src/index/src/inverted_index/format/reader.rs (1 hunks)
- src/index/src/inverted_index/format/reader/blob.rs (2 hunks)
- src/index/src/inverted_index/search/index_apply/predicates_apply.rs (4 hunks)
- src/mito2/src/cache.rs (7 hunks)
- src/mito2/src/cache/index.rs (1 hunks)
- src/mito2/src/config.rs (2 hunks)
- src/mito2/src/read/scan_region.rs (1 hunks)
- src/mito2/src/sst/file.rs (1 hunks)
- src/mito2/src/sst/index/inverted_index/applier.rs (7 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder.rs (6 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/between.rs (5 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/comparison.rs (4 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/eq_list.rs (7 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/in_list.rs (5 hunks)
- src/mito2/src/sst/index/inverted_index/applier/builder/regex_match.rs (4 hunks)
- src/mito2/src/sst/index/inverted_index/creator.rs (2 hunks)
- src/mito2/src/worker.rs (1 hunks)
- tests-integration/tests/http.rs (1 hunks)
Files skipped from review due to trivial changes (5)
- src/index/Cargo.toml
- src/mito2/src/sst/index/inverted_index/applier/builder/between.rs
- src/mito2/src/sst/index/inverted_index/applier/builder/comparison.rs
- src/mito2/src/sst/index/inverted_index/applier/builder/eq_list.rs
- src/mito2/src/sst/index/inverted_index/applier/builder/in_list.rs
Additional comments not posted (46)
src/index/src/inverted_index/format/reader.rs (5)
34-34
: LGTM!The
read_all
method is correctly implemented, reading all data into the destination vector and returning the size of the data read.
37-37
: LGTM!The
seek_read
method is correctly implemented, seeking to the given offset and reading the specified size of data.
40-40
: LGTM!The
metadata
method correctly retrieves and returns the metadata wrapped in anArc
for shared ownership.
43-46
: LGTM!The
fst
method correctly retrieves the FST map by using theseek_read
method and creating anFstMap
from the retrieved data.
49-51
: LGTM!The
bitmap
method correctly retrieves the bitmap by using theseek_read
method and creating aBitVec
from the retrieved data.src/mito2/src/sst/index/inverted_index/applier/builder/regex_match.rs (5)
Line range hint
10-31
:
LGTM!The
collect_regex_match
method correctly collects a regex match expression, performing necessary checks and conditions to ensure validity.
Line range hint
61-80
:
LGTM!The
test_regex_match_basic
test correctly verifies the basic functionality of thecollect_regex_match
method.
Line range hint
95-104
:
LGTM!The
test_regex_match_field_column
test correctly verifies that thecollect_regex_match
method does not add a regex match predicate for field columns.
Line range hint
117-126
:
LGTM!The
test_regex_match_type_mismatch
test correctly verifies that thecollect_regex_match
method does not add a regex match predicate when there is a type mismatch.
Line range hint
139-143
:
LGTM!The
test_regex_match_type_nonexist_column
test correctly verifies that thecollect_regex_match
method returns an error when the column does not exist.src/mito2/src/cache/index.rs (12)
41-49
: LGTM!The
CachedInvertedIndexBlobReader
constructor correctly initializes the struct with the provided file ID, inner reader, and cache.
57-83
: LGTM!The
get_or_load
method correctly retrieves index data from the cache or loads it from the source if not already cached, handling errors and edge cases appropriately.
89-94
: LGTM!The
read_all
method correctly delegates the read operation to the inner reader and returns the size of the data read.
96-102
: LGTM!The
seek_read
method correctly delegates the seek and read operations to the inner reader and returns the data read.
104-113
: LGTM!The
metadata
method correctly retrieves metadata from the cache or loads it from the source if not already cached, handling errors and edge cases appropriately.
116-124
: LGTM!The
fst
method correctly retrieves the FST map by using theget_or_load
method and creating anFstMap
from the retrieved data.
149-177
: LGTM!The
InvertedIndexCache
constructor correctly initializes the cache with the provided metadata and content capacities, setting up eviction listeners to update metrics.
181-183
: LGTM!The
get_index_metadata
method correctly retrieves metadata from the cache for the specified file ID.
185-191
: LGTM!The
put_index_metadata
method correctly inserts metadata into the cache for the specified file ID and updates the metrics.
194-196
: LGTM!The
get_index
method correctly retrieves index data from the cache for the specified key.
198-203
: LGTM!The
put_index
method correctly inserts index data into the cache for the specified key and updates the metrics.
207-214
: LGTM!The
index_metadata_weight
andindex_content_weight
methods correctly calculate the weight of the metadata and content for caching purposes.src/index/src/inverted_index/format/reader/blob.rs (2)
53-58
: LGTM!The
read_all
method correctly seeks to the start and reads all data into the destination vector, handling errors appropriately.
61-69
: LGTM!The
seek_read
method correctly seeks to the given offset and reads the specified size of data into a buffer, handling errors appropriately.src/mito2/src/sst/file.rs (1)
68-72
: LGTM!The implementation of the
From
trait forFileId
toUuid
conversion is correct.src/mito2/src/sst/index/inverted_index/applier.rs (4)
59-61
: LGTM! Ensure proper utilization of the cache.The addition of the
inverted_index_cache
field and its inclusion in thenew
method is correct.Ensure that the cache is properly utilized in the rest of the codebase.
Also applies to: 73-86
108-125
: LGTM! Verify the correct usage ofCachedInvertedIndexBlobReader
.The changes to use
CachedInvertedIndexBlobReader
ifinverted_index_cache
is available are correct.Ensure that
CachedInvertedIndexBlobReader
is correctly implemented and used.
220-220
: LGTM!The addition of the
index_cache
parameter in thetest_index_applier_apply_basic
function is correct.
262-262
: LGTM!The addition of the
index_cache
parameter in thetest_index_applier_apply_invalid_blob_type
function is correct.src/mito2/src/sst/index/inverted_index/applier/builder.rs (2)
66-68
: LGTM! Ensure proper utilization of the cache.The addition of the
index_cache
field and its inclusion in thenew
method is correct.Ensure that the cache is properly utilized in the rest of the codebase.
Also applies to: 77-89
331-331
: LGTM!The addition of the
index_cache
parameter in thetest_collect_and_basic
function is correct.src/index/src/inverted_index/search/index_apply/predicates_apply.rs (2)
Line range hint
166-180
:
LGTM!The update to return
Arc<InvertedIndexMetas>
in themock_metas
function is correct.
305-309
: LGTM!The update to use
Arc<InvertedIndexMetas>
in thetest_index_applier_with_empty_index
function is correct.src/mito2/src/cache.rs (4)
20-20
: LGTM! Addition ofindex
module.The addition of the
index
module aligns with the purpose of introducing caching mechanisms for inverted index.
64-65
: LGTM! Addition ofindex_cache
field inCacheManager
.The addition of the
index_cache
field is appropriate for managing the cache for inverted index.
175-177
: LGTM! Addition ofindex_cache
method inCacheManager
.The method provides access to the
index_cache
field, which is essential for managing the cache for inverted index.
216-226
: LGTM! Addition ofindex_metadata_size
andindex_content_size
methods inCacheManagerBuilder
.These methods allow configuring the cache sizes for index metadata and content, which is crucial for managing the cache effectively.
src/mito2/src/config.rs (2)
87-90
: LGTM! Addition ofinverted_index_metadata_cache_size
andinverted_index_cache_size
fields inMitoConfig
.These fields are essential for configuring the cache sizes for inverted index metadata and content, and their default values are set appropriately.
140-141
: LGTM! Default values forinverted_index_metadata_cache_size
andinverted_index_cache_size
inMitoConfig::default
.The default values of 32MB for both fields are reasonable and align with typical cache size configurations.
src/mito2/src/sst/index/inverted_index/creator.rs (1)
418-423
: LGTM! Usage ofInvertedIndexCache
inSstIndexApplierBuilder
.The integration of the
InvertedIndexCache
in theSstIndexApplierBuilder
is crucial for leveraging the caching mechanisms introduced in this PR.src/mito2/src/read/scan_region.rs (2)
333-338
: LGTM!The
index_cache
variable is correctly initialized by safely accessing and cloning the value from thecache_manager
.
343-343
: LGTM!The
index_cache
variable is correctly passed to theSstIndexApplierBuilder::new
method, enabling caching functionality.src/mito2/src/worker.rs (2)
161-161
: Good addition: Index metadata size configuration.The addition of
index_metadata_size
configuration enhances the cache management for index metadata.
162-162
: Good addition: Index content size configuration.The addition of
index_content_size
configuration enhances the cache management for index content.tests-integration/tests/http.rs (2)
823-823
: Good addition: Inverted index metadata cache size configuration.The addition of
inverted_index_metadata_cache_size
configuration in the config API test ensures the new feature is tested.
824-824
: Good addition: Inverted index cache size configuration.The addition of
inverted_index_cache_size
configuration in the config API test ensures the new feature is tested.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4309 +/- ##
==========================================
- Coverage 85.11% 84.85% -0.27%
==========================================
Files 1060 1061 +1
Lines 187648 187832 +184
==========================================
- Hits 159720 159387 -333
- Misses 27928 28445 +517 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most LGTM
Replace Uuid with FileId in CachedInvertedIndexBlobReader - Updated `file_id` type from `Uuid` to `FileId` in `CachedInvertedIndexBlobReader` and related methods.
Refactor cache configuration for inverted index - Moved `inverted_index_metadata_cache_size` and `inverted_index_cache_size` from `MitoConfig` to `InvertedIndexConfig`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- src/mito2/src/cache/index.rs (1 hunks)
- src/mito2/src/config.rs (2 hunks)
- src/mito2/src/sst/file.rs (1 hunks)
- src/mito2/src/worker.rs (1 hunks)
- tests-integration/tests/http.rs (1 hunks)
Files skipped from review as they are similar to previous changes (5)
- src/mito2/src/cache/index.rs
- src/mito2/src/config.rs
- src/mito2/src/sst/file.rs
- src/mito2/src/worker.rs
- tests-integration/tests/http.rs
Remove unnecessary conversion of `file_id` in `SstIndexApplier` - Simplified the initialization of `CachedInvertedIndexBlobReader` by removing the redundant `into()` conversion for `file_id`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/mito2/src/sst/index/inverted_index/applier.rs (7 hunks)
Additional comments not posted (5)
src/mito2/src/sst/index/inverted_index/applier.rs (5)
60-61
: Addition ofinverted_index_cache
field is appropriate.The
inverted_index_cache
field is correctly added to theSstIndexApplier
struct to hold an optional reference to an inverted index cache.
Line range hint
73-86
: Changes to thenew
method are appropriate.The
new
method correctly accepts an additional parameterindex_cache
and assigns it to theinverted_index_cache
field.
109-125
: Changes to theapply
method are appropriate.The
apply
method correctly includes logic to useCachedInvertedIndexBlobReader
ifinverted_index_cache
is available, enhancing the efficiency of the index application process.
220-220
: Changes to thetest_index_applier_apply_basic
test are appropriate.The test correctly passes
None
for theindex_cache
parameter to thenew
method ofSstIndexApplier
, aligning with the updated method signature.
262-262
: Changes to thetest_index_applier_apply_invalid_blob_type
test are appropriate.The test correctly passes
None
for theindex_cache
parameter to thenew
method ofSstIndexApplier
, aligning with the updated method signature.
metadata_cache_size = "32MiB" | ||
content_cache_size = "32MiB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to make their default value auto
.
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
This PR adds im-memory cache for inverted index that boost most queries by reducing the pruning cost, which reduces total cost by 2~3ms:
Future work
PuffinReader
has been refactored recently and I need to find a proper way to passInvertedIndexCache
toBoundedStager
.Checklist
Summary by CodeRabbit
New Features
Improvements
SstIndexApplier
to conditionally use the cache for index reading.Arc
.Configuration Updates
Testing Enhancements