-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storages: support building vector index for ColumnFileTiny (Part 3) #9547
Storages: support building vector index for ColumnFileTiny (Part 3) #9547
Conversation
e52552a
to
c91afcc
Compare
c91afcc
to
13b2e3b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work
dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileSetWithVectorIndexInputStream.cpp
Outdated
Show resolved
Hide resolved
dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileSetWithVectorIndexInputStream.cpp
Outdated
Show resolved
Hide resolved
dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileSetWithVectorIndexInputStream.cpp
Outdated
Show resolved
Hide resolved
std::vector<Key> keys(result.size()); | ||
std::vector<Distance> distances(result.size()); | ||
result.dump_to(keys.data(), distances.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually you can access each result element directly without dumping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep it simple and save memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I mean you can have VectorIndexViewer::SearchResult without double copying, like this maybe:
const size_t result_size = result.size();
std::vector<SearchResult> search_results;
search_results.reserve(result_size);
for (size_t i = 0; i < result_size; ++i) {
const auto rowid = result[i].member.key;
if (valid_rows[rowid])
search_results.emplace_back({ rowid, result[i].distance });
}
In this way we only have one round of copying, instead of two rounds.
dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTinyVectorIndexReader.cpp
Show resolved
Hide resolved
dbms/src/Storages/DeltaMerge/ColumnFile/ColumnFileTinyVectorIndexReader.cpp
Outdated
Show resolved
Hide resolved
const auto key = fmt::format( | ||
"{}{}_{}", | ||
VectorIndexCache::COLUMNFILETINY_INDEX_NAME_PREFIX, | ||
tiny_file.keyspace_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether store_id, table_id is also needed @JaySon-Huang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think index_page_id
is enough. Each node has it's own vec_index_cache
, and it only caches the local index page.
The rest LGTM |
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
2797265
to
671193f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
Signed-off-by: Lloyd-Pottiger <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@CalvinNeo: Your lgtm message is repeated, so it is ignored. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: breezewish, CalvinNeo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
…ingcap#9547) ref pingcap#9032 Signed-off-by: Lloyd-Pottiger <[email protected]>
What problem does this PR solve?
Issue Number: ref #9600
Problem Summary:
What is changed and how it works?
This PR is part of https://github.com/tidbcloud/tiflash-cse/pull/293, which introduces the read path and adds unit tests.
Check List
Tests
Side effects
Documentation
Release note