enhance: speed up search iterator stage 1 #37947

Open
wants to merge 1 commit into
base: master

PwzXxm
Contributor

@PwzXxm PwzXxm commented Nov 22, 2024

issue: #37548

@sre-ci-robot sre-ci-robot added area/dependency Pull requests that update a dependency file size/XXL Denotes a PR that changes 1000+ lines. labels Nov 22, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Nov 22, 2024
Contributor

mergify bot commented Nov 22, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.


codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 87.56757% with 46 lines in your changes missing coverage. Please review.

Project coverage is 80.98%. Comparing base (5394f47) to head (9016c4a).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/core/src/query/SearchOnSealed.cpp | 36.84% | 12 Missing ⚠️ |
| internal/core/src/query/CachedSearchIterator.cpp | 95.85% | 7 Missing ⚠️ |
| internal/proxy/search_util.go | 89.55% | 5 Missing and 2 partials ⚠️ |
| internal/core/src/query/PlanProto.cpp | 16.66% | 5 Missing ⚠️ |
| internal/core/src/query/SearchBruteForce.cpp | 80.95% | 4 Missing ⚠️ |
| internal/core/src/query/SearchOnGrowing.cpp | 42.85% | 4 Missing ⚠️ |
| internal/core/src/query/SearchOnIndex.cpp | 42.85% | 4 Missing ⚠️ |
| internal/core/src/index/Utils.cpp | 88.23% | 2 Missing ⚠️ |
| internal/core/src/query/CachedSearchIterator.h | 95.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #37947       +/-   ##
===========================================
+ Coverage   69.32%   80.98%   +11.65%     
===========================================
  Files         292     1381     +1089     
  Lines       26183   194670   +168487     
===========================================
+ Hits        18152   157644   +139492     
- Misses       8031    31463    +23432     
- Partials        0     5563     +5563     

| Components | Coverage Δ |
| --- | --- |
| Client | 75.27% <ø> (∅) |
| Core | 69.44% <85.50%> (+0.11%) ⬆️ |
| Go | 83.02% <93.06%> (∅) |

| Files with missing lines | Coverage Δ |
| --- | --- |
| internal/core/src/common/QueryInfo.h | 100.00% <100.00%> (ø) |
| internal/core/src/index/Utils.h | 88.88% <ø> (ø) |
| internal/core/src/index/VectorDiskIndex.cpp | 76.96% <100.00%> (-1.05%) ⬇️ |
| internal/core/src/index/VectorMemIndex.cpp | 64.80% <100.00%> (-0.35%) ⬇️ |
| internal/proxy/proxy.go | 71.30% <100.00%> (ø) |
| internal/proxy/task.go | 80.69% <ø> (ø) |
| internal/proxy/task_search.go | 76.34% <100.00%> (ø) |
| internal/core/src/query/CachedSearchIterator.h | 95.00% <95.00%> (ø) |
| internal/core/src/index/Utils.cpp | 40.90% <88.23%> (+3.96%) ⬆️ |
| internal/core/src/query/SearchBruteForce.cpp | 79.72% <80.95%> (-0.85%) ⬇️ |

... and 6 more

... and 1084 files with indirect coverage changes

@PwzXxm
Contributor Author

PwzXxm commented Nov 25, 2024

rerun cpp-unit-test

Contributor

mergify bot commented Nov 25, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@PwzXxm PwzXxm force-pushed the search_iter_v2_s1 branch 2 times, most recently from 00a762f to 97467bf Compare November 25, 2024 11:30
Contributor

mergify bot commented Nov 25, 2024

@PwzXxm E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 25, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Nov 25, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 27, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

mergify bot commented Nov 27, 2024

@PwzXxm E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Contributor

mergify bot commented Nov 29, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Nov 29, 2024

/hold

@PwzXxm
Contributor Author

PwzXxm commented Nov 29, 2024

/unhold
Rename iterator token to iterator id

Contributor

mergify bot commented Nov 29, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

Contributor

@MrPresent-Han MrPresent-Han left a comment

NIT: just some questions

heap.pop();

// last_bound may change between NextBatch calls, discard any invalid results
if (!IsValid(cur_rst, last_bound, radius, range_filter)) {

Contributor

So the v2 iterator will not return results that are better than those on earlier pages?

Contributor Author

Not in this stage; the next stage will try to address this.
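
The filtering discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it assumes distances are already normalized so that smaller is better, it ignores `radius` and `range_filter`, and the names `DisIdPair`, `MinHeap`, and `PopValid` (as well as this simplified `IsValid` signature) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical (distance, offset) pair; smaller distance is assumed to be better.
using DisIdPair = std::pair<float, int64_t>;

// Min-heap: the closest remaining candidate is always on top.
using MinHeap = std::priority_queue<DisIdPair,
                                    std::vector<DisIdPair>,
                                    std::greater<DisIdPair>>;

// Simplified validity check (assumption): only the last_bound test is shown.
// A result is valid if no bound is set or it lies strictly beyond the bound.
bool
IsValid(const DisIdPair& result, const std::optional<float>& last_bound) {
    return !last_bound.has_value() || result.first > last_bound.value();
}

// Pop candidates until batch_size valid ones are collected or the heap is empty.
// Candidates at or inside last_bound are discarded, so nothing "better" than
// the previous page's bound is returned.
std::vector<DisIdPair>
PopValid(MinHeap& heap,
         const std::optional<float>& last_bound,
         std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<DisIdPair> batch;
    while (!heap.empty() && batch.size() < batch_size) {
        DisIdPair cur = heap.top();
        heap.pop();
        if (!IsValid(cur, last_bound)) {
            continue;  // stale relative to the current bound
        }
        batch.push_back(cur);
    }
    return batch;
}
```

With a bound of 0.3, candidates at 0.1 and 0.3 would be dropped and only those strictly beyond 0.3 returned, which matches the behavior the reviewer asks about.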

const float dist = result.first;
const bool is_valid =
    !last_bound.has_value() || dist > last_bound.value();

Contributor

Question: is there no need to account for positively vs. negatively related metrics when comparing dist and last_bound?

Contributor Author

The distances are converted on entry to this class, so there is no need to worry about the metric direction here.
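
One way such a conversion could work is sketched below. It assumes a sign flip for larger-is-better metrics (IP, COSINE) so that a single `dist > last_bound` comparison is valid for every metric; the `Metric` enum and the `Sign`, `ToInternal`, and `IsBeyondBound` helpers are hypothetical names for illustration and are not taken from the PR.

```cpp
#include <cassert>
#include <optional>

// Hypothetical metric enum, for illustration only.
enum class Metric { L2, IP, COSINE };

// Larger-is-better metrics get sign -1, so that after conversion
// a smaller internal distance always means a better match.
inline float
Sign(Metric m) {
    return (m == Metric::IP || m == Metric::COSINE) ? -1.0f : 1.0f;
}

// Convert a raw distance or similarity into the internal, normalized form.
inline float
ToInternal(Metric m, float raw) {
    return Sign(m) * raw;
}

// Metric-agnostic bound check on internal distances.
inline bool
IsBeyondBound(float dist, const std::optional<float>& last_bound) {
    return !last_bound.has_value() || dist > last_bound.value();
}
```

Under this assumption, an IP candidate with similarity 0.6 is accepted after a page that ended at 0.8 (since -0.6 > -0.8), while one with similarity 0.9 is rejected, and the bound must be converted with the same sign as the distances.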

@@ -124,6 +125,19 @@ SearchOnGrowing(const segcore::SegmentGrowingImpl& segment,

// step 3: brute force search where small indexing is unavailable
auto vec_ptr = record.get_data_base(vecfield_id);

Contributor

The cached iterator is created every time, so what is 'cached'?

Contributor Author

@PwzXxm PwzXxm Nov 29, 2024

The next stage will introduce a pool of results, as noted in https://github.com/milvus-io/milvus/pull/37947/files/9f6b88743198a575eb84cb427bcd41a7631676b7#diff-7344957165f4632a9363de767323618b7db0bd2d0f7cf7165965d3fb2612f18b. This class provides a framework for that further implementation. If you find the name confusing, I can change it.

Contributor

No need to bother; just follow your scheme.

search_result.distances_.resize(nq_ * batch_size_);

for (size_t query_idx = 0; query_idx < nq_; ++query_idx) {
    auto rst = GetBatchedNextResults(query_idx, search_info);

Contributor

NIT: it seems the retrieved offsets and distances are copied twice.

Contributor Author

The distance-id pairs need to be sorted before they are copied to search_result. To eliminate this copy, Knowhere would need to provide batched results via the iterator.
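
A rough sketch of why two copies arise: the pairs are first gathered into a temporary buffer so they can be sorted, then written into the flat per-query result arrays. The `FlatResult` struct, the `MakeResult` and `WriteBatch` helpers, and the use of -1 as a padding offset are assumptions made for illustration; they are not the PR's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical (distance, offset) pair.
using DisIdPair = std::pair<float, int64_t>;

// Hypothetical stand-in for flat result buffers laid out as nq * batch_size.
struct FlatResult {
    std::vector<float> distances;
    std::vector<int64_t> offsets;
};

// Allocate buffers; unused slots keep offset -1 (padding value is an assumption).
FlatResult
MakeResult(std::size_t nq, std::size_t batch_size) {
    FlatResult out;
    out.distances.assign(nq * batch_size, 0.0f);
    out.offsets.assign(nq * batch_size, -1);
    return out;
}

// First copy: `batch` is the temporary buffer filled from the iterator.
// Second copy: the sorted pairs are written into the flat arrays.
void
WriteBatch(std::vector<DisIdPair> batch,
           std::size_t query_idx,
           std::size_t batch_size,
           FlatResult& out) {
    assert(batch.size() <= batch_size);
    assert((query_idx + 1) * batch_size <= out.distances.size());
    std::sort(batch.begin(), batch.end());  // ascending distance, ties by offset
    const std::size_t base = query_idx * batch_size;
    for (std::size_t i = 0; i < batch.size(); ++i) {
        out.distances[base + i] = batch[i].first;
        out.offsets[base + i] = batch[i].second;
    }
}
```

If the underlying iterator could hand back an already ordered batch, the temporary buffer and the sort would be unnecessary, which is the point made in the reply above.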

@PwzXxm PwzXxm force-pushed the search_iter_v2_s1 branch from bd0b450 to 8774ff5 Compare December 2, 2024 04:13
Contributor

mergify bot commented Dec 2, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

Contributor

mergify bot commented Dec 2, 2024

@PwzXxm cpp-unit-test check failed, comment rerun cpp-unit-test can trigger the job again.

@PwzXxm PwzXxm force-pushed the search_iter_v2_s1 branch 2 times, most recently from 5b68678 to 1995a47 Compare December 2, 2024 10:02
@mergify mergify bot added the ci-passed label Dec 2, 2024
@PwzXxm
Contributor Author

PwzXxm commented Dec 3, 2024

/hold
v2.5.1

@PwzXxm
Contributor Author

PwzXxm commented Dec 3, 2024

/unhold

@MrPresent-Han
Contributor

/lgtm

@PwzXxm
Contributor Author

PwzXxm commented Dec 4, 2024

/hold
v2.5.1

@sre-ci-robot
Contributor

New changes are detected. LGTM label has been removed.

@mergify mergify bot removed the ci-passed label Dec 13, 2024
Contributor

mergify bot commented Dec 13, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Dec 13, 2024

rerun go-sdk

@mergify mergify bot added the ci-passed label Dec 13, 2024
Signed-off-by: Patrick Weizhi Xu <[email protected]>
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: PwzXxm
To complete the pull request process, please assign tedxu after the PR has been reviewed.
You can assign the PR to them by writing /assign @tedxu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify mergify bot removed the ci-passed label Dec 18, 2024
Contributor

mergify bot commented Dec 18, 2024

@PwzXxm go-sdk check failed, comment rerun go-sdk can trigger the job again.

@PwzXxm
Contributor Author

PwzXxm commented Dec 18, 2024

rerun go-sdk

@mergify mergify bot added the ci-passed label Dec 18, 2024
Labels
area/compilation area/dependency Pull requests that update a dependency file ci-passed dco-passed DCO check passed. do-not-merge/hold kind/enhancement Issues or changes related to enhancement size/XXL Denotes a PR that changes 1000+ lines.

3 participants