[Bug]: [hybrid_search] The rerank effect needs more improvements when setting different metric type for different vector field using "WeightedRanker" reranker #31368

binbinlv · 2024-03-18T09:10:02Z

Is there an existing issue for this?

I have searched the existing issues

Environment

- Milvus version: master-latest
- Deployment mode(standalone or cluster): both
- MQ type(rocksmq, pulsar or kafka):    all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc57
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The rerank result can be poor when different vector fields use different metric types and the "WeightedRanker" reranker is applied.

For example:

Suppose float vector field A is indexed with the "COSINE" metric and float vector field B is indexed with "L2". The hybrid search currently sorts the merged results by the metric type of the first vector field in the schema.

This ordering does not reflect similarity after reranking: for "COSINE" a larger score means more similar, whereas for "L2" a smaller distance means more similar.
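To make the direction problem concrete, here is a minimal pure-Python sketch (not Milvus code) that mimics the behavior described above: raw per-field scores are combined with weights, and the merged list is ordered by the direction of the first field's metric. The helper `naive_weighted_merge` and the candidate IDs and scores are hypothetical, chosen only for illustration.

```python
# Whether a larger raw score means "more similar" for each metric type.
HIGHER_IS_BETTER = {"COSINE": True, "IP": True, "L2": False}


def naive_weighted_merge(scores, metrics, weights):
    """Weighted sum of RAW scores, sorted by the FIRST field's metric direction.

    scores:  dict of pk -> list of raw scores, one per field (same order as metrics).
    metrics: metric type of each field.
    weights: one weight per field.
    """
    fused = {pk: sum(w * s for w, s in zip(weights, vals)) for pk, vals in scores.items()}
    descending = HIGHER_IS_BETTER[metrics[0]]
    return sorted(fused, key=fused.get, reverse=descending)


candidates = {
    "near": [0.9, 0.1],  # COSINE 0.9 and L2 distance 0.1: close in both fields
    "far": [0.8, 5.0],   # COSINE 0.8 but L2 distance 5.0: far away in field B
}
print(naive_weighted_merge(candidates, ["COSINE", "L2"], [0.5, 0.5]))
```

Here the candidate that is close to the query in both fields is ranked last, because its small L2 distance lowers the fused score while the large L2 distance of the other candidate inflates it.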

In addition, a weighted sum of values with very different ranges is not very meaningful: "COSINE" similarity lies in [-1, 1], while "L2" distance lies in [0, +∞).
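One way to address both the direction and the range problem, sketched here as an assumption rather than the scheme Milvus eventually adopted, is to map every metric onto (0, 1) with "larger means more similar" before weighting. COSINE is bounded, so a linear map suffices; the unbounded scores are squashed with `atan`. The function name `normalize` is hypothetical.

```python
import math


def normalize(metric, score):
    """Map a raw score to [0, 1], where a larger value always means more similar.

    Assumed mapping (one possible choice, not necessarily Milvus's):
      COSINE: bounded in [-1, 1]          -> (1 + x) / 2
      IP:     unbounded (-inf, +inf)      -> 0.5 + atan(x) / pi
      L2:     distance in [0, +inf), smaller is more similar -> 1 - 2 * atan(x) / pi
    """
    if metric == "COSINE":
        return (1.0 + score) / 2.0
    if metric == "IP":
        return 0.5 + math.atan(score) / math.pi
    if metric == "L2":
        # L2 distance is never negative, so this stays within (0, 1].
        return 1.0 - 2.0 * math.atan(score) / math.pi
    raise ValueError(f"unsupported metric type: {metric}")
```

The `atan` squashing keeps unbounded scores comparable with bounded ones, at the cost of compressing differences between very large scores.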

Expected Behavior

A better WeightedRanker algorithm design that reflects the actual similarity across fields that use different metric types.
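A minimal sketch of such a design, assuming each request's top-k hits are normalized to a common "larger is more similar" scale and then fused by weighted sum, could look like the following. The function `weighted_rerank`, its argument layout, and the `atan`-based normalization are illustrative assumptions, not the Milvus implementation; a primary key missing from one request's results simply contributes nothing for that field.

```python
import math
from collections import defaultdict


def normalize(metric, score):
    """Assumed mapping of a raw score to [0, 1], larger = more similar."""
    if metric == "COSINE":
        return (1.0 + score) / 2.0
    if metric == "IP":
        return 0.5 + math.atan(score) / math.pi
    if metric == "L2":
        return 1.0 - 2.0 * math.atan(score) / math.pi
    raise ValueError(f"unsupported metric type: {metric}")


def weighted_rerank(results, metrics, weights, limit):
    """Fuse per-field ANN results into one list sorted by descending similarity.

    results: one list of (pk, raw_score) per AnnSearchRequest.
    metrics: metric type of each request, same order as results.
    weights: one weight per request, same order as results.
    """
    if not (len(results) == len(metrics) == len(weights)):
        raise ValueError("results, metrics and weights must have the same length")
    fused = defaultdict(float)
    for hits, metric, weight in zip(results, metrics, weights):
        for pk, raw in hits:
            fused[pk] += weight * normalize(metric, raw)
    # Every score now shares one direction, so a single descending sort is valid.
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


results = [
    [("near", 0.9), ("far", 0.8)],  # COSINE similarities from field A
    [("near", 0.1), ("far", 5.0)],  # L2 distances from field B
]
print([pk for pk, _ in weighted_rerank(results, ["COSINE", "L2"], [0.5, 0.5], limit=2)])
```

With the same inputs that a raw weighted sum ranks incorrectly, the candidate that is close in both fields now comes first.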

Steps To Reproduce

1. Create a collection with 3 vector fields.
2. Insert data.
3. Create indexes with the "COSINE", "L2", and "IP" metric types on the 3 vector fields.
4. Load the collection.
5. Run a hybrid search with the "WeightedRanker" reranker.

    @pytest.mark.parametrize("primary_field", [ct.default_int64_field_name])
    def test_hybrid_search_different_metric_type_each_field(self, primary_field, dim, auto_id, is_flush,
                                                            enable_dynamic_field, metric_type):
        """
        target: test hybrid search for fields with different metric type
        method: create connection, collection, insert and search
        expected: hybrid search successfully with limit(topK)
        """
        # 1. initialize collection with data
        collection_w, _, _, insert_ids, time_stamp = \
            self.init_collection_general(prefix, True, auto_id=auto_id, dim=dim, is_flush=is_flush, is_index=False,
                                         primary_field=primary_field,
                                         enable_dynamic_field=False, multiple_dim_array=[dim, dim])[0:5]
        # 2. extract vector field name
        vector_name_list = cf.extract_vector_field_name_list(collection_w)
        vector_name_list.append(ct.default_float_vec_field_name)
        log.debug(vector_name_list)
        flat_index = {"index_type": "FLAT", "params": {}, "metric_type": "L2"}
        collection_w.create_index(vector_name_list[0], flat_index)
        flat_index = {"index_type": "FLAT", "params": {}, "metric_type": "IP"}
        collection_w.create_index(vector_name_list[1], flat_index)
        flat_index = {"index_type": "FLAT", "params": {}, "metric_type": "COSINE"}
        collection_w.create_index(vector_name_list[2], flat_index)
        collection_w.load()
        # 3. prepare search params
        req_list = []
        search_param = {
            "data": [[random.random() for _ in range(dim)] for _ in range(1)],
            "anns_field": vector_name_list[0],
            "param": {"metric_type": "L2", "offset": 0},
            "limit": default_limit,
            "expr": "int64 > 0"}
        req = AnnSearchRequest(**search_param)
        req_list.append(req)
        search_param = {
            "data": [[random.random() for _ in range(dim)] for _ in range(1)],
            "anns_field": vector_name_list[1],
            "param": {"metric_type": "IP", "offset": 0},
            "limit": default_limit,
            "expr": "int64 > 0"}
        req = AnnSearchRequest(**search_param)
        req_list.append(req)
        search_param = {
            "data": [[random.random() for _ in range(dim)] for _ in range(1)],
            "anns_field": vector_name_list[2],
            "param": {"metric_type": "COSINE", "offset": 0},
            "limit": default_limit,
            "expr": "int64 > 0"}
        req = AnnSearchRequest(**search_param)
        req_list.append(req)
        # 4. hybrid search: WeightedRanker takes one weight per AnnSearchRequest, in request order (L2, IP, COSINE)
        hybrid_search = collection_w.hybrid_search(req_list, WeightedRanker(0.1, 0.9, 1), default_limit,
                                                   check_task=CheckTasks.check_search_results,
                                                   check_items={"nq": 1,
                                                                "ids": insert_ids,
                                                                "limit": default_limit})[0]
        log.debug(hybrid_search[0].ids)
        log.debug(hybrid_search[0].distances)

Milvus Log

No response

Anything else?

No response

yanliang567 · 2024-03-18T09:46:13Z

/unassign

czs007 · 2024-03-20T02:14:26Z

working on it


stale · 2024-06-11T03:08:12Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

binbinlv added kind/bug and needs-triage labels Mar 18, 2024

binbinlv added this to the 2.4.0 milestone Mar 18, 2024

binbinlv assigned czs007 and yanliang567 Mar 18, 2024

yanliang567 added triage/accepted and kind/improvement labels and removed the needs-triage label Mar 18, 2024

sre-ci-robot unassigned yanliang567 Mar 18, 2024

czs007 mentioned this issue Mar 31, 2024

enhance: Refactor hybrid search #31742

Merged

czs007 mentioned this issue Apr 8, 2024

enhance:Refactor hybrid search #32020

Merged

sre-ci-robot pushed a commit that referenced this issue Apr 9, 2024

enhance: Refactor hybrid search (#31742)

4c07304

issue: #25639 #31368 pr :#32020 Signed-off-by: zhenshan.cao

sre-ci-robot pushed a commit that referenced this issue Apr 9, 2024

enhance:Refactor hybrid search (#32020)

089c805

issue: #25639 #31368 Signed-off-by: zhenshan.cao

This was referenced Apr 15, 2024

test: add test cases for code change #32289

Merged

test: add test cases for code change #32290

Merged

sre-ci-robot pushed a commit that referenced this issue Apr 16, 2024

test: add test cases for code change (#32290)

9b25ce8

issue: #31368 pr: #32289 Signed-off-by: binbin lv

yanliang567 modified the milestones: 2.4.0, 2.4.1 Apr 18, 2024

yellow-shine pushed a commit to yellow-shine/milvus that referenced this issue Apr 18, 2024

test: add test cases for code change (milvus-io#32290)

06ceb7b

issue: milvus-io#31368 pr: milvus-io#32289 Signed-off-by: binbin lv

sre-ci-robot pushed a commit that referenced this issue Apr 29, 2024

test: add test cases for code change (#32289)

083bd38

issue: #31368 Signed-off-by: binbin lv

yanliang567 modified the milestones: 2.4.1, 2.4.2 May 7, 2024

stale bot added the stale label Jun 11, 2024

stale bot closed this as completed Jun 30, 2024
