Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove parallelization of get hybridscores #779

Conversation

VijayanB
Copy link
Member

@VijayanB VijayanB commented Jun 6, 2024

Description

This parallelization is not adding any value after comparing the benchmarks with and without this optimization. Hence removing it.

Issues Resolved

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

This parallelization is not adding any value after comparing
the benchmarks with and without this optimization.
Hence removing it.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
@martin-gaievski
Copy link
Member

Can you share rough results of benchmarks, is it the same latency or worse with and without parallel get scores?

@VijayanB
Copy link
Member Author

VijayanB commented Jun 6, 2024

Can you share rough results of benchmarks, is it the same latency or worse with and without parallel get scores?

Query Count Vector Count No of Vector Search Query No of Term Query No of Sub-queries is parallel enabled K size P50 ( client time in ms ) P90 ( client time in ms ) P99 ( client time in ms )
1000 50000 1 1 2 no 100 100 101 106 121
1000 50000 1 1 2 yes 100 100 101 107 122
1000 50000 2 1 3 no 100 100 100 108 149
1000 50000 2 1 3 yes 100 100 100 108 152
1000 50000 1 2 3 no 100 100 106 120 200
1000 50000 1 2 3 yes 100 100 106 122 205

@VijayanB
Copy link
Member Author

VijayanB commented Jun 6, 2024

Can you share rough results of benchmarks, is it the same latency or worse with and without parallel get scores?

Ran 10 experiments with and without this change, P50 looks same, whereas i see few degradation with P90 and P99 ( in few milliseconds) in 8 out of 10 experiments. This is also consistent with took time. However, i can take a follow up task on how to tune this to improve further .

@VijayanB
Copy link
Member Author

VijayanB commented Jun 6, 2024

I added skip change-log, while merging to main, will group it into 1.

@VijayanB VijayanB merged commit ed3b974 into opensearch-project:feature/parallelize-hybrid-search Jun 6, 2024
86 of 102 checks passed
VijayanB added a commit to VijayanB/neural-search that referenced this pull request Jun 10, 2024
…ject#779)

This parallelization is not adding any value after comparing
the benchmarks with and without this optimization.
Hence removing it.

Signed-off-by: Vijayan Balasubramanian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants