[Latency Improvement Oppurtunity] Improve filter search performance when filter results is matching all the documents in the segment #1387

Open
navneet1v opened this issue Jan 12, 2024 · 4 comments · May be fixed by #2320
Labels
Enhancements Increases software capabilities beyond original client specifications

Comments

@navneet1v
Collaborator

Description

Currently, when a filtered vector search runs, we first execute the filter to find the matching doc IDs (filterIds) and then decide, per segment, whether to do an ANN search or an exact search. The identified filterIds are passed to the ANN search so that filtering happens during the search.

But when the filter matches all the docIds in the segment, we can skip a few things:

  1. Converting the filterIds bitmap to a filterIds array.
  2. Deciding whether to do an ANN search or an exact search.

Step 2 is very cheap (it is just a handful of if/else checks), but step 1 can add latency (the exact amount has not been measured yet) if the segment is large enough, because we need to iterate over the bitmap to collect the doc IDs and pass them down to the C++ layer. Some further conversion also happens in the C++ layer, ref: https://github.com/opensearch-project/k-NN/blob/main/jni/src/faiss_wrapper.cpp#L465-L490.
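To make the cost of step 1 concrete, here is a minimal sketch of a bitmap-to-array conversion. It uses `java.util.BitSet` as a stand-in for the Lucene bitset, and the class and method names (`FilterIdConversion`, `bitSetToDocIds`) are hypothetical, not the plugin's actual code. The point is only that the loop touches every matching doc once, so the work grows with the number of matched docs in the segment.

```java
import java.util.BitSet;

// Hypothetical helper for illustration; not part of the k-NN plugin.
final class FilterIdConversion {

    // Copies every set bit (matching doc ID) into an int array.
    // The loop runs once per matching doc, so a filter that matches
    // the whole segment pays for the whole segment.
    static int[] bitSetToDocIds(BitSet filterBits) {
        int[] docIds = new int[filterBits.cardinality()];
        int i = 0;
        for (int doc = filterBits.nextSetBit(0); doc >= 0; doc = filterBits.nextSetBit(doc + 1)) {
            docIds[i++] = doc;
        }
        return docIds;
    }
}
```

The resulting array is what would then have to cross into the native layer, which is where the extra conversion mentioned above would come in.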

Solution

We can avoid all this computation by adding a simple check to see whether max doc and the number of docs set in the filter bitmap are the same. If they are, we can just do a simple ANN search.
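A rough sketch of that check, again using `java.util.BitSet` as a stand-in for the Lucene bitset. The names `FilterShortCircuit`, `matchesEveryDoc` and `filterIdsOrNull` are made up for illustration, and returning `null` to mean "search without a filter" is an assumption about how the caller would be wired, not how the plugin does it today.

```java
import java.util.BitSet;

// Hypothetical sketch of the proposed short-circuit; not the plugin's API.
final class FilterShortCircuit {

    // True when the filter matched every doc in the segment, i.e. the
    // number of set bits equals the segment's max doc.
    static boolean matchesEveryDoc(BitSet filterBits, int maxDoc) {
        return filterBits.cardinality() == maxDoc;
    }

    // Returns null to signal "run plain ANN search, no filter needed".
    // Otherwise returns the matching doc IDs for filtered search.
    static int[] filterIdsOrNull(BitSet filterBits, int maxDoc) {
        if (matchesEveryDoc(filterBits, maxDoc)) {
            return null; // skip the bitmap-to-array conversion entirely
        }
        return filterBits.stream().toArray();
    }
}
```

One thing to keep in mind: as far as I understand, max doc in Lucene also counts deleted docs, so this equality would only hold for segments without deletions, assuming the filter bitmap already excludes deleted docs.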

@navneet1v navneet1v added Enhancements Increases software capabilities beyond original client specifications and removed untriaged labels Jan 12, 2024
@navneet1v navneet1v changed the title [Latency Improvement Oppurtunity] Improve filter search performance when filter results in matching all the documents in the segment [Latency Improvement Oppurtunity] Improve filter search performance when filter results is matching all the documents in the segment Jan 12, 2024
@luyuncheng
Collaborator

We have hit the same situation, and in some scenarios filters cause OOM errors. We are happy to help optimize this.

@luyuncheng
Collaborator

luyuncheng commented Jan 22, 2024

@navneet1v
Hi, in #1402 I propose some ideas to optimize the latency and avoid some OOM scenarios.

@navneet1v
Collaborator Author

@luyuncheng It's pretty interesting that you are getting OOM. My main idea for the optimization here was that if no docs are getting filtered out at the segment level, it's better to do the ANN search without filters.

I will try to look into the PR you have raised. But at a high level, can you describe the optimization you have in mind in that PR?

@weiwang118

This issue is assigned to me and I will complete it by Christmas.

Projects
Status: Backlog
3 participants