Add parent join support for faiss hnsw #1398

Merged 7 commits on Jan 19, 2024

heemin32
Collaborator

@heemin32 heemin32 commented Jan 19, 2024

Description

Add parent join support for faiss hnsw

  • Add patch to support multi vector in faiss
  • Initialize id_map as null
  • Add support of multi vector in jni
  • Multi vector support for Faiss HNSW - approximate search only
  • Add data generation script for nested field
  • Add perf test for nested field

Issues Resolved

#1065

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>
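
The deduplication described above can be illustrated with a minimal sketch. This is not the Faiss patch from this PR: the function name `top_k_parents`, the `parent_of` mapping, and the sample data are all hypothetical, and the sketch works on an already-collected candidate list, whereas the PR applies the parentId filter inside the HNSW search method itself. It only shows why grouping by parent is needed to return k distinct documents when each document holds several nested vectors.

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def top_k_parents(
    candidates: Iterable[Tuple[float, int]],
    parent_of: Dict[int, int],
    k: int,
) -> List[Tuple[int, float]]:
    """Return up to k (parent_id, distance) pairs, one per parent, closest first.

    candidates: (distance, child_id) pairs from a nearest-neighbor search.
    parent_of:  hypothetical mapping from nested (child) id to parent id.
    """
    best: Dict[int, float] = {}
    for distance, child_id in candidates:
        parent_id = parent_of[child_id]
        # Keep only the closest child seen so far for each parent.
        if parent_id not in best or distance < best[parent_id]:
            best[parent_id] = distance
    # Select the k parents whose best child is closest to the query.
    return heapq.nsmallest(k, best.items(), key=lambda item: item[1])


# Hypothetical data: children 0 and 1 belong to parent 10, and so on.
parent_of = {0: 10, 1: 10, 2: 20, 3: 20, 4: 30}
candidates = [(0.1, 0), (0.2, 1), (0.3, 2), (0.5, 4), (0.4, 3)]

# The two closest children (0 and 1) share parent 10, so a plain top-2
# over children would yield one distinct document. Grouping yields two.
print(top_k_parents(candidates, parent_of, 2))
```

In this sample the two nearest vectors belong to the same parent, so a search that ignores parent ids and deduplicates afterwards would return fewer than k documents.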
@heemin32 heemin32 changed the title Feature/multi vector Support multi-vector for faiss with HNSW Jan 19, 2024
@heemin32 heemin32 changed the title Support multi-vector for faiss with HNSW Add parent join support for faiss hnsw Jan 19, 2024
Signed-off-by: Heemin Kim <[email protected]>

codecov bot commented Jan 19, 2024

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (45e9e54) 84.91% compared to head (bc779e1) 84.96%.

Files                                                   Patch %   Lines
...java/org/opensearch/knn/index/query/KNNWeight.java   93.33%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1398      +/-   ##
============================================
+ Coverage     84.91%   84.96%   +0.05%     
- Complexity     1259     1262       +3     
============================================
  Files           165      165              
  Lines          5138     5143       +5     
  Branches        480      480              
============================================
+ Hits           4363     4370       +7     
+ Misses          570      569       -1     
+ Partials        205      204       -1     

@martin-gaievski
Member

Can you please add to description the list of PRs that we're merging from feature branch?

@martin-gaievski martin-gaievski left a comment

Please update the PR description. Overall, the PR looks good to me.

@heemin32 heemin32 merged commit 709b448 into main Jan 19, 2024
90 of 91 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1398-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 709b448d88bc168e1d7c137bf2cb539c33b28188
# Push it to GitHub
git push --set-upstream origin backport/backport-1398-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1398-to-2.x.

@heemin32
Collaborator Author

> Can you please add to description the list of PRs that we're merging from feature branch?

They appear in the Commits tab. Do you want them in the description as well?

heemin32 added a commit to heemin32/k-NN that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (opensearch-project#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (opensearch-project#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (opensearch-project#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (opensearch-project#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (opensearch-project#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (opensearch-project#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)
heemin32 added a commit that referenced this pull request Jan 19, 2024
* Add patch to support multi vector in faiss (#1358)

Signed-off-by: Heemin Kim <[email protected]>

* Initialize id_map as null (#1363)

Signed-off-by: Heemin Kim <[email protected]>

* Add support of multi vector in jni (#1364)

Signed-off-by: Heemin Kim <[email protected]>

* Multi vector support for Faiss HNSW (#1371)

Apply the parentId filter to the Faiss HNSW search method. This ensures that documents are deduplicated based on their parentId, and the method returns k results for documents with nested fields.

Signed-off-by: Heemin Kim <[email protected]>

* Add data generation script for nested field (#1388)

Signed-off-by: Heemin Kim <[email protected]>

* Add perf test for nested field (#1394)

Signed-off-by: Heemin Kim <[email protected]>

---------

Signed-off-by: Heemin Kim <[email protected]>
(cherry picked from commit 709b448)