Update Faiss engine to allow PQ and HNSW #1074

Merged
jmazanec15 merged 1 commit into opensearch-project:main on Aug 30, 2023

Conversation

jmazanec15
Member

@jmazanec15 jmazanec15 commented Aug 30, 2023

Description

Updates the faiss engine so that HNSW and PQ can be used together. For HNSW, code_size must be equal to 8 (see facebookresearch/faiss#3027). As a result, the index description string "HNSW32,PQXxY" does not work; only "HNSW32,PQX" does. This causes a problem where the index description string is produced: https://github.com/opensearch-project/k-NN/blob/2.9/src/main/java/org/opensearch/knn/index/util/Faiss.java#L104. To fix this, the PQ encoder for HNSW needs to be handled differently from the PQ encoder for IVF (for which the "PQXxY" form does work), so the two are split out in Faiss.java.
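
The difference between the two description strings can be sketched as follows. This is a minimal Python illustration, not the plugin's Java code: the helper names `hnsw_pq_description` and `ivf_pq_description` and the `nlist` parameter are hypothetical, and the "IVF<nlist>" prefix is assumed from Faiss index-factory syntax rather than taken from this PR.

```python
def hnsw_pq_description(hnsw_m: int, pq_m: int) -> str:
    """HNSW + PQ: the 'x<code_size>' suffix is left out (code_size is fixed at 8)."""
    return f"HNSW{hnsw_m},PQ{pq_m}"


def ivf_pq_description(nlist: int, pq_m: int, code_size: int) -> str:
    """IVF + PQ: the 'PQ<m>x<code_size>' form is accepted (nlist is hypothetical)."""
    return f"IVF{nlist},PQ{pq_m}x{code_size}"


# With the values from the train request in this PR (HNSW m=16, PQ m=8):
print(hnsw_pq_description(16, 8))
print(ivf_pq_description(4, 8, 8))
```

Keeping the two builders separate mirrors the split described above: the HNSW path never emits a code_size suffix, while the IVF path still does.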

Additionally, adds several unit tests and integration tests to validate the functionality. For PQ with a code_size of 8, at least 2^8 = 256 training points are needed to prevent failure, so the tests take a little longer to run.
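
The training-point arithmetic can be written out as a small check. This is a sketch that assumes the minimum is exactly 2^code_size, as stated above for code_size 8; the function names are hypothetical and are not part of the plugin.

```python
def min_training_points(code_size: int) -> int:
    """Smallest training set assumed to be safe: 2 ** code_size."""
    if code_size <= 0:
        raise ValueError("code_size must be positive")
    return 2 ** code_size


def has_enough_training_points(num_points: int, code_size: int = 8) -> bool:
    """True when the training set meets the assumed minimum."""
    return num_points >= min_training_points(code_size)


print(min_training_points(8))
print(has_enough_training_points(255))
```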

In order to create a model with HNSW and PQ, the following train request needs to be submitted:

POST /_plugins/_knn/models/my-model/_train
{
  "training_index": "train-index",
  "training_field": "train-field",
  "dimension": 128,
  "description": "My model description",
  "method": {
    "name": "hnsw",
    "engine": "faiss",
    "space_type": "l2",
    "parameters": {
      "m": 16,
      "encoder": {
        "name": "pq",
        "parameters": {
          "m": 8
        }
      }
    }
  }
}

and then an index can be created once training finishes:

PUT /target-index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "target-field": {
        "type": "knn_vector",
        "model_id": "my-model"
      }
    }
  }
}
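
The two requests above, plus the wait in between, can be sketched in Python. This is only an outline under stated assumptions: the builder and polling function names are hypothetical, no HTTP client is included, and the model state strings "created" and "failed" are assumed from the k-NN model API rather than from this PR. The caller supplies `get_state`, for example a function that issues GET /_plugins/_knn/models/my-model and returns its "state" field.

```python
import time
from typing import Callable


def build_train_request(training_index: str, training_field: str, dimension: int,
                        hnsw_m: int, pq_m: int, description: str = "") -> dict:
    """Body for POST /_plugins/_knn/models/<model_id>/_train (HNSW with a PQ encoder)."""
    return {
        "training_index": training_index,
        "training_field": training_field,
        "dimension": dimension,
        "description": description,
        "method": {
            "name": "hnsw",
            "engine": "faiss",
            "space_type": "l2",
            "parameters": {
                "m": hnsw_m,
                "encoder": {"name": "pq", "parameters": {"m": pq_m}},
            },
        },
    }


def build_index_request(field: str, model_id: str,
                        shards: int = 3, replicas: int = 1) -> dict:
    """Body for PUT /<index> that maps a knn_vector field to a trained model."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "index.knn": True,
        },
        "mappings": {
            "properties": {field: {"type": "knn_vector", "model_id": model_id}}
        },
    }


def wait_for_model(get_state: Callable[[], str], timeout_s: float = 600.0,
                   poll_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll until the model reports 'created'; raise on 'failed' or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "created":  # assumed state name
            return
        if state == "failed":  # assumed state name
            raise RuntimeError("model training failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("model training did not finish in time")
        sleep(poll_s)
```

Injecting `get_state` and `sleep` keeps the sketch independent of any particular client library and makes the polling loop easy to test.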

Issues Resolved

#1064

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@navneet1v
Collaborator

Can we add an example that shows how HNSW-PQ should be used?

@navneet1v
Collaborator

We should also update the perf tool to include HNSW-PQ in the perf testing too. Will there be a separate PR for that?

@jmazanec15
Member Author

Can we add an example that shows how HNSW-PQ should be used?

Added

We should also update the perf tool to include HNSW-PQ in the perf testing too. Will there be a separate PR for that?

Yes, I think I will raise a separate PR for that.

@codecov

codecov bot commented Aug 30, 2023

Codecov Report

Merging #1074 (c0e2297) into main (8994de6) will decrease coverage by 0.04%.
The diff coverage is 74.46%.

@@             Coverage Diff              @@
##               main    #1074      +/-   ##
============================================
- Coverage     85.06%   85.03%   -0.04%     
- Complexity     1181     1184       +3     
============================================
  Files           159      159              
  Lines          4788     4811      +23     
  Branches        433      433              
============================================
+ Hits           4073     4091      +18     
- Misses          520      524       +4     
- Partials        195      196       +1     
Files Changed                                            Coverage Δ
...main/java/org/opensearch/knn/index/util/Faiss.java    78.51% <74.46%> (+3.00%) ⬆️

... and 1 file with indirect coverage changes

Comment on lines 81 to 83
// TODO: To think about in future: for PQ, if dimension is not divisible by code count, PQ will fail. Right now,
// we do not have a way to base validation off of dimension. Failure will happen during training in JNI.
// Define methods supported by faiss
Collaborator

Let's create a github issue for this TODO and add it here.

Also, can you add details on what failures the customer will see when the dimension is not divisible by the code count?

Member Author

Let's create a github issue for this TODO and add it here.

Will do

Also, can you add details on what failures the customer will see when the dimension is not divisible by the code count?

Right, I think the user will get some kind of generic message about training failing, because we didn't want to send the faiss error message in the response.

BTW, this TODO is not new.
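
The divisibility condition in the TODO can be illustrated with a small pre-flight check. This is a hypothetical sketch, not something the plugin does today (the TODO says validation cannot currently be based on dimension); it assumes "code count" refers to the PQ encoder's `m` parameter.

```python
def check_pq_dimension(dimension: int, pq_m: int) -> None:
    """Raise early if the vector dimension cannot be split into pq_m equal sub-vectors."""
    if pq_m <= 0:
        raise ValueError("PQ m (code count) must be positive")
    if dimension % pq_m != 0:
        raise ValueError(
            f"dimension {dimension} is not divisible by PQ m {pq_m}; "
            "training would be expected to fail"
        )


check_pq_dimension(128, 8)  # 128 / 8 = 16 dimensions per sub-vector, no error
```

Running a check like this on the client side gives a specific error message instead of the generic training failure described above.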

@navneet1v
Collaborator

Can you see why the github workflows(except windows) are failing?

@jmazanec15
Member Author

Can you see why the github workflows(except windows) are failing?

The security workflow failures are related to #1067. The other errors were caused by training taking longer than the timeout, so I increased the timeout. Codecov is dropping because I created 2 long static maps for each method instead of 1. I added tests, so test coverage should increase overall.

Updates faiss engine to enable HNSW and PQ to work together. For
HNSW, code_size must be equal to 8 (refer to
facebookresearch/faiss#3027). Therefore, the
index description string "HNSW32,PQXxY" does not work. Only "HNSW32,PQX"
ends up working.

Additionally, adds several unit tests and integration tests in order to
validate the functionality.

Signed-off-by: John Mazanec <[email protected]>
@jmazanec15 jmazanec15 requested a review from navneet1v August 30, 2023 17:14
@jmazanec15 jmazanec15 added the Bug Fixes, backport 2.x, and backport 2.10 labels Aug 30, 2023
@jmazanec15 jmazanec15 merged commit ce47b1b into opensearch-project:main Aug 30, 2023
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1074-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ce47b1bcb70d550ebc0d086d255b16716fe63c98
# Push it to GitHub
git push --set-upstream origin backport/backport-1074-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1074-to-2.x.

@opensearch-trigger-bot
Contributor

The backport to 2.10 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.10 2.10
# Navigate to the new working tree
cd .worktrees/backport-2.10
# Create a new branch
git switch --create backport/backport-1074-to-2.10
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ce47b1bcb70d550ebc0d086d255b16716fe63c98
# Push it to GitHub
git push --set-upstream origin backport/backport-1074-to-2.10
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.10

Then, create a pull request where the base branch is 2.10 and the compare/head branch is backport/backport-1074-to-2.10.

jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Aug 30, 2023
(cherry picked from commit ce47b1b)
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Aug 31, 2023
(cherry picked from commit ce47b1b)
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Aug 31, 2023
(cherry picked from commit ce47b1b)
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Aug 31, 2023
(cherry picked from commit ce47b1b)
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Aug 31, 2023
jmazanec15 added a commit to jmazanec15/k-NN-1 that referenced this pull request Aug 31, 2023
@jmazanec15 jmazanec15 mentioned this pull request Aug 31, 2023
1 task
jmazanec15 added a commit that referenced this pull request Aug 31, 2023
jmazanec15 added a commit that referenced this pull request Aug 31, 2023
@ymartin-mw

Is there a plan to support the "space_type" cosinesimil for Faiss PQ HNSW?
I feel that l2 and innerproduct are not enough.

Also, calling cosineSimilarity in a script score is very slow (https://discuss.elastic.co/t/slow-cosine-similarity-script/299496/3).

@jmazanec15
Member Author

Hi @ymartin-mw, faiss does not support cosine similarity directly, so we have not added it. Here is a relevant link: https://github.com/facebookresearch/faiss/wiki/MetricType-and-distances#how-can-i-index-vectors-for-cosine-similarity. TLDR is if you normalize the vectors and then run innerproduct, it will produce the cosine scores.

This normalization is something we could potentially do with the faiss APIs in order to support cosine transparently, but we have not done it yet.
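
The normalize-then-inner-product approach can be checked with a few lines of plain Python. This is a minimal sketch of the math only; the helper names are hypothetical and no faiss or OpenSearch API is involved.

```python
import math
from typing import Sequence


def normalize(v: Sequence[float]) -> list:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity computed directly from the raw vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


doc, query = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
# Inner product of the normalized vectors matches the cosine of the originals.
print(inner_product(normalize(doc), normalize(query)), cosine(doc, query))
```

In practice this means normalizing vectors before indexing and normalizing the query vector before searching with the innerproduct space type.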

@ymartin-mw

ymartin-mw commented Dec 29, 2023

@jmazanec15 thanks for your answer. Storing normalized vectors and sending a normalized query vector is exactly what I did, and it does indeed produce the cosine scores.

I needed the scores in [0,1] to combine them with another doc attribute within [0,1] (i.e. (w * score + (1-w) * att)).
I got them into [0,1] with this transformation:

"double innerproduct; "
"if ( _score > 1.0 ) { innerproduct = - (1 - _score); } "
"else { innerproduct = - (1 / _score - 1); } "
"_score = ( 2 - ( 1 - innerproduct ) ) / 2; "  # a valid OpenSearch cosinesimil score ( 2 - d ) / 2
... 

I think, optionally, one could also do: cosinesimil = ( _score - 0.5 ) / 1.5
(scaling from [0.5, 2] to [0,1])
