Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for ef_search query parameter in FAISS engine #1707

Merged
merged 18 commits into from
Jun 6, 2024

Conversation

shatejas
Copy link
Collaborator

@shatejas shatejas commented May 16, 2024

Description

This introduces ef_search as a query parameter for search based on hnsw algorithm. This PR has changes for FAISS engine. Support for other engines will be added in consecutive PRs. Radial search support for FAISS will also be added in consecutive PRs. These changes are focused on k based queries.

Issues Resolved

#1537

Local testing

curl -XGET "http://localhost:9200/faiss-hnsw-index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "knn": {
      "faiss_vector": {
        "vector": [2, 3, 5],
        "k": 2,
        "method_parameters": {
           "ef_search" : 5

          }
        }
      }
    }
  }'
{
  "took": 69,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.11918951,
    "hits": [
      {
        "_index": "faiss-hnsw-index",
        "_id": "3",
        "_score": 0.11918951,
        "_source": {
          "faiss_vector": [
            3.5,
            4.5,
            3.3
          ],
          "price": 12.9
        }
      },
      {
        "_index": "faiss-hnsw-index",
        "_id": "2",
        "_score": 0.10706638,
        "_source": {
          "faiss_vector": [
            2.5,
            3.5,
            2.2
          ],
          "price": 7.1
        }
      }
    ]
  }
}

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented. - pending documentation
    • [] New functionality has javadoc added - added wherever I felt required. open to feedback and I can add it
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

jni/include/faiss_wrapper.h Show resolved Hide resolved
jni/include/faiss_wrapper.h Show resolved Hide resolved
hnswParams.efSearch = hnswReader->hnsw.efSearch;
idGrouper = buildIDGrouperBitmap(jniUtil, env, parentIdsJ, &idGrouperBitmap);
hnswParams.grp = idGrouper.get();
if(hnswReader!= nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we test with cases what happnes when hnswReader is not null and parentIdsJ is Null and vice versa , I feel earlier condition was fine to check both not null. And Can we add test case for all Corner cased like when one is null and other is not , just to make sure it works , Since we are removing one condition from Top If Statment.

Copy link
Collaborator Author

@shatejas shatejas May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the earlier condition efSearch was not getting updated and it was using the default value of 16 with the combination of filter and parent filter being null. I will let others evaluate this

As mentioned, still working on the unit test

src/main/java/org/opensearch/knn/index/query/KNNQuery.java Outdated Show resolved Hide resolved
shatejas added 8 commits May 19, 2024 16:27
methods

This unit test specifically focuses on the changes related to ef_search
Detailed assertions related to grouper and selector can be picked up
later.

Tests related to any other methods are not added

Signed-off-by: Tejas Shah <[email protected]>
compatibility during upgrades

Signed-off-by: Tejas Shah <[email protected]>
Signed-off-by: Tejas Shah <[email protected]>
state

Customers are not expected use the query with new parameters unless its
fully upgraded.

Signed-off-by: Tejas Shah <[email protected]>
This makes sure that bwc passes in the PR. will be changed to 2.15 to
once this is merged to 2.x

Signed-off-by: Tejas Shah <[email protected]>
@shatejas shatejas requested a review from jmazanec15 May 30, 2024 01:05
@jmazanec15
Copy link
Member

I don't think I would prefer a string to object map, a typed structure is easy to work with. Moreover the primary objective was to not have the parameters cluttered. This encapsulation makes sure of that, the map introduces a possibility of getting abused in the future.

Right, Im on the fence about this - mainly because it leads to more complex interactions with objects from the C++ layer. But, Im good with this approach for now.

For separation of responsibilites, I meant delegate handling of parsing to the method in the engine. For instance, do something like Faiss.parseSearchParameters(KNNMethodContext, ThingToParse) -> return SearchParameters and pass it down the chain of abstractions.

This way, we can encapsulate the search time parameters with the engine/method combo. Some day, we may pull the engines out and make them pluggable/more extendable. So, Im thinking that it would be better to have this logic encapsulated there. In the end, the KNNMethod is the right place to define the search time parameters. Thats where other parameters are already defined.

Copy link
Collaborator

@navneet1v navneet1v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completed my first pass on the src files. There are some good abstractions that have been added. I am seeing intensive use of streams and optionals. I personally feel we can avoid those things as search path of k-NN is very sensitive and these smaller abstractions of optional and stream adds latencies when called enough number of times doesn't matter from where they get called from.

import static org.opensearch.knn.common.KNNConstants.*;

@AllArgsConstructor
public class FaissHNSWFlatE2EIT extends KNNRestTestCase {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need another IT class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it parametrized. I really hate copy pasting tests - one change can potentially fail many tests and as side effect we need to change all the tests.

Since the plugin is not on junit 5 combined with the use of carrot search we can't use the same class to have parametrized tests. we need a separate class per test but we can add multiple test cases in there

import static org.opensearch.knn.common.KNNConstants.METHOD_PARAMETER_EF_SEARCH;
import static org.opensearch.knn.index.query.KNNQueryBuilder.NAME;

public final class KNNXParserUtil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good abstraction. I like this.

@@ -25,6 +25,7 @@ public class KNNConstants {
public static final String VECTOR = "vector";
public static final String K = "k";
public static final String TYPE_KNN_VECTOR = "knn_vector";
public static final String METHOD_PARAMETER = "method_parameter";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this constant is getting used only once, please move this to the file where it is getting used.


private static final Set<String> VALID_METHOD_PARAMETERS = ImmutableSet.of(METHOD_PARAMETER_EF_SEARCH);

@SneakyThrows(IOException.class)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't add SneakyThrows in src code. Better to remove these annotations. You can use it in UTs and ITs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with specific exceptions?....will remove

private static final Set<String> VALID_METHOD_PARAMETERS = ImmutableSet.of(METHOD_PARAMETER_EF_SEARCH);

@SneakyThrows(IOException.class)
public static AlgoQueryParameters streamInMethodParameters(StreamInput in) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an interesting choice of naming, we are calling function as streamInMethodParameters and rather then returning method parameters it is returning a class named as AlgoQueryParameters. We should be consistent in terms of what parameters these are. Are they algo related parameters or they are method related parameters. If you think algo and method is same thing, so use consistent naming.

Copy link
Collaborator Author

@shatejas shatejas May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the thought process was to make sure nothing other than algoquery parameters are there in this. From method parameters discussion things like overscorer can potential fit in method_params but they shouldn't be clubbed and be added under algo params. The naming was chosen keeping in mind that. If something else get added in method params we should think about where to put it rather than using algo params as default. I can take a look at the structure again to see if I can make it more obvious

}
}

return Optional.ofNullable((Integer) parameters.get(METHOD_PARAMETER_EF_SEARCH))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This good for 1 single attribute. But we might have to add more validations once we have more attributes that are getting parsed. I would recommend adding a todo here to handle that condition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the intention is to build on this when nProbs is added

}

return Optional.ofNullable((Integer) parameters.get(METHOD_PARAMETER_EF_SEARCH))
.filter(ef -> EF_SEARCH_FIELD.match(METHOD_PARAMETER_EF_SEARCH, parser.getDeprecationHandler()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have some checks to ensure that ef_Search is a +ve value. my worry is if some once gives a value which is out of int range we typecasting it to int will lead to negative value. or someone my pass -ve values. So lets ensure we are throwing right exceptions in those cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have some checks to ensure that ef_Search is a +ve value

+ve check is present here. Overall I would want to decouple parsing and validations moving forward and I have tried to reflect that with ef_search

* Functional Interface for algorithm specific parameters.
* These parameters are passed to the algorithm implementation while processing a query
*/
public interface AlgoQueryParameters {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an interface or abstract class? I see this as an abstract class rather than interface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstract classes open up possibility of abusing it for code reuse. Inheritance should only be used to enforce behavior, Composition should be used for code reuse. So default is interface. Abstract classes have limited use cases and they should be used sparingly.

Functional interfaces like these are generally used to have bounds on the type of objects passed and that helps not have type Object in the function signature which can be easily abused.

@@ -15,7 +15,7 @@ jobs:
matrix:
java: [ 11, 17 ]
os: [ubuntu-latest]
bwc_version : [ "2.0.1", "2.1.0", "2.2.1", "2.3.0", "2.4.1", "2.5.0", "2.6.0", "2.7.0", "2.8.0", "2.9.0", "2.10.0", "2.11.0", "2.12.0", "2.13.0", "2.14.0-SNAPSHOT"]
bwc_version : [ "2.0.1", "2.1.0", "2.2.1", "2.3.0", "2.4.1", "2.5.0", "2.6.0", "2.7.0", "2.8.0", "2.9.0", "2.10.0", "2.11.0", "2.12.0", "2.13.0", "2.14.0", "2.15.0-SNAPSHOT"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could have been added in a different PR. I have raised a PR for this: #1724

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be removed when I sync my branch

@navneet1v
Copy link
Collaborator

For separation of responsibilites, I meant delegate handling of parsing to the method in the engine. For instance, do something like Faiss.parseSearchParameters(KNNMethodContext, ThingToParse) -> return SearchParameters and pass it down the chain of abstractions.

I haven't read the whole conversation. But I kind of agree on this part only, its good that we are building an algorithm specific abstraction. But once we have that abstraction we can pass it through 1 more level of check from Engines to see if they don't support a specific parameters they can fail the search call.

Now, should we allow Engines to parse the parameters I would say no. We should give them a parsed object and let them validate it. This will ensure that k-NN plugin controls the interface and engines just do the validation, rather than just giving engines control to define the interface.

@shatejas
Copy link
Collaborator Author

Completed my first pass on the src files. There are some good abstractions that have been added. I am seeing intensive use of streams and optionals. I personally feel we can avoid those things as search path of k-NN is very sensitive and these smaller abstractions of optional and stream adds latencies when called enough number of times doesn't matter from where they get called from.

streams I have tried to avoid might have missed it, will take another pass. Optionals are a surprise honestly as they are pretty light weight. I like them because they help avoid branching. Will remove them

@shatejas shatejas requested review from junqiu-lei and navneet1v June 5, 2024 02:19
import java.util.Map;

/**
* Holds context related to a method for a particular engine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this information is stored in a map in the engines. Is the idea to migrate those maps to this abstraction?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its still a map (see usage in native library) but not declared as a static to better structure and hold all context in one place

Is the idea to migrate those maps to this abstraction?

Yes the intent is that so we can have just one map instead of having multiple maps in knnengine instance. Did not want to club the refactor with this feature. We can build on it and needs a separate PR.

*
* TODO: Move KnnMethod in here
*/
public interface EngineSpecificMethodContext {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about name MethodImplementationContext?

Copy link
Collaborator Author

@shatejas shatejas Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like having "engine/nativelibrary" and "method" both in there to clearly indicate that it shouldn't be used stand alone ideally (this can only be enforced through PRs). Any other suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like having "engine/nativelibrary" and "method" both in there to clearly indicate that it shouldn't be used stand alone ideally

What do you mean by stand alone here?

Copy link
Collaborator Author

@shatejas shatejas Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, I can see someone using it to get HNSW method parameters in universal fashion and plug it somewhere other than engine.

shatejas added 2 commits June 5, 2024 12:23
- Changes HnswContext to DefaultHnswContext
- corrects * imports

Signed-off-by: Tejas Shah <[email protected]>
Copy link
Member

@jmazanec15 jmazanec15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a functionality perspective, I think it looks good. I want to understand a little bit more on where method abstraction refactor is going - it seems partially complete in this PR (which is okay for feature branch). I just want to understand what it will look like in the end. A couple questions:

  1. How will we eventually define supported methods for each engine? Is the plan to move away from current mechanism to new paths.
  2. Will we be moving everything in index.util package to the new engine package?


public final class KNNXParserUtil {

public static Map<String, Object> parseJsonObject(XContentParser parser) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't aware of the map, it pretty much does the same thing except the initial start object check. I will switch to that and delete this file

Copy link
Collaborator Author

@shatejas shatejas Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a part of another PR. Will be done before merging into main

@shatejas
Copy link
Collaborator Author

shatejas commented Jun 6, 2024

  1. How will we eventually define supported methods for each engine? Is the plan to move away from current mechanism to new paths.

The KNNengine enum which defines the instance is fine, The native library has deep hierarchies which can turn problematic but for now I wouldn't want to remove it. What I would like to see us move away from is giant static maps that are defined in each engine class. Those aren't maintainable and hard to follow. I am trying to structure it in a way that these are more manageable at the same time easy to use. I will give psuedocode for FAISS maybe on how the final structure would look like

class FAISS extends NativeLibrary {

     // singleton
     Map<String, EngineSpecificeMethodContext> methodContext = Map.of("hnsw", new FaissHnswContext(), "ivf", new FaissIVFContext());

//other engine specific stuff
}

FaissHnswContext.java

class FaissHnswContext implements EngineSpecificMethodContext {

      //singleton
      KNNMethod supportedMethodComponents();
      Map<String, Parameter> supportedMethodParameters();
      //anything that is unique to engine and method specific that comes around maybe supportedQuantization() ? I know its encapsulated in knnmethod but just to give you an idea

}

ofcourse for reusability we can have something like DefaultHNSWContext. Again this is not set in stone and have tried to structure with the limited context I have. I am open to suggestions here.

  1. Will we be moving everything in index.util package to the new engine package?

The package structure isn't intuitive right now and sure if it makes more sense we can move it

@jmazanec15
Copy link
Member

What I would like to see us move away from is giant static maps that are defined in each engine class.

+1 want to remove this. I would like to move these to classes eventually.

The package structure isn't intuitive right now and sure if it makes more sense we can move it

Right, this is overdue for refactor.

I think the long term potential goal is to encapsulate this information in a strong interface so that backend method/implementation can be easily extended by a third-party.

@jmazanec15 jmazanec15 merged commit 5a57a0d into opensearch-project:feature/ef-search Jun 6, 2024
5 of 51 checks passed
shatejas added a commit to shatejas/k-NN that referenced this pull request Jun 26, 2024
…h-project#1707)

* Adds support for ef_search as a query parameter in faiss engine

Signed-off-by: Tejas Shah <[email protected]>

* Fixing checkstyle

Signed-off-by: Tejas Shah <[email protected]>

* Corrects imports

Signed-off-by: Tejas Shah <[email protected]>

* Adds unit test for faiss_wrapper for queryIndex and queryIndexWithFilter
methods

This unit test specifically focuses on the changes related to ef_search
Detailed assertions related to grouper and selector can be picked up
later.

Tests related to any other methods are not added

Signed-off-by: Tejas Shah <[email protected]>

* Gates efSearch behind 2.15 version to make sure there is backward
compatibility during upgrades

Signed-off-by: Tejas Shah <[email protected]>

* Adds faiss_wrapper_unit_test.cpp in CmakeList

Signed-off-by: Tejas Shah <[email protected]>

* Adds change log

Signed-off-by: Tejas Shah <[email protected]>

* Adds BWC tests for efSearch

Signed-off-by: Tejas Shah <[email protected]>

* Removes EfSearch validation from rolling upgrade test for Mixed cluster
state

Customers are not expected use the query with new parameters unless its
fully upgraded.

Signed-off-by: Tejas Shah <[email protected]>

* Changing minimum version required for efSearch to be 3.0.0

This makes sure that bwc passes in the PR. will be changed to 2.15 to
once this is merged to 2.x

Signed-off-by: Tejas Shah <[email protected]>

* Fixes the checkstyle failure

Signed-off-by: Tejas Shah <[email protected]>

* Adds method_parameter in knn search request.

method_parameters will be used to hold shard level algorithm parameters

Signed-off-by: Tejas Shah <[email protected]>

* Updates the documentation for faiss wrapper

Signed-off-by: Tejas Shah <[email protected]>

* Stubs out method parameter parsing logic into its own class

Signed-off-by: Tejas Shah <[email protected]>

* Adds method parameters and validates against engine specific parameters

Signed-off-by: Tejas Shah <[email protected]>

* Removes engine and method functions from EngineSpecificMethodContext

Signed-off-by: Tejas Shah <[email protected]>

* - Does not throw in case of illegal state exception if method is null
- Changes HnswContext to DefaultHnswContext
- corrects * imports

Signed-off-by: Tejas Shah <[email protected]>

---------

Signed-off-by: Tejas Shah <[email protected]>
shatejas added a commit to shatejas/k-NN that referenced this pull request Jun 27, 2024
…h-project#1707)

* Adds support for ef_search as a query parameter in faiss engine

Signed-off-by: Tejas Shah <[email protected]>

* Fixing checkstyle

Signed-off-by: Tejas Shah <[email protected]>

* Corrects imports

Signed-off-by: Tejas Shah <[email protected]>

* Adds unit test for faiss_wrapper for queryIndex and queryIndexWithFilter
methods

This unit test specifically focuses on the changes related to ef_search
Detailed assertions related to grouper and selector can be picked up
later.

Tests related to any other methods are not added

Signed-off-by: Tejas Shah <[email protected]>

* Gates efSearch behind 2.15 version to make sure there is backward
compatibility during upgrades

Signed-off-by: Tejas Shah <[email protected]>

* Adds faiss_wrapper_unit_test.cpp in CmakeList

Signed-off-by: Tejas Shah <[email protected]>

* Adds change log

Signed-off-by: Tejas Shah <[email protected]>

* Adds BWC tests for efSearch

Signed-off-by: Tejas Shah <[email protected]>

* Removes EfSearch validation from rolling upgrade test for Mixed cluster
state

Customers are not expected use the query with new parameters unless its
fully upgraded.

Signed-off-by: Tejas Shah <[email protected]>

* Changing minimum version required for efSearch to be 3.0.0

This makes sure that bwc passes in the PR. will be changed to 2.15 to
once this is merged to 2.x

Signed-off-by: Tejas Shah <[email protected]>

* Fixes the checkstyle failure

Signed-off-by: Tejas Shah <[email protected]>

* Adds method_parameter in knn search request.

method_parameters will be used to hold shard level algorithm parameters

Signed-off-by: Tejas Shah <[email protected]>

* Updates the documentation for faiss wrapper

Signed-off-by: Tejas Shah <[email protected]>

* Stubs out method parameter parsing logic into its own class

Signed-off-by: Tejas Shah <[email protected]>

* Adds method parameters and validates against engine specific parameters

Signed-off-by: Tejas Shah <[email protected]>

* Removes engine and method functions from EngineSpecificMethodContext

Signed-off-by: Tejas Shah <[email protected]>

* - Does not throw in case of illegal state exception if method is null
- Changes HnswContext to DefaultHnswContext
- corrects * imports

Signed-off-by: Tejas Shah <[email protected]>

---------

Signed-off-by: Tejas Shah <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants