-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add integration tests for neural query #36
Add integration tests for neural query #36
Conversation
import org.opensearch.common.xcontent.XContentFactory; | ||
import org.opensearch.common.xcontent.XContentHelper; | ||
import org.opensearch.common.xcontent.XContentType; | ||
import org.opensearch.index.query.QueryBuilder; | ||
import org.opensearch.rest.RestStatus; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is not the change which you have done, but can you change the base class of BaseNeuralSearchIT from OpenSearchRestTestCase to OpenSearchSecureRestTestCase. This will allow the secure clusters to be used for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can update.
* @return number of documents indexed to that index | ||
*/ | ||
@SneakyThrows | ||
public int getDocCount(String indexName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why public? can't we make it protected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, will update.
modelId.compareAndSet(null, prepareModel()); | ||
} | ||
|
||
private void maybeInitializeIndex(String indexName) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better name for function can be : initializeIndexIfNotExist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
modelId.compareAndSet(null, prepareModel()); | ||
} | ||
|
||
private void maybeInitializeIndex(String indexName) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit-pick] move this private function after all the public and protected functions.
} | ||
|
||
@SneakyThrows | ||
public void testBasicQuery() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: as the knowledge of tests will fade as the time passes, lets add the query and response objects in the java doc so that we know what to expect for all the tests present in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each response will be similar. I am going to just provide query.
} | ||
|
||
@SneakyThrows | ||
private String prepareModel() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we move this to BaseNeuralSearchIT class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
private void prepareKnnIndex(String indexName, List<KNNFieldConfig> knnFieldConfigs) { | ||
createIndexWithConfiguration(indexName, buildIndexConfiguration(knnFieldConfigs), ""); | ||
} | ||
|
||
@SneakyThrows | ||
private String buildIndexConfiguration(List<KNNFieldConfig> knnFieldConfigs) { | ||
XContentBuilder xContentBuilder = XContentFactory.jsonBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move all of these to BaseNeuralSearchIT, as we have search, indexCreation and addKNNDoc all functions there. Plus I believe these can be reused across different ITs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Will move. Will leave initializeIndexIfNotExist because this is more specific to this particular test suite.
public float[] runInference(String modelId, String queryText) { | ||
Response inferenceResponse = makeRequest( | ||
client(), | ||
"POST", | ||
String.format(LOCALE, "/_plugins/_ml/_predict/text_embedding/%s", modelId), | ||
null, | ||
toHttpEntity(String.format(LOCALE, "{\"text_docs\": [\"%s\"],\"target_response\": [\"sentence_embedding\"]}", queryText)), | ||
ImmutableList.of(new BasicHeader(HttpHeaders.USER_AGENT, "Kibana")) | ||
); | ||
|
||
Map<String, Object> inferenceResJson = XContentHelper.convertToMap( | ||
XContentFactory.xContent(XContentType.JSON), | ||
EntityUtils.toString(inferenceResponse.getEntity()), | ||
false | ||
); | ||
|
||
Object inference_results = inferenceResJson.get("inference_results"); | ||
assertTrue(inference_results instanceof List); | ||
List<Object> inferenceResultsAsMap = (List<Object>) inference_results; | ||
assertEquals(1, inferenceResultsAsMap.size()); | ||
Map<String, Object> result = (Map<String, Object>) inferenceResultsAsMap.get(0); | ||
List<Object> output = (List<Object>) result.get("output"); | ||
assertEquals(1, output.size()); | ||
Map<String, Object> map = (Map<String, Object>) output.get(0); | ||
List<Float> data = ((List<Double>) map.get("data")).stream().map(Double::floatValue).collect(Collectors.toList()); | ||
return vectorAsListToArray(data); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason for not using the MLCommonsClient here and directly use the inference API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, not really any reason. I can switch to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah actually, I think we need to use rest because from these tests we only have access to RestClient. MLCommonsClientAccessor requires a MachineLearningNodeClient which requires a Client which is internal to the cluster.
"framework_type": "sentence_transformers", | ||
"all_config": "{\"architectures\":[\"BertModel\"],\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6}" | ||
}, | ||
"url": "https://github.com/opensearch-project/ml-commons/blob/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true" | ||
"url": "https://github.com/jmazanec15/k-NN-1/blob/model/traced_small_model.zip?raw=true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporary: Will update URL to ml-commons once uploaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmazanec15 , new model updated in ml-commons, you can use this one https://github.com/opensearch-project/ml-commons/blob/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/traced_small_model.zip?raw=true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool just switched it.
Increasing the heap size of a node in the test cluster to 1GB. Default is 512mb which may be too small for ml - https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/OpenSearchNode.java#L845 |
|
||
import com.google.common.collect.ImmutableList; | ||
|
||
public abstract class BaseNeuralSearchIT extends OpenSearchRestTestCase { | ||
public abstract class BaseNeuralSearchIT extends OpenSearchSecureRestTestCase { | ||
|
||
private static final Locale LOCALE = Locale.getDefault(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a issue for this PR. Looks like Locale.getDefault() seem to have issue with Windows shall we use Locale.ROOT? Take it in other PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let use Locale.ROOT.
I think you have not rebased the code, this statement has been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will rebase
* } | ||
*/ | ||
@SneakyThrows | ||
public void testRescoreQuery() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add some function to rescore like weight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didnt think it was necessary given that weight will be outside of the neural query
/** | ||
* Tests bool should query: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about bool query one with neural and one with BM25?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add
Adds a series of integration tests for neural query type. Adds shared functionality to base class as well as a utility class. Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
5b81d70
to
52531c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks
Adds a series of integration tests for neural query type. Adds shared functionality to base class as well as a utility class. Increase test cluster heap to 1 GB. Signed-off-by: John Mazanec <[email protected]> (cherry picked from commit e30285b)
Description
Adds a series of integration tests for neural query type. Adds shared functionality to base class as well as a utility class.
Issues Resolved
#14
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.