Skip to content

Commit

Permalink
Merge branch 'reRankByField-analysis' of https://github.com/brianf-aw…
Browse files Browse the repository at this point in the history
…s/neural-search into reRankByField-analysis
  • Loading branch information
brianf-aws committed Oct 14, 2024
2 parents 620d245 + 3e9e279 commit 7cbcec9
Show file tree
Hide file tree
Showing 12 changed files with 160 additions and 7 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
### Enhancements
- Set neural-search plugin 3.0.0 baseline JDK version to JDK-2 ([#838](https://github.com/opensearch-project/neural-search/pull/838))
### Bug Fixes
- Fix for nested field missing sub embedding field in text embedding processor ([#913](https://github.com/opensearch-project/neural-search/pull/913))
### Infrastructure
### Documentation
### Maintenance
Expand Down
48 changes: 47 additions & 1 deletion DEVELOPER_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,12 @@
- [Run Single-node Cluster Locally](#run-single-node-cluster-locally)
- [Run Multi-node Cluster Locally](#run-multi-node-cluster-locally)
- [Debugging](#debugging)
- [Major Dependencies](#major-dependencies)
- [Backwards Compatibility Testing](#backwards-compatibility-testing)
- [Adding new tests](#adding-new-tests)
- [Supported configurations](#supported-configurations)
- [Submitting Changes](#submitting-changes)
- [Building On Lucene Version Updates](#building-on-lucene-version-updates)

# Developer Guide

Expand Down Expand Up @@ -88,12 +90,22 @@ Please follow these formatting guidelines:
OpenSearch neural-search uses a [Gradle](https://docs.gradle.org/6.6.1/userguide/userguide.html) wrapper for its build.
Run `gradlew` on Unix systems.

Build OpenSearch neural-search using `gradlew build`
Build OpenSearch neural-search using `gradlew build`. This command will
also run Integration Tests and Unit Tests.

```
./gradlew build
```

## Run Unit Tests
If you want to strictly test that your unit tests are passing
you can run the following.

```
./gradlew test
```


## Run OpenSearch neural-search

### Run Single-node Cluster Locally
Expand Down Expand Up @@ -227,6 +239,12 @@ Additionally, it is possible to attach one debugger to the cluster JVM and anoth
./gradlew :integTest -Dtest.debug=1 -Dcluster.debug=1
```

#### Major Dependencies
Currently, the major dependencies that Neural Search depends on are [ML-Commons](https://github.com/opensearch-project/ml-commons) and [K-NN](https://github.com/opensearch-project/k-NN).
Make sure to check on them when you observe a failure that affects Neural Search.
See [Building on Lucene Version updates](#building-on-lucene-version-updates) as an example where K-NN caused a build failure.
Also, please note that it may take time for developers to create a fix for your current dependency issue.

## Backwards Compatibility Testing

The purpose of Backwards Compatibility Testing and different types of BWC tests are explained [here](https://github.com/opensearch-project/opensearch-plugins/blob/main/TESTING.md#backwards-compatibility-testing). The BWC tests (i.e. Restart-Upgrade, Mixed-Cluster and Rolling-Upgrade scenarios) should be added with any new feature being added to Neural Search.
Expand Down Expand Up @@ -292,3 +310,31 @@ original PR with an appropriate label `backport <backport-branch-name>` is merge
run successfully on the PR. For example, if a PR on main needs to be backported to `2.x` branch, add a label
`backport 2.x` to the PR and make sure the backport workflow runs on the PR along with other checks. Once this PR is
merged to main, the workflow will create a backport PR to the `2.x` branch.

## Building On Lucene Version Updates
There may be a Lucene version update that can affect your workflow causing errors like
`java.lang.NoClassDefFoundError: org/apache/lucene/codecs/lucene99/Lucene99Codec` or
`Provider org.opensearch.knn.index.codec.KNN910Codec.KNN910Codec could not be instantiated`. In this case
we can observe there may be an issue with a dependency with [K-NN](https://github.com/opensearch-project/k-NN).
This results in having issues with not being able to do `./gradlew run` or `./gradlew build`.

You can check this [K-NN PR](https://github.com/opensearch-project/k-NN/pull/2195) as an example of this event happening or this [Neural Search PR](https://github.com/opensearch-project/neural-search/pull/913#issuecomment-2400189329) that shows a developer going
through the same build issue.

**Follow the steps to remedy the gradle run issue.**
1. From your cloned neural search repo root directory `rm -rf build .gradle`
2. Clear the following directories from your gradle folder located in your root directory
1. `cd ~/.gradle`
2. `rm -rf caches workers wrapper daemon`
3. `cd -` switch back the previous directory (i.e. the neural search repo root directory)
3. Finally run `./gradlew run`

**Follow the steps to remedy the gradle build issue**

**PREREQ:** Make sure you have OpenSearch repo cloned locally

1. From your cloned neural search repo root directory `rm -rf build .gradle`
2. Delete the .gradle folder and .m2 folder. `rm -rf ~/.gradle ~/.m2`
3. Head over to your OpenSearch cloned repo root directory
1. `./gradlew publisToMavenLocal`
4. Finally run `./gradlew build` from the neural search repo
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
Expand Down Expand Up @@ -285,7 +286,7 @@ private void createInferenceListForMapTypeInput(Object sourceValue, List<String>
if (sourceValue instanceof Map) {
((Map<String, Object>) sourceValue).forEach((k, v) -> createInferenceListForMapTypeInput(v, texts));
} else if (sourceValue instanceof List) {
texts.addAll(((List<String>) sourceValue));
((List<String>) sourceValue).stream().filter(Objects::nonNull).forEach(texts::add);
} else {
if (sourceValue == null) return;
texts.add(sourceValue.toString());
Expand Down Expand Up @@ -419,8 +420,12 @@ private void putNLPResultToSourceMapForMapType(
for (Map.Entry<String, Object> inputNestedMapEntry : ((Map<String, Object>) sourceValue).entrySet()) {
if (sourceAndMetadataMap.get(processorKey) instanceof List) {
// build nlp output for list of nested objects
Iterator<Object> inputNestedMapValueIt = ((List<Object>) inputNestedMapEntry.getValue()).iterator();
for (Map<String, Object> nestedElement : (List<Map<String, Object>>) sourceAndMetadataMap.get(processorKey)) {
nestedElement.put(inputNestedMapEntry.getKey(), results.get(indexWrapper.index++));
// Only fill in when value is not null
if (inputNestedMapValueIt.hasNext() && inputNestedMapValueIt.next() != null) {
nestedElement.put(inputNestedMapEntry.getKey(), results.get(indexWrapper.index++));
}
}
} else {
Pair<String, Object> processedNestedKey = processNestedKey(inputNestedMapEntry);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ public void test_batchExecute_emptyInput() {
verify(clientAccessor, never()).inferenceSentences(anyString(), anyList(), any());
}

public void test_batchExecute_allFailedValidation() {
public void test_batchExecuteWithEmpty_allFailedValidation() {
final int docCount = 2;
TestInferenceProcessor processor = new TestInferenceProcessor(createMockVectorResult(), BATCH_SIZE, null);
List<IngestDocumentWrapper> wrapperList = createIngestDocumentWrappers(docCount);
Expand All @@ -79,6 +79,29 @@ public void test_batchExecute_allFailedValidation() {
assertEquals(docCount, captor.getValue().size());
for (int i = 0; i < docCount; ++i) {
assertNotNull(captor.getValue().get(i).getException());
assertEquals(
"list type field [key1] has empty string, cannot process it",
captor.getValue().get(i).getException().getMessage()
);
assertEquals(wrapperList.get(i).getIngestDocument(), captor.getValue().get(i).getIngestDocument());
}
verify(clientAccessor, never()).inferenceSentences(anyString(), anyList(), any());
}

public void test_batchExecuteWithNull_allFailedValidation() {
final int docCount = 2;
TestInferenceProcessor processor = new TestInferenceProcessor(createMockVectorResult(), BATCH_SIZE, null);
List<IngestDocumentWrapper> wrapperList = createIngestDocumentWrappers(docCount);
wrapperList.get(0).getIngestDocument().setFieldValue("key1", Arrays.asList(null, "value1"));
wrapperList.get(1).getIngestDocument().setFieldValue("key1", Arrays.asList(null, "value1"));
Consumer resultHandler = mock(Consumer.class);
processor.batchExecute(wrapperList, resultHandler);
ArgumentCaptor<List<IngestDocumentWrapper>> captor = ArgumentCaptor.forClass(List.class);
verify(resultHandler).accept(captor.capture());
assertEquals(docCount, captor.getValue().size());
for (int i = 0; i < docCount; ++i) {
assertNotNull(captor.getValue().get(i).getException());
assertEquals("list type field [key1] has null, cannot process it", captor.getValue().get(i).getException().getMessage());
assertEquals(wrapperList.get(i).getIngestDocument(), captor.getValue().get(i).getIngestDocument());
}
verify(clientAccessor, never()).inferenceSentences(anyString(), anyList(), any());
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -310,5 +310,14 @@ private void ingestBatchDocumentWithBulk(String idPrefix, int docCount, Set<Inte
);
assertEquals(!failedIds.isEmpty(), map.get("errors"));
assertEquals(docCount, ((List) map.get("items")).size());

int failedDocCount = 0;
for (Object item : ((List) map.get("items"))) {
Map<String, Map<String, Object>> itemMap = (Map<String, Map<String, Object>>) item;
if (itemMap.get("index").get("error") != null) {
failedDocCount++;
}
}
assertEquals(failedIds.size(), failedDocCount);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -730,7 +730,7 @@ public void testBuildVectorOutput_withNestedList_successful() {
IngestDocument ingestDocument = createNestedListIngestDocument();
TextEmbeddingProcessor textEmbeddingProcessor = createInstanceWithNestedMapConfiguration(config);
Map<String, Object> knnMap = textEmbeddingProcessor.buildMapWithTargetKeys(ingestDocument);
List<List<Float>> modelTensorList = createMockVectorResult();
List<List<Float>> modelTensorList = createRandomOneDimensionalMockVector(2, 2, 0.0f, 1.0f);
textEmbeddingProcessor.buildNLPResult(knnMap, modelTensorList, ingestDocument.getSourceAndMetadata());
List<Map<String, Object>> nestedObj = (List<Map<String, Object>>) ingestDocument.getSourceAndMetadata().get("nestedField");
assertTrue(nestedObj.get(0).containsKey("vectorField"));
Expand All @@ -739,12 +739,27 @@ public void testBuildVectorOutput_withNestedList_successful() {
assertNotNull(nestedObj.get(1).get("vectorField"));
}

@SuppressWarnings("unchecked")
public void testBuildVectorOutput_withNestedListHasNotForEmbeddingField_successful() {
Map<String, Object> config = createNestedListConfiguration();
IngestDocument ingestDocument = createNestedListWithNotEmbeddingFieldIngestDocument();
TextEmbeddingProcessor textEmbeddingProcessor = createInstanceWithNestedMapConfiguration(config);
Map<String, Object> knnMap = textEmbeddingProcessor.buildMapWithTargetKeys(ingestDocument);
List<List<Float>> modelTensorList = createRandomOneDimensionalMockVector(1, 2, 0.0f, 1.0f);
textEmbeddingProcessor.buildNLPResult(knnMap, modelTensorList, ingestDocument.getSourceAndMetadata());
List<Map<String, Object>> nestedObj = (List<Map<String, Object>>) ingestDocument.getSourceAndMetadata().get("nestedField");
assertFalse(nestedObj.get(0).containsKey("vectorField"));
assertTrue(nestedObj.get(0).containsKey("textFieldNotForEmbedding"));
assertTrue(nestedObj.get(1).containsKey("vectorField"));
assertNotNull(nestedObj.get(1).get("vectorField"));
}

public void testBuildVectorOutput_withNestedList_Level2_successful() {
Map<String, Object> config = createNestedList2LevelConfiguration();
IngestDocument ingestDocument = create2LevelNestedListIngestDocument();
TextEmbeddingProcessor textEmbeddingProcessor = createInstanceWithNestedMapConfiguration(config);
Map<String, Object> knnMap = textEmbeddingProcessor.buildMapWithTargetKeys(ingestDocument);
List<List<Float>> modelTensorList = createMockVectorResult();
List<List<Float>> modelTensorList = createRandomOneDimensionalMockVector(2, 2, 0.0f, 1.0f);
textEmbeddingProcessor.buildNLPResult(knnMap, modelTensorList, ingestDocument.getSourceAndMetadata());
Map<String, Object> nestedLevel1 = (Map<String, Object>) ingestDocument.getSourceAndMetadata().get("nestedField");
List<Map<String, Object>> nestedObj = (List<Map<String, Object>>) nestedLevel1.get("nestedField");
Expand All @@ -754,6 +769,22 @@ public void testBuildVectorOutput_withNestedList_Level2_successful() {
assertNotNull(nestedObj.get(1).get("vectorField"));
}

@SuppressWarnings("unchecked")
public void testBuildVectorOutput_withNestedListHasNotForEmbeddingField_Level2_successful() {
Map<String, Object> config = createNestedList2LevelConfiguration();
IngestDocument ingestDocument = create2LevelNestedListWithNotEmbeddingFieldIngestDocument();
TextEmbeddingProcessor textEmbeddingProcessor = createInstanceWithNestedMapConfiguration(config);
Map<String, Object> knnMap = textEmbeddingProcessor.buildMapWithTargetKeys(ingestDocument);
List<List<Float>> modelTensorList = createRandomOneDimensionalMockVector(1, 2, 0.0f, 1.0f);
textEmbeddingProcessor.buildNLPResult(knnMap, modelTensorList, ingestDocument.getSourceAndMetadata());
Map<String, Object> nestedLevel1 = (Map<String, Object>) ingestDocument.getSourceAndMetadata().get("nestedField");
List<Map<String, Object>> nestedObj = (List<Map<String, Object>>) nestedLevel1.get("nestedField");
assertFalse(nestedObj.get(0).containsKey("vectorField"));
assertTrue(nestedObj.get(0).containsKey("textFieldNotForEmbedding"));
assertTrue(nestedObj.get(1).containsKey("vectorField"));
assertNotNull(nestedObj.get(1).get("vectorField"));
}

public void test_updateDocument_appendVectorFieldsToDocument_successful() {
Map<String, Object> config = createPlainStringConfiguration();
IngestDocument ingestDocument = createPlainIngestDocument();
Expand Down Expand Up @@ -1039,6 +1070,16 @@ private IngestDocument createNestedListIngestDocument() {
return new IngestDocument(nestedList, new HashMap<>());
}

private IngestDocument createNestedListWithNotEmbeddingFieldIngestDocument() {
HashMap<String, Object> nestedObj1 = new HashMap<>();
nestedObj1.put("textFieldNotForEmbedding", "This is a text field");
HashMap<String, Object> nestedObj2 = new HashMap<>();
nestedObj2.put("textField", "This is another text field");
HashMap<String, Object> nestedList = new HashMap<>();
nestedList.put("nestedField", Arrays.asList(nestedObj1, nestedObj2));
return new IngestDocument(nestedList, new HashMap<>());
}

private IngestDocument create2LevelNestedListIngestDocument() {
HashMap<String, Object> nestedObj1 = new HashMap<>();
nestedObj1.put("textField", "This is a text field");
Expand All @@ -1050,4 +1091,16 @@ private IngestDocument create2LevelNestedListIngestDocument() {
nestedList1.put("nestedField", nestedList);
return new IngestDocument(nestedList1, new HashMap<>());
}

private IngestDocument create2LevelNestedListWithNotEmbeddingFieldIngestDocument() {
HashMap<String, Object> nestedObj1 = new HashMap<>();
nestedObj1.put("textFieldNotForEmbedding", "This is a text field");
HashMap<String, Object> nestedObj2 = new HashMap<>();
nestedObj2.put("textField", "This is another text field");
HashMap<String, Object> nestedList = new HashMap<>();
nestedList.put("nestedField", Arrays.asList(nestedObj1, nestedObj2));
HashMap<String, Object> nestedList1 = new HashMap<>();
nestedList1.put("nestedField", nestedList);
return new IngestDocument(nestedList1, new HashMap<>());
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
import static org.opensearch.knn.index.query.KNNQueryBuilder.MIN_SCORE_FIELD;
import static org.opensearch.knn.index.query.KNNQueryBuilder.RESCORE_FIELD;
import static org.opensearch.knn.index.query.KNNQueryBuilder.RESCORE_OVERSAMPLE_FIELD;
import static org.opensearch.neuralsearch.util.TestUtils.DELTA_FOR_FLOATS_ASSERTION;
import static org.opensearch.neuralsearch.util.TestUtils.xContentBuilderToMap;
import static org.opensearch.neuralsearch.query.NeuralQueryBuilder.K_FIELD;
import static org.opensearch.neuralsearch.query.NeuralQueryBuilder.MODEL_ID_FIELD;
Expand Down Expand Up @@ -183,7 +184,11 @@ public void testFromXContent_withRescoreContext_thenBuildSuccessfully() {
assertEquals(QUERY_TEXT, neuralQueryBuilder.queryText());
assertEquals(MODEL_ID, neuralQueryBuilder.modelId());
assertEquals(K, neuralQueryBuilder.k());
assertEquals(RescoreContext.getDefault(), neuralQueryBuilder.rescoreContext());
assertEquals(
RescoreContext.getDefault().getOversampleFactor(),
neuralQueryBuilder.rescoreContext().getOversampleFactor(),
DELTA_FOR_FLOATS_ASSERTION
);
assertNull(neuralQueryBuilder.methodParameters());
}

Expand Down
3 changes: 3 additions & 0 deletions src/test/resources/processor/IndexMappings.json
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,9 @@
"text": {
"type": "text"
},
"text_not_for_embedding": {
"type": "text"
},
"embedding": {
"type": "knn_vector",
"dimension": 768,
Expand Down
3 changes: 3 additions & 0 deletions src/test/resources/processor/ingest_doc1.json
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,9 @@
"movie": null
},
"nested_passages": [
{
"text_not_for_embedding": "test"
},
{
"text": "hello"
},
Expand Down
3 changes: 3 additions & 0 deletions src/test/resources/processor/ingest_doc2.json
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@
"movie": null
},
"nested_passages": [
{
"text_not_for_embedding": "test"
},
{
"text": "apple"
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -203,6 +203,7 @@ protected void loadModel(final String modelId) throws Exception {
isComplete = checkComplete(taskQueryResult);
Thread.sleep(DEFAULT_TASK_RESULT_QUERY_INTERVAL_IN_MILLISECOND);
}
assertTrue(isComplete);
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ public class TestUtils {

public static final String RELATION_EQUAL_TO = "eq";
public static final float DELTA_FOR_SCORE_ASSERTION = 0.001f;
public static final float DELTA_FOR_FLOATS_ASSERTION = 0.001f;
public static final String RESTART_UPGRADE_OLD_CLUSTER = "tests.is_old_cluster";
public static final String BWC_VERSION = "tests.plugin_bwc_version";
public static final String NEURAL_SEARCH_BWC_PREFIX = "neuralsearch-bwc-";
Expand Down

0 comments on commit 7cbcec9

Please sign in to comment.