feat(search): autocomplete custom configuration (datahub-project#10426)
david-leifker authored and sleeperdeep committed Jun 25, 2024
1 parent 176fa33 commit 4c75ac2
Showing 19 changed files with 869 additions and 165 deletions.
4 changes: 4 additions & 0 deletions docker/profiles/docker-compose.gms.yml
@@ -99,6 +99,8 @@ x-datahub-gms-service: &datahub-gms-service
     - ${DATAHUB_LOCAL_GMS_ENV:-empty.env}
   environment: &datahub-gms-env
     <<: [*primary-datastore-mysql-env, *graph-datastore-search-env, *search-datastore-env, *datahub-quickstart-telemetry-env, *kafka-env]
+    ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED: true
+    ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE: '/etc/datahub/search/search_config.yaml'
   healthcheck:
     test: curl -sS --fail http://datahub-gms:${DATAHUB_GMS_PORT:-8080}/health
     start_period: 90s
@@ -107,6 +109,7 @@ x-datahub-gms-service: &datahub-gms-service
     timeout: 5s
   volumes:
     - ${HOME}/.datahub/plugins:/etc/datahub/plugins
+    - ${HOME}/.datahub/search:/etc/datahub/search
   labels:
     io.datahubproject.datahub.component: "gms"

@@ -131,6 +134,7 @@ x-datahub-gms-service-dev: &datahub-gms-service-dev
     - ../../metadata-models/src/main/resources/:/datahub/datahub-gms/resources
     - ../../metadata-service/war/build/libs/:/datahub/datahub-gms/bin
     - ${HOME}/.datahub/plugins:/etc/datahub/plugins
+    - ${HOME}/.datahub/search:/etc/datahub/search

 #################################
 # MAE Consumer
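The two new environment variables tell GMS which file to read inside the container, and the new volume line exposes `${HOME}/.datahub/search` on the host at `/etc/datahub/search`. As a result, the configured `/etc/datahub/search/search_config.yaml` corresponds to `search_config.yaml` in `${HOME}/.datahub/search` on the host. The Java sketch below makes that mapping explicit. `SearchConfigMountSketch` and `hostConfigPath` are hypothetical names, not DataHub code, and the sketch assumes the file is only consulted when `ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED` is the string `true`.

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of DataHub) tying the GMS env vars to the bind mount. */
public class SearchConfigMountSketch {

  /**
   * Returns the host file backing the configured search config, or empty when the feature is
   * disabled, no file is set, or the file lies outside the mounted directory.
   */
  static Optional<Path> hostConfigPath(
      Map<String, String> env, Path hostSource, Path containerTarget) {
    // Assumption: the file is only read when the flag is the string "true".
    boolean enabled =
        Boolean.parseBoolean(
            env.getOrDefault("ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED", "false"));
    String file = env.get("ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE");
    if (!enabled || file == null) {
      return Optional.empty();
    }
    Path containerFile = Path.of(file).normalize();
    Path target = containerTarget.normalize();
    // A bind mount only exposes paths under its container-side target.
    if (!containerFile.startsWith(target)) {
      return Optional.empty();
    }
    // Same relative path, re-rooted at the host-side source directory.
    return Optional.of(hostSource.resolve(target.relativize(containerFile)));
  }

  public static void main(String[] args) {
    Map<String, String> env =
        Map.of(
            "ELASTICSEARCH_QUERY_CUSTOM_CONFIG_ENABLED", "true",
            "ELASTICSEARCH_QUERY_CUSTOM_CONFIG_FILE", "/etc/datahub/search/search_config.yaml");
    Path hostSource = Path.of(System.getProperty("user.home"), ".datahub", "search");
    System.out.println(hostConfigPath(env, hostSource, Path.of("/etc/datahub/search")));
  }
}
```

The same mount line is added to both the regular and the `-dev` GMS service definitions, so the mapping holds for either profile.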
1 change: 0 additions & 1 deletion docker/profiles/docker-compose.yml
@@ -1,5 +1,4 @@
---
-version: '3.9'
name: datahub

include:
75 changes: 71 additions & 4 deletions docs/how/search.md
@@ -291,11 +291,11 @@ If enabled in #2 above, those queries will
appear in the `should` section of the `boolean query`[[4](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html)].
4. `functionScore` - The Elasticsearch `function score`[[5](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html#score-functions)] section of the overall query.

-### Examples
+#### Examples

For simplicity, these examples assume a match-all `queryRegex` of `.*`, so they apply to every search query.

-#### Example 1: Ranking By Tags/Terms
+##### Example 1: Ranking By Tags/Terms

Boost entities with tags of `primary` or `gold` and an example glossary term's uuid.

@@ -327,7 +327,7 @@ queryConfigurations:
      boost_mode: multiply
```
-#### Example 2: Preferred Data Platform
+##### Example 2: Preferred Data Platform
Boost the `urn:li:dataPlatform:hive` platform.

@@ -350,7 +350,7 @@ queryConfigurations:
      boost_mode: multiply
```

-#### Example 3: Exclusion & Bury
+##### Example 3: Exclusion & Bury

This configuration extends the 3 built-in queries with a rule that excludes `deprecated` entities from search results,
because they are generally not relevant, and it reduces the score of `materialized` entities.
@@ -380,6 +380,73 @@ queryConfigurations:
      boost_mode: multiply
```

### Search Autocomplete Configuration

Similar to the search configuration options in the previous section, there are autocomplete-specific options
that can be configured.

Note: The scoring functions defined in the previous section are inherited for autocomplete by default, unless
overrides are provided in the autocomplete section.

For the most part, the configuration options are identical to the search customization options in the previous
section; however, they are located under `autocompleteConfigurations` in the YAML configuration file.

1. `queryRegex` - Responsible for selecting the autocomplete customization based on [regex matching](https://www.w3schools.com/java/java_regex.asp) against the query string.
*The first match is applied.*
2. `inheritFunctionScore` - Enables or disables inheriting the function score from the normal search configuration.
This flag is automatically set to `false` when a `functionScore` section is provided. If it is set to `false` and no
`functionScore` is provided, the default Elasticsearch `_score` is used.
3. Built-in query boolean - There is one built-in query that can be enabled or disabled, the `default autocomplete query`,
controlled by the boolean [`defaultQuery`].
4. `boolQuery` - The base Elasticsearch `boolean query`[[4](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html)].
If enabled in #3 above, the default autocomplete query will
appear in the `should` section of the `boolean query`[[4](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-bool-query.html)].
5. `functionScore` - The Elasticsearch `function score`[[5](https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-function-score-query.html#score-functions)] section of the overall query.
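The selection and scoring rules in the list above can be summarized as a small Java model. This is a sketch of the documented behavior, not DataHub's `CustomizedQueryHandler`: `AutocompleteEntry`, `lookup`, `scoreSource` and `includesDefaultQuery` are hypothetical names, and the regex is assumed to have to match the whole query string (`Matcher.matches()`). `includesDefaultQuery` mirrors the handler's `customConfig == null || customConfig.isDefaultQuery()` check.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Sketch of the documented autocomplete configuration rules; all names are hypothetical. */
public class AutocompleteConfigSketch {

  /** Where the autocomplete score comes from. */
  enum ScoreSource {
    AUTOCOMPLETE_FUNCTION_SCORE, // functionScore given under autocompleteConfigurations
    INHERITED_SEARCH_SCORING, // scoring taken from the search configuration
    DEFAULT_SCORE // plain Elasticsearch _score
  }

  /** Stand-in for one autocompleteConfigurations entry. */
  record AutocompleteEntry(
      String queryRegex,
      boolean defaultQuery,
      boolean inheritFunctionScore,
      boolean hasFunctionScore) {}

  /** The first entry whose regex matches the whole query wins. */
  static Optional<AutocompleteEntry> lookup(List<AutocompleteEntry> entries, String query) {
    return entries.stream()
        .filter(e -> Pattern.compile(e.queryRegex()).matcher(query).matches())
        .findFirst();
  }

  /** The default query is used when no entry matched or the entry enables it. */
  static boolean includesDefaultQuery(Optional<AutocompleteEntry> entry) {
    return entry.map(AutocompleteEntry::defaultQuery).orElse(true);
  }

  static ScoreSource scoreSource(Optional<AutocompleteEntry> entry) {
    if (entry.isEmpty()) {
      return ScoreSource.INHERITED_SEARCH_SCORING; // no override: inherit by default
    }
    AutocompleteEntry e = entry.get();
    if (e.hasFunctionScore()) {
      return ScoreSource.AUTOCOMPLETE_FUNCTION_SCORE; // implies inheritFunctionScore: false
    }
    return e.inheritFunctionScore()
        ? ScoreSource.INHERITED_SEARCH_SCORING
        : ScoreSource.DEFAULT_SCORE;
  }

  public static void main(String[] args) {
    List<AutocompleteEntry> entries =
        List.of(
            new AutocompleteEntry("dataset.*", false, false, false),
            new AutocompleteEntry(".*", true, true, true));
    Optional<AutocompleteEntry> hit = lookup(entries, "orders");
    System.out.println(hit.map(AutocompleteEntry::queryRegex).orElse("none"));
    System.out.println(scoreSource(hit) + ", default query: " + includesDefaultQuery(hit));
  }
}
```

Because the first match wins, more specific patterns such as `dataset.*` need to be listed before a catch-all `.*`.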

#### Examples

For simplicity, these examples assume a match-all `queryRegex` of `.*`, so they apply to every query. Also
note that `queryRegex` is applied independently to `queryConfigurations` and `autocompleteConfigurations`, and the two
do not have to be identical.

##### Example 1: Exclude `deprecated` entities from autocomplete

```yaml
autocompleteConfigurations:
  - queryRegex: .*
    defaultQuery: true
    boolQuery:
      must:
        - term:
            deprecated: 'false'
```
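Example 1 relies on ordinary boolean-query semantics: the `must` clause is required, and because the handler builds the autocomplete bool query with `minimumShouldMatch(1)`, the default autocomplete query placed in `should` must also match. The toy model below evaluates only that inner query over in-memory records. `Example1Sketch` and `Doc` are hypothetical, and a simple name-prefix test stands in for the real `bool_prefix` multi-match.

```java
import java.util.List;
import java.util.function.Predicate;

/** Toy model of Example 1's inner autocomplete bool query; not Elasticsearch or DataHub code. */
public class Example1Sketch {

  /** Hypothetical indexed document. */
  record Doc(String name, boolean deprecated) {}

  /** Every must clause matches and at least minimumShouldMatch should clauses match. */
  static boolean matches(
      Doc doc, List<Predicate<Doc>> must, List<Predicate<Doc>> should, int minimumShouldMatch) {
    boolean allMust = must.stream().allMatch(p -> p.test(doc));
    long shouldHits = should.stream().filter(p -> p.test(doc)).count();
    return allMust && shouldHits >= minimumShouldMatch;
  }

  static List<Doc> autocomplete(List<Doc> docs, String input) {
    // boolQuery.must: term deprecated = 'false'
    List<Predicate<Doc>> must = List.of(d -> !d.deprecated());
    // defaultQuery: true adds the default query to should; a prefix test stands in for it
    List<Predicate<Doc>> should = List.of(d -> d.name().startsWith(input));
    return docs.stream().filter(d -> matches(d, must, should, 1)).toList();
  }

  public static void main(String[] args) {
    List<Doc> docs =
        List.of(new Doc("orders", false), new Doc("orders_v1", true), new Doc("users", false));
    System.out.println(autocomplete(docs, "ord")); // only the non-deprecated "orders" doc
  }
}
```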

##### Example 2: Override scoring for autocomplete

```yaml
autocompleteConfigurations:
  - queryRegex: .*
    defaultQuery: true
    functionScore:
      functions:
        - filter:
            term:
              materialized:
                value: true
          weight: 1.1
        - filter:
            term:
              deprecated:
                value: false
          weight: 0.5
      score_mode: avg
      boost_mode: multiply
```
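Example 2 combines two weight-only functions with `score_mode: avg` and `boost_mode: multiply`. The Elasticsearch `function_score` reference linked above describes `avg` as a weighted average, `(w1*s1 + w2*s2) / (w1 + w2)`, and `boost_mode: multiply` as multiplying the query score by the combined factor. The arithmetic sketch below applies those rules to the example's weights, assuming that a filter-plus-`weight` function has an inner score of 1 and that functions whose filter does not match are skipped. It is a model, not Elasticsearch or DataHub code; `FunctionScoreSketch` and `WeightedFilter` are hypothetical names.

```java
import java.util.List;

/** Arithmetic model of function_score combination (not Elasticsearch or DataHub code). */
public class FunctionScoreSketch {

  /** One filter + weight function, already evaluated against a single document. */
  record WeightedFilter(double weight, boolean filterMatches) {}

  /** Combines the matching functions into one factor according to score_mode. */
  static double combine(List<WeightedFilter> functions, String scoreMode) {
    double total = 0.0;
    double weightSum = 0.0;
    double product = 1.0;
    boolean anyMatched = false;
    for (WeightedFilter f : functions) {
      if (!f.filterMatches()) {
        continue; // a function whose filter does not match is skipped
      }
      anyMatched = true;
      double score = f.weight() * 1.0; // assumed inner score of 1 for a weight-only function
      total += score;
      weightSum += f.weight();
      product *= score;
    }
    if (!anyMatched) {
      return 1.0; // assumed neutral factor when no function applies
    }
    switch (scoreMode) {
      case "avg":
        // weighted average: sum(w * s) / sum(w)
        return weightSum == 0 ? 1.0 : total / weightSum;
      case "sum":
        return total;
      case "multiply":
        return product;
      default:
        throw new IllegalArgumentException("unsupported score_mode: " + scoreMode);
    }
  }

  /** boost_mode: multiply, i.e. query score times the combined factor. */
  static double boostMultiply(double queryScore, double factor) {
    return queryScore * factor;
  }

  public static void main(String[] args) {
    // Example 2 for an entity that is materialized and not deprecated: both filters match.
    List<WeightedFilter> both =
        List.of(new WeightedFilter(1.1, true), new WeightedFilter(0.5, true));
    System.out.println("avg factor: " + combine(both, "avg"));
    System.out.println("multiply factor: " + combine(both, "multiply"));
    System.out.println("final score, query score 2.0: " + boostMultiply(2.0, combine(both, "avg")));
  }
}
```

Under these assumptions, the `avg` factor is 1.0 for any document that matches at least one filter, because each weight cancels against itself. By contrast, `multiply` yields 1.1 × 0.5 = 0.55 when both filters match. If the weights are meant to change the ranking, it may be worth confirming the actual scores, for example with a search request that sets `explain: true`, before relying on `avg`.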

## FAQ and Troubleshooting

**How are the results ordered?**
@@ -336,7 +336,9 @@ public AutoCompleteResult autoComplete(
       IndexConvention indexConvention = opContext.getSearchContext().getIndexConvention();
       AutocompleteRequestHandler builder =
           AutocompleteRequestHandler.getBuilder(
-              entitySpec, opContext.getRetrieverContext().get().getAspectRetriever());
+              entitySpec,
+              customSearchConfiguration,
+              opContext.getRetrieverContext().get().getAspectRetriever());
       SearchRequest req =
           builder.getSearchRequest(
               opContext,
@@ -4,10 +4,14 @@
 import static com.linkedin.metadata.search.utils.ESAccessControlUtil.restrictUrn;
 import static com.linkedin.metadata.search.utils.ESUtils.applyDefaultSearchFilters;

+import com.fasterxml.jackson.databind.ObjectMapper;
 import com.google.common.collect.ImmutableList;
 import com.linkedin.common.urn.Urn;
 import com.linkedin.data.template.StringArray;
 import com.linkedin.metadata.aspect.AspectRetriever;
+import com.linkedin.metadata.config.search.custom.AutocompleteConfiguration;
+import com.linkedin.metadata.config.search.custom.CustomSearchConfiguration;
+import com.linkedin.metadata.config.search.custom.QueryConfiguration;
 import com.linkedin.metadata.models.EntitySpec;
 import com.linkedin.metadata.models.SearchableFieldSpec;
 import com.linkedin.metadata.models.annotation.SearchableAnnotation;
@@ -18,9 +22,9 @@
 import com.linkedin.metadata.search.utils.ESUtils;
 import io.datahubproject.metadata.context.OperationContext;
 import java.net.URISyntaxException;
+import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashSet;
-import java.util.LinkedHashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Optional;
@@ -35,7 +39,9 @@
 import org.opensearch.action.search.SearchResponse;
 import org.opensearch.index.query.BoolQueryBuilder;
 import org.opensearch.index.query.MultiMatchQueryBuilder;
+import org.opensearch.index.query.QueryBuilder;
 import org.opensearch.index.query.QueryBuilders;
+import org.opensearch.index.query.functionscore.FunctionScoreQueryBuilder;
 import org.opensearch.search.SearchHit;
 import org.opensearch.search.builder.SearchSourceBuilder;
 import org.opensearch.search.fetch.subphase.highlight.HighlightBuilder;
@@ -51,9 +57,17 @@ public class AutocompleteRequestHandler {

   private final AspectRetriever aspectRetriever;

+  private final CustomizedQueryHandler customizedQueryHandler;
+
+  private final EntitySpec entitySpec;
+
   public AutocompleteRequestHandler(
-      @Nonnull EntitySpec entitySpec, @Nonnull AspectRetriever aspectRetriever) {
+      @Nonnull EntitySpec entitySpec,
+      @Nullable CustomSearchConfiguration customSearchConfiguration,
+      @Nonnull AspectRetriever aspectRetriever) {
+    this.entitySpec = entitySpec;
     List<SearchableFieldSpec> fieldSpecs = entitySpec.getSearchableFieldSpecs();
+    this.customizedQueryHandler = CustomizedQueryHandler.builder(customSearchConfiguration).build();
     _defaultAutocompleteFields =
         Stream.concat(
             fieldSpecs.stream()
@@ -80,9 +94,13 @@ public AutocompleteRequestHandler(
   }

   public static AutocompleteRequestHandler getBuilder(
-      @Nonnull EntitySpec entitySpec, @Nonnull AspectRetriever aspectRetriever) {
+      @Nonnull EntitySpec entitySpec,
+      @Nullable CustomSearchConfiguration customSearchConfiguration,
+      @Nonnull AspectRetriever aspectRetriever) {
     return AUTOCOMPLETE_QUERY_BUILDER_BY_ENTITY_NAME.computeIfAbsent(
-        entitySpec, k -> new AutocompleteRequestHandler(entitySpec, aspectRetriever));
+        entitySpec,
+        k ->
+            new AutocompleteRequestHandler(entitySpec, customSearchConfiguration, aspectRetriever));
   }

   public SearchRequest getSearchRequest(
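`getBuilder` caches one `AutocompleteRequestHandler` per `EntitySpec` in `AUTOCOMPLETE_QUERY_BUILDER_BY_ENTITY_NAME` via `computeIfAbsent`, and `Map.computeIfAbsent` only runs its mapping function when the key is absent. The `CustomSearchConfiguration` passed on the first call for an entity is therefore the one the cached handler keeps, and later calls with a different configuration receive the existing handler. The sketch below reproduces the pattern with hypothetical stand-ins: `String` keys and configs, a `Handler` record in place of the DataHub types, and an assumed process-wide `ConcurrentHashMap` cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the per-entity handler cache; Handler and String keys are hypothetical stand-ins. */
public class BuilderCacheSketch {

  record Handler(String entity, String config) {}

  // Assumption: a process-wide cache, analogous to AUTOCOMPLETE_QUERY_BUILDER_BY_ENTITY_NAME.
  private static final Map<String, Handler> CACHE = new ConcurrentHashMap<>();

  static Handler getBuilder(String entity, String config) {
    // The mapping function runs only when the key is absent, so `config` is ignored
    // for every call after the first one for the same entity.
    return CACHE.computeIfAbsent(entity, k -> new Handler(k, config));
  }

  public static void main(String[] args) {
    Handler first = getBuilder("dataset", "config-A");
    Handler second = getBuilder("dataset", "config-B");
    System.out.println(first == second); // true
    System.out.println(second.config()); // config-A
  }
}
```

This is harmless as long as a single configuration is used for the lifetime of the process.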
@@ -94,24 +112,90 @@ public SearchRequest getSearchRequest(
     SearchRequest searchRequest = new SearchRequest();
     SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
     searchSourceBuilder.size(limit);
-    // apply default filters
-    BoolQueryBuilder boolQueryBuilder =
-        applyDefaultSearchFilters(opContext, filter, getQuery(input, field));

-    searchSourceBuilder.query(boolQueryBuilder);
-    searchSourceBuilder.postFilter(
-        ESUtils.buildFilterQuery(filter, false, searchableFieldTypes, aspectRetriever));
+    AutocompleteConfiguration customAutocompleteConfig =
+        customizedQueryHandler.lookupAutocompleteConfig(input).orElse(null);
+    QueryConfiguration customQueryConfig =
+        customizedQueryHandler.lookupQueryConfig(input).orElse(null);
+
+    // Initial query with input filters
+    BoolQueryBuilder baseQuery =
+        ESUtils.buildFilterQuery(filter, false, searchableFieldTypes, aspectRetriever);
+
+    // Add autocomplete query
+    baseQuery.should(getQuery(opContext.getObjectMapper(), customAutocompleteConfig, input, field));
+
+    // Apply default filters
+    BoolQueryBuilder queryWithDefaultFilters =
+        applyDefaultSearchFilters(opContext, filter, baseQuery);
+
+    // Apply scoring
+    FunctionScoreQueryBuilder functionScoreQueryBuilder =
+        Optional.ofNullable(customAutocompleteConfig)
+            .flatMap(
+                cac ->
+                    CustomizedQueryHandler.functionScoreQueryBuilder(
+                        opContext.getObjectMapper(),
+                        cac,
+                        queryWithDefaultFilters,
+                        customQueryConfig))
+            .orElse(
+                SearchQueryBuilder.buildScoreFunctions(
+                    opContext, customQueryConfig, List.of(entitySpec), queryWithDefaultFilters));
+    searchSourceBuilder.query(functionScoreQueryBuilder);
+
+    ESUtils.buildSortOrder(searchSourceBuilder, null, List.of(entitySpec));
+
+    // wire inner non-scored query
     searchSourceBuilder.highlighter(getHighlights(field));
     searchRequest.source(searchSourceBuilder);
     return searchRequest;
   }

-  private BoolQueryBuilder getQuery(@Nonnull String query, @Nullable String field) {
-    return getQuery(getAutocompleteFields(field), query);
+  private BoolQueryBuilder getQuery(
+      @Nonnull ObjectMapper objectMapper,
+      @Nullable AutocompleteConfiguration customAutocompleteConfig,
+      @Nonnull String query,
+      @Nullable String field) {
+    return getQuery(objectMapper, customAutocompleteConfig, getAutocompleteFields(field), query);
   }
+
+  public BoolQueryBuilder getQuery(
+      @Nonnull ObjectMapper objectMapper,
+      @Nullable AutocompleteConfiguration customAutocompleteConfig,
+      List<String> autocompleteFields,
+      @Nonnull String query) {
+
+    BoolQueryBuilder finalQuery =
+        Optional.ofNullable(customAutocompleteConfig)
+            .flatMap(cac -> CustomizedQueryHandler.boolQueryBuilder(objectMapper, cac, query))
+            .orElse(QueryBuilders.boolQuery())
+            .minimumShouldMatch(1);
+
+    getAutocompleteQuery(customAutocompleteConfig, autocompleteFields, query)
+        .ifPresent(finalQuery::should);
+
+    return finalQuery;
+  }
+
+  private Optional<QueryBuilder> getAutocompleteQuery(
+      @Nullable AutocompleteConfiguration customConfig,
+      List<String> autocompleteFields,
+      @Nonnull String query) {
+    Optional<QueryBuilder> result = Optional.empty();
+
+    if (customConfig == null || customConfig.isDefaultQuery()) {
+      result = Optional.of(defaultQuery(autocompleteFields, query));
+    }
+
+    return result;
+  }

-  public static BoolQueryBuilder getQuery(List<String> autocompleteFields, @Nonnull String query) {
+  private static BoolQueryBuilder defaultQuery(
+      List<String> autocompleteFields, @Nonnull String query) {
     BoolQueryBuilder finalQuery = QueryBuilders.boolQuery();
     finalQuery.minimumShouldMatch(1);
+
     // Search for exact matches with higher boost and ngram matches
     MultiMatchQueryBuilder autocompleteQueryBuilder =
         QueryBuilders.multiMatchQuery(query).type(MultiMatchQueryBuilder.Type.BOOL_PREFIX);
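The new scoring step chains `Optional.flatMap(...)` with `.orElse(...)`. Because `Optional.orElse` receives an already-evaluated argument, `SearchQueryBuilder.buildScoreFunctions(...)` is called even when the custom autocomplete branch produces a function score. `Optional.orElseGet` defers the fallback until the `Optional` is empty. The sketch below shows the difference with hypothetical stand-ins (`customScoring`, `buildDefaultScoring`) and a call counter; it makes no claim about how expensive the extra call is in DataHub.

```java
import java.util.Optional;

/** Sketch contrasting Optional.orElse and orElseGet; all names are hypothetical stand-ins. */
public class FallbackSketch {

  static int defaultBuilds = 0; // counts how often the fallback is built

  /** Stand-in for SearchQueryBuilder.buildScoreFunctions(...). */
  static String buildDefaultScoring() {
    defaultBuilds++;
    return "search-config scoring";
  }

  /** Stand-in for the flatMap over the custom autocomplete configuration. */
  static Optional<String> customScoring(String autocompleteConfig) {
    return Optional.ofNullable(autocompleteConfig)
        .flatMap(c -> Optional.of("custom scoring from " + c));
  }

  /** Same shape as the diff: the fallback is evaluated before orElse is called. */
  static String eager(String autocompleteConfig) {
    return customScoring(autocompleteConfig).orElse(buildDefaultScoring());
  }

  /** orElseGet only invokes the supplier when the Optional is empty. */
  static String lazy(String autocompleteConfig) {
    return customScoring(autocompleteConfig).orElseGet(FallbackSketch::buildDefaultScoring);
  }

  public static void main(String[] args) {
    eager("autocomplete-config");
    System.out.println(defaultBuilds); // 1: fallback built although unused
    defaultBuilds = 0;
    lazy("autocomplete-config");
    System.out.println(defaultBuilds); // 0
  }
}
```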
@@ -154,6 +238,12 @@ private HighlightBuilder getHighlights(@Nullable String field) {
                     .field(fieldName + ".*")
                     .field(fieldName + ".ngram")
                     .field(fieldName + ".delimited"));
+
+    // set field match req false for ngram
+    highlightBuilder.fields().stream()
+        .filter(f -> f.name().contains("ngram"))
+        .forEach(f -> f.requireFieldMatch(false).noMatchSize(200));
+
     return highlightBuilder;
   }

@@ -168,8 +258,9 @@ public AutoCompleteResult extractResult(
       @Nonnull OperationContext opContext,
       @Nonnull SearchResponse searchResponse,
       @Nonnull String input) {
-    Set<String> results = new LinkedHashSet<>();
-    Set<AutoCompleteEntity> entityResults = new HashSet<>();
+    // use lists to preserve ranking
+    List<String> results = new ArrayList<>();
+    List<AutoCompleteEntity> entityResults = new ArrayList<>();

     for (SearchHit hit : searchResponse.getHits()) {
       Optional<String> matchedFieldValue =
@@ -181,13 +272,15 @@ public AutoCompleteResult extractResult(
         if (matchedUrn.isPresent()) {
           Urn autoCompleteUrn = Urn.createFromString(matchedUrn.get());
           if (!restrictUrn(opContext, autoCompleteUrn)) {
-            entityResults.add(
-                new AutoCompleteEntity().setUrn(Urn.createFromString(matchedUrn.get())));
-            matchedFieldValue.ifPresent(results::add);
+            matchedFieldValue.ifPresent(
+                value -> {
+                  entityResults.add(new AutoCompleteEntity().setUrn(autoCompleteUrn));
+                  results.add(value);
+                });
           }
         }
       } catch (URISyntaxException e) {
-        throw new RuntimeException(String.format("Failed to create urn %s", matchedUrn.get()), e);
+        log.warn(String.format("Failed to create urn %s", matchedUrn.get()));
       }
     }
     return new AutoCompleteResult()
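The revised `extractResult` loop adds an entity and its matched value together, and only when a value is present, into `ArrayList`s. As a result, the two result lists stay index-aligned and follow the order of the search hits. Under the old code, an entity was added even when no value matched, and the `HashSet` did not preserve hit order. One side effect of dropping `LinkedHashSet` is that repeated values are no longer collapsed. URN parse failures are now logged and skipped rather than aborting the request. The sketch below models the new loop with hypothetical `Hit` and `Result` records. A crude `urn:li:` prefix check stands in for `Urn.createFromString` failing, and the `restrictUrn` access check is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Model of the revised extractResult loop; Hit and Result are hypothetical stand-ins. */
public class ExtractResultSketch {

  record Hit(String urn, Optional<String> matchedValue) {}

  record Result(List<String> urns, List<String> values) {}

  static Result extract(List<Hit> rankedHits) {
    // Lists keep the ranking order of the hits (and keep duplicates).
    List<String> urns = new ArrayList<>();
    List<String> values = new ArrayList<>();
    for (Hit hit : rankedHits) {
      // Crude stand-in for Urn.createFromString throwing URISyntaxException.
      if (!hit.urn().startsWith("urn:li:")) {
        System.err.println("Failed to create urn " + hit.urn()); // warn and skip, don't throw
        continue;
      }
      // Entity and value are added together, so urns.get(i) pairs with values.get(i).
      hit.matchedValue()
          .ifPresent(
              value -> {
                urns.add(hit.urn());
                values.add(value);
              });
    }
    return new Result(urns, values);
  }

  public static void main(String[] args) {
    List<Hit> hits =
        List.of(
            new Hit("urn:li:dataset:b", Optional.of("orders")),
            new Hit("urn:li:dataset:a", Optional.empty()),
            new Hit("not-a-urn", Optional.of("x")),
            new Hit("urn:li:dataset:c", Optional.of("orders")));
    Result r = extract(hits);
    System.out.println(r.urns()); // [urn:li:dataset:b, urn:li:dataset:c]
    System.out.println(r.values()); // [orders, orders]
  }
}
```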