Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ValueFetcher when loading text snippets to highlight #63572

Merged
merged 10 commits into from
Nov 24, 2020

Conversation

romseygeek
Copy link
Contributor

@romseygeek romseygeek commented Oct 12, 2020

HighlighterUtils.loadFieldValues() loads values directly from the source, and
then callers have to deal with filtering out values that would have been removed
by an ignore_above filter on keyword fields. Instead, we can use the
ValueFetcher for the relevant field, which handles all this logic for us.

Closes #59931.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (:Search/Highlighting)

@elasticmachine elasticmachine added the Team:Search Meta label for search team label Oct 12, 2020
return textsToHighlight;
MapperService mapperService,
FetchSubPhase.HitContext hitContext) throws IOException {
ValueFetcher fetcher = fieldType.valueFetcher(mapperService, null, null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null looks is going to cause this to fail on runtime fields. Do we filter those out other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to highlight this with a comment of 'not sure about this' - I'm pretty sure that we filter out non-text fields further up the stack, but I need to double-check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do try to highlight runtime fields. I'm not sure how we manage not to fail here. Runtime fields really need the SearchLookup to do anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember looking at this too, and finding that for runtime fields we simply don't return anything highlighted rather than failing, as they are not in _source nor stored.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that runtime fields can't be highlighted anyway, I just added a null check for SearchLookup inside the value fetcher in AbstractScriptFieldType.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see the value fetcher abstraction simplify some special cases! One general comment -- now that we use value fetchers, we will highlight copy_to content. (For multi-fields, I think they were already handled before this change.) Perhaps we could add tests for this enhancement and make it more visible in the version 'changes' docs?

@jtibshirani
Copy link
Contributor

@romseygeek just a heads up that I marked this PR as closing #59931.

@romseygeek
Copy link
Contributor Author

We have an interesting test failure here:

org.elasticsearch.search.fetch.subphase.highlight.HighlighterSearchIT > testWithNormalizer FAILED |  
-- | --
  | java.lang.AssertionError: |  
  | Expected: "<em>Hello World</em>" |  
  | but: was "<em>hello world</em>" |  
  | at __randomizedtesting.SeedInfo.seed([6429DCC7E15891CE:7E8A44B3C85F94F]:0) |  
  | at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) |  
  | at org.junit.Assert.assertThat(Assert.java:956)

Keyword value fetchers normalize their data, but this highlighter test is expecting normalization not to be applied to the highlighter output. I'm thinking we should change the test, as it makes more sense for the behaviour to be consistent across fields and highlights.

@@ -60,7 +60,7 @@ public SourceValueFetcher(String fieldName, QueryShardContext context, Object nu
for (String path : sourcePaths) {
Object sourceValue = lookup.extractValue(path, nullValue);
if (sourceValue == null) {
return List.of();
continue;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug - if the first of multiple source paths didn't have any values, we returned early instead of continuing to check further paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this! Maybe we could fix this in a separate PR (with a quick test) to keep this one scoped to highlighting?

@@ -198,6 +199,9 @@ protected final void checkAllowExpensiveQueries(QueryShardContext context) {

@Override
public ValueFetcher valueFetcher(QueryShardContext context, SearchLookup lookup, String format) {
if (lookup == null) {
return v -> Collections.emptyList();
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option here would be to pass the Fetch-phase search lookup through all the highlighting code to HighlightUtils.loadValues(), and I think we probably will want to do that eventually, but it's not really ready yet given our uncertainty around how that lookup is built and passed around anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask why it is null, the supplier could throw unsupported operation exception and not be null, but it is no supplier in this case. I am not liking that much that it can be null, but I am not sure how we can avoid that in this scenario.

@@ -60,7 +60,7 @@ public SourceValueFetcher(String fieldName, QueryShardContext context, Object nu
for (String path : sourcePaths) {
Object sourceValue = lookup.extractValue(path, nullValue);
if (sourceValue == null) {
return List.of();
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for catching this! Maybe we could fix this in a separate PR (with a quick test) to keep this one scoped to highlighting?

//percolator needs to always load from source, thus it sets the global force source to true
List<Object> textsToHighlight;
if (forceSource == false && fieldType.isStored()) {
boolean storedFieldsAvailable) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it'd be best to keep the original forceSource name and value -- it matches the force_source REST option that this value comes and avoids having mixed concepts in the code.

}
assert textsToHighlight != null;
return textsToHighlight;
ValueFetcher fetcher = fieldType.valueFetcher(qsc, null, null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just opened #65292 -- if we like the direction and it gets merged, then MappedFieldType#valueFetcher won't need a dedicated SearchLookup at all. I think highlighting on runtime fields would 'just work' here (although it'd be worth a test!)

@romseygeek
Copy link
Contributor Author

I opened #65375 to fix the copy_to fields bug separately.

Copy link
Contributor

@jtibshirani jtibshirani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks great to me. Some last high-level comments:

  • We're now able to highlight runtime keyword fields. It could also be good to add a REST test for runtime keywords, as I don't think we have any tests for this yet. And I think we can remove the highlight test exclusion from xpack/plugin/runtime-fields/qa/build.gradle.
  • This PR makes some changes to the default highlighting behavior: we now include copy_to fields, and keywords are normalized. We could make a small note in the migration docs about these changes?

@romseygeek
Copy link
Contributor Author

We're now able to highlight runtime keyword fields

Unfortunately this doesn't quite work yet, because the highlighters try to re-analyze the values from source, and trying to pull an index analyzer for runtime fields will fail - @jimczi found this elsewhere. I'll try and fix this in a follow up, I think it's just a case of amending QueryShardContext.getFieldNameIndexAnalyzer()

We could make a small note in the migration docs about these changes?

++, will push a change for this

@romseygeek
Copy link
Contributor Author

++, will push a change for this

Actually this will have to wait for the backport, as there's no 7.11 migration doc in master

@romseygeek romseygeek merged commit d088171 into elastic:master Nov 24, 2020
@romseygeek romseygeek deleted the highlight/fetcher branch November 24, 2020 16:09
romseygeek added a commit to romseygeek/elasticsearch that referenced this pull request Nov 24, 2020
HighlighterUtils.loadFieldValues() loads values directly from the source, and
then callers have to deal with filtering out values that would have been removed
by an ignore_above filter on keyword fields. Instead, we can use the
ValueFetcher for the relevant field, which handles all this logic for us.

Closes elastic#59931.
@romseygeek
Copy link
Contributor Author

For 7.11 release notes: This change means that you can highlight fields populated via copy_to directives.

romseygeek added a commit that referenced this pull request Nov 25, 2020
…5441)

HighlighterUtils.loadFieldValues() loads values directly from the source, and
then callers have to deal with filtering out values that would have been removed
by an ignore_above filter on keyword fields. Instead, we can use the
ValueFetcher for the relevant field, which handles all this logic for us.

Closes #59931.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Highlighting should leverage the new field retrieval API
6 participants