Highlighting on a keyword field not working as expected #82694

Closed
youngmi-scibite opened this issue Jan 17, 2022 · 6 comments
Labels: >bug, :Search Relevance/Highlighting, Team:Search Relevance


youngmi-scibite commented Jan 17, 2022

Elasticsearch version (bin/elasticsearch --version): 7.16.3

Plugins installed: [ mapper-annotated-text ]

JVM version (java -version): openjdk version "17.0.1"

OS version (uname -a if on a Unix-like system): Docker elasticsearch:7.16.3

Description of the problem including expected versus actual behavior:
I have a keyword field with the lowercase normalizer. When I request highlighting on that field, the highlight does not seem to be applied to the _source value, even though I explicitly set force_source to true.

Steps to reproduce:

  1. Create the index with mappings and index a document
PUT my-index-000001
{
   "mappings": {
       "properties": {
           "tags": {
               "type": "keyword",
               "normalizer": "lowercase"
           },
           "keywords": {
               "type": "keyword"
           },
           "description": {
               "type": "text"
           }
       }
   }
}

POST my-index-000001/_doc
{
 "tags": "Elastic",
 "keywords": "Elastic",
 "description": "Elastic search elastic"
}

  2. Send a search request
GET my-index-000001/_search
{
    "from": 0,
    "size": 20,
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "elastic",
                        "fields": [
                            "keywords",
                            "tags",
                            "description"
                        ]
                    }
                }
            ]
        }
    },
    "highlight": {
        "number_of_fragments": 0,
        "force_source" : true,
        "fields": {
            "keywords": {},
            "tags": {},
            "description": {}
        }
    }
}

This is the response

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.39556286,
        "hits": [
            {
                "_index": "my-index-000001",
                "_type": "_doc",
                "_id": "Ujt6aX4BXCJVOiKe7UXr",
                "_score": 0.39556286,
                "_source": {
                    "tags": "Elastic",
                    "keywords": "Elastic",
                    "description": "Elastic search elastic"
                },
                "highlight": {
                    "description": [
                        "<em>Elastic</em> search <em>elastic</em>"
                    ],
                    "tags": [
                        "<em>elastic</em>"
                    ]
                }
            }
        ]
    }
}

I expected the highlight value for "tags" to be <em>Elastic</em>, because I set "force_source": true in the request and the value of tags in _source is Elastic.

I have tested different versions
7.8.1, 7.9.0, 7.10.1 return <em>Elastic</em>
7.11.2, 7.12.1, 7.13.4, 7.14.2, 7.16.1 return <em>elastic</em>

Provide logs (if relevant):

@youngmi-scibite added the >bug and needs:triage labels Jan 17, 2022
@romseygeek added the :Search Relevance/Highlighting label Jan 18, 2022
@elasticmachine added the Team:Search label Jan 18, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@romseygeek removed the Team:Search and needs:triage labels Jan 18, 2022
@javanna added the Team:Search label Jan 18, 2022
@youngmi-scibite changed the title from "'force_source' seems not working" to "Highlighting on a keyword field not working as expected" Jan 18, 2022
@jtibshirani
Contributor

@youngmi-scibite thank you for the report! We intentionally changed this behavior in 7.11 (#63572) because we thought it was most consistent to always normalize keyword fields: this is the way they're stored in the index, and the way they appear in other APIs (terms aggregations, fields for field retrieval, etc.). I think the force_source option is not intended as a way to guarantee that we return the original _source exactly. Instead, it is an expert, low-level option that indicates where Elasticsearch should load the raw field data from before passing it to highlighting.

For context, is this causing a problem in your application? Or is it just something surprising you noticed while testing different versions? Maybe you could provide more context on the use case.
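
The behavior change described in this comment can be pictured with a small toy model. This is a minimal Python sketch, not Elasticsearch code: `lowercase_normalizer` and `highlight_keyword` are hypothetical names, `str.lower()` only approximates the `lowercase` normalizer, and the flag `normalize_before_highlight` is an invented switch standing in for "7.11 and later" versus "before 7.11".

```python
from typing import Optional


def lowercase_normalizer(value: str) -> str:
    # Stand-in for the `lowercase` normalizer from the mapping in this issue.
    return value.lower()


def highlight_keyword(
    source_value: str,
    query_term: str,
    normalize_before_highlight: bool,
) -> Optional[str]:
    """Toy model of highlighting a whole keyword value.

    A keyword field is indexed as a single term, so a match covers the
    entire value. The only difference modelled here is which text gets
    wrapped in <em> tags.
    """
    indexed_term = lowercase_normalizer(source_value)
    if indexed_term != lowercase_normalizer(query_term):
        return None  # no match, so nothing to highlight
    text = indexed_term if normalize_before_highlight else source_value
    return f"<em>{text}</em>"


if __name__ == "__main__":
    # Assumed to mirror the reported 7.11+ result for "tags".
    print(highlight_keyword("Elastic", "elastic", normalize_before_highlight=True))
    # Assumed to mirror the reported pre-7.11 result for "tags".
    print(highlight_keyword("Elastic", "elastic", normalize_before_highlight=False))
```

In this model the match itself is identical in both cases; only the text handed to the highlighter differs, which is consistent with the version-by-version results reported in the issue.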

@youngmi-scibite
Author

youngmi-scibite commented Jan 19, 2022

Thanks @jtibshirani for the reply!

For example, we index first name and surname as keyword fields with the lowercase normalizer. If I search for an author's name, e.g. John, the highlighted value in the search results is john, not John. But if the same document is found with a different query string, the author name appears as John, which is what I have in the _source field. So the search results don't look consistent. There may be many use cases that work well with the new behavior, but in our case it makes sense to apply the highlighting to the original content. Please let me know if you need more details!

@jtibshirani
Contributor

I see, that makes sense. Would it be possible to store the first and last names as text fields with a lowercase tokenizer? Those will still be shown using the original _source, instead of the lowercase version.

I'll also reach out to my teammates to see if they have thoughts/ suggestions.
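
As a rough illustration of the suggestion above, the index-creation body could be built as follows. This is only a sketch under assumptions: the field names `first_name` and `surname`, the analyzer name `lowercase_text`, and the helper `build_index_body` are hypothetical and do not come from this thread. It uses a custom analyzer with the built-in `lowercase` tokenizer, which splits on non-letter characters and lowercases each term.

```python
import json
from typing import Any, Dict, List


def build_index_body(name_fields: List[str]) -> Dict[str, Any]:
    """Build an index-creation body that maps each field as analyzed text."""
    analyzer_name = "lowercase_text"  # hypothetical analyzer name
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Custom analyzer built on the `lowercase` tokenizer.
                    analyzer_name: {"type": "custom", "tokenizer": "lowercase"}
                }
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer_name}
                for field in name_fields
            }
        },
    }


# Hypothetical field names; the JSON can be sent as the body of a PUT <index> request.
body = build_index_body(["first_name", "surname"])
print(json.dumps(body, indent=2))
```

One trade-off to keep in mind: unlike a keyword field, a text field analyzed this way is split into several tokens when a value contains non-letter characters, so matching happens per token rather than on the whole value.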

@youngmi-scibite
Author

@jtibshirani Thanks for coming back with a suggestion. Unfortunately we can't make this change right now. We hope to have an option of getting highlighting done in the previous way.

@jtibshirani
Contributor

We discussed as a team and don't plan to make a change right now. While we understand you prefer the old behavior for your set-up, we don't want to add another option to the API with a separate implementation. We can revisit this decision if we hear from more users that this is a problem. Sorry this isn't the news you were hoping for.

I'm closing the issue, but feel free to still add comments.
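
For readers who need the original casing on 7.11 and later, one possible client-side workaround (not something proposed in this thread) is to re-wrap the `_source` value whenever the keyword field has a highlight. The sketch below is a minimal example under several assumptions: `restore_source_case` is a hypothetical helper, the field is a top-level keyword field, the highlight tags are the default `<em>` and `</em>`, and Python's `str.lower()` is close enough to the `lowercase` normalizer for the data in question.

```python
from typing import Any, Dict, List


def restore_source_case(
    hit: Dict[str, Any],
    field: str,
    pre_tag: str = "<em>",
    post_tag: str = "</em>",
) -> List[str]:
    """Return the original _source value(s) of `field`, wrapped in highlight tags."""
    fragments = hit.get("highlight", {}).get(field, [])

    # Strip the tags to recover the normalized terms that were highlighted.
    matched = set()
    for fragment in fragments:
        if fragment.startswith(pre_tag) and fragment.endswith(post_tag):
            matched.add(fragment[len(pre_tag):len(fragment) - len(post_tag)])

    # _source may hold a single value or a list of values.
    raw = hit.get("_source", {}).get(field, [])
    values = raw if isinstance(raw, list) else [raw]

    # Keep only the values whose lowercased form was highlighted.
    return [
        f"{pre_tag}{value}{post_tag}"
        for value in values
        if isinstance(value, str) and value.lower() in matched
    ]


# Trimmed copy of the hit from the response in this issue.
hit = {
    "_source": {"tags": "Elastic", "keywords": "Elastic"},
    "highlight": {"tags": ["<em>elastic</em>"]},
}
print(restore_source_case(hit, "tags"))
```

This relies on a keyword value being highlighted as a whole, so it would not carry over to text fields, where only part of the value may be wrapped in tags.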


5 participants