Term Vectors doesn't work on artificial docs with keyword fields #53494

Closed
matriv opened this issue Mar 12, 2020 · 2 comments · Fixed by #53504
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments


matriv commented Mar 12, 2020

Steps to reproduce:

PUT test_termvector
{
  "mappings": {
    "properties": {
      "words": {
        "type": "keyword"
      }
    }
  }
}

PUT test_termvector/_doc/1?refresh
{
  "words": [
    "b",
    "b",
    "b",
    "a",
    "e",
    "f",
    "f"
  ]
}

# we can get term scores without artificial documents
GET test_termvector/_termvectors/1
{
  "fields": [
    "words"
  ],
  "term_statistics": false,
  "field_statistics": false,
  "positions": false,
  "offsets": false,
  "filter": {
    "max_num_terms": 3
  }
}

{
  "_index" : "test_termvector",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "took" : 0,
  "term_vectors" : {
    "words" : {
      "terms" : {
        "b" : {
          "term_freq" : 3,
          "score" : 3.0
        },
        "e" : {
          "term_freq" : 1,
          "score" : 1.0
        },
        "f" : {
          "term_freq" : 2,
          "score" : 2.0
        }
      }
    }
  }
}


# we **cannot** get term scores with artificial documents of keyword type
GET test_termvector/_termvectors
{
  "doc": {
    "words": [
      "b",
      "b",
      "b",
      "a",
      "e",
      "f",
      "f"
    ]
  },
  "fields": [
    "words"
  ],
  "term_statistics": false,
  "field_statistics": false,
  "positions": false,
  "offsets": false,
  "filter": {
    "max_num_terms": 3
  }
}

{
  "_index" : "test_termvector",
  "_type" : "_doc",
  "_version" : 0,
  "found" : true,
  "took" : 0,
  "term_vectors" : { }
}

The issue is in ParseContext#getValues(), where field.stringValue() returns null for keyword fields. We need to check for KeywordFieldType and convert the BytesRef to a UTF-8 string.
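
A minimal sketch of the fallback described above, not the actual patch. It assumes a hypothetical `Field` interface standing in for Lucene's `IndexableField`, with `byte[]` in place of `BytesRef`. The class name `KeywordValueSketch` and the helper factories are made up for illustration. With real Lucene types the conversion would be `field.binaryValue().utf8ToString()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class KeywordValueSketch {

    /** Hypothetical stand-in for IndexableField, reduced to the two accessors discussed here. */
    interface Field {
        String stringValue();  // null when the value is stored as bytes (the keyword case)
        byte[] binaryValue();  // stand-in for BytesRef; null when the value is a String
    }

    /** Illustrative factory: a field that exposes its value as a String. */
    static Field stringField(final String value) {
        return new Field() {
            public String stringValue() { return value; }
            public byte[] binaryValue() { return null; }
        };
    }

    /** Illustrative factory: a keyword-like field that only exposes UTF-8 bytes. */
    static Field keywordField(final String value) {
        final byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new Field() {
            public String stringValue() { return null; }
            public byte[] binaryValue() { return bytes; }
        };
    }

    /** Collects values, falling back to the binary value when stringValue() is null. */
    static List<String> getValues(List<Field> fields) {
        List<String> values = new ArrayList<>();
        for (Field field : fields) {
            String s = field.stringValue();
            if (s != null) {
                values.add(s);
            } else if (field.binaryValue() != null) {
                // Without this branch, keyword values are silently dropped.
                values.add(new String(field.binaryValue(), StandardCharsets.UTF_8));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Field> fields = Arrays.asList(keywordField("b"), keywordField("b"), stringField("a"));
        System.out.println(getValues(fields)); // prints [b, b, a]
    }
}
```

Dropping the `else if` branch reproduces the symptom in this report: every keyword value is skipped, so no terms are left to analyze and `term_vectors` comes back empty.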

@matriv matriv added >bug :Search/Search Search-related issues that do not fall into other categories labels Mar 12, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (:Search/Search)

matriv added a commit to matriv/elasticsearch that referenced this issue Mar 12, 2020
Previously, the Term Vectors API returned empty results for
artificial documents with keyword fields. Checking only `stringValue()`
on `IndexableField` is not enough, since for `KeywordFieldType`
`binaryValue()` must be used instead.

Fixes elastic#53494
matriv added a commit that referenced this issue Mar 13, 2020
matriv added a commit to matriv/elasticsearch that referenced this issue Mar 13, 2020
(cherry picked from commit 1fc3fe3)
matriv added a commit that referenced this issue Mar 13, 2020
…3550)

(cherry picked from commit 1fc3fe3)
matriv added a commit that referenced this issue Mar 13, 2020
…3551)

(cherry picked from commit 1fc3fe3)

matriv commented Mar 13, 2020

master : 1fc3fe3
7.x : b6c94fd
7.6 : 9cf063e
