Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query string default field #24214

Merged
merged 5 commits into from
Apr 20, 2017
Merged

Query string default field #24214

merged 5 commits into from
Apr 20, 2017

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Apr 20, 2017

Currently any query_string query that use a wildcard field with no matching field is rewritten with the _all field.

For instance:

#creating test doc
PUT testing/t/1
{
  "test": {
    "field_one": "hello",
    "field_two": "world"
  }
}
#searching abc.* (does not exist) -> hit
GET testing/t/_search
{
  "query": {
    "query_string": {
      "fields": [
        "abc.*"
      ],
      "query": "hello"
    }
  }
}

This bug first appeared in 5.0 after the query refactoring and impacts only users that use _all as default field.
Indices created in 6.x will not have this problem since _all is deactivated in this version.

This change fixes this problem by ignoring terms without an explicit field if the requested multi fields are not empty.

jimczi added 2 commits April 20, 2017 15:50
Currently any `query_string` query that use a wildcard field with no matching field is rewritten with the `_all` field.

For instance:
````
#creating test doc
PUT testing/t/1
{
  "test": {
    "field_one": "hello",
    "field_two": "world"
  }
}
#searching abc.* (does not exist) -> hit
GET testing/t/_search
{
  "query": {
    "query_string": {
      "fields": [
        "abc.*"
      ],
      "query": "hello"
    }
  }
}
````

This bug first appeared in 5.0 after the query refactoring and impacts only users that use `_all` as default field.
Indices created in 6.x will not have this problem since `_all` is deactivated in this version.

This change fixes this problem by ignoring terms without an explicit field if the requested multi fields are not empty.
@jimczi jimczi added :Search/Search Search-related issues that do not fall into other categories >bug discuss review v5.5.0 v6.0.0-alpha1 labels Apr 20, 2017
@jimczi
Copy link
Contributor Author

jimczi commented Apr 20, 2017

I marked this with the discuss label because I don't know if this is the right thing to do.
Another approach would be to fail the query if the requested wildcard fields do not match any field in the mapping. This would fix queries that use a mix of explicit and non-explicit field like:

"query": "field1:foo AND bar"

…n the mapping

In the previous change each non-fielded term inside a `query_string` query was simply ignored.
This leads to weird behavior for mixed queries like `foo AND field:bar`.
This change fixes this discrepancy by returning a MatchNoDocsQuery for any term that expand to an empty list of field.
@jimczi jimczi removed the discuss label Apr 20, 2017
Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -780,7 +781,8 @@ private PhraseQuery addSlopToPhrase(PhraseQuery query, int slop) {
if (field != null) {
fields = context.simpleMatchToIndexNames(field);
} else {
fields = settings.fieldsAndWeights().keySet();
fields =
settings.fieldsAndWeights() == null ? Collections.emptyList() : settings.fieldsAndWeights().keySet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe extract settings.fieldsAndWeights() to a variable to avoid this weird indentation

@jimczi
Copy link
Contributor Author

jimczi commented Apr 20, 2017

I discussed with @jpountz and we agreed that we should not simply ignore terms that expand to an empty list of field. Instead we should rewrite the specific clause to a MatchNoDocsQuery.
I pushed e1ca3fa to fix the behavior.

@jimczi jimczi added v5.4.0 and removed v5.5.0 labels Apr 20, 2017
@jimczi jimczi merged commit 525101b into elastic:master Apr 20, 2017
@jimczi jimczi deleted the query_string_default_field branch April 20, 2017 20:12
jimczi added a commit that referenced this pull request Apr 20, 2017
Currently any `query_string` query that use a wildcard field with no matching field is rewritten with the `_all` field.

For instance:
````
#creating test doc
PUT testing/t/1
{
  "test": {
    "field_one": "hello",
    "field_two": "world"
  }
}
#searching abc.* (does not exist) -> hit
GET testing/t/_search
{
  "query": {
    "query_string": {
      "fields": [
        "abc.*"
      ],
      "query": "hello"
    }
  }
}
````

This bug first appeared in 5.0 after the query refactoring and impacts only users that use `_all` as default field.
Indices created in 6.x will not have this problem since `_all` is deactivated in this version.

This change fixes this bug by returning a MatchNoDocsQuery for any term that expand to an empty list of field.
jimczi added a commit that referenced this pull request Apr 20, 2017
Currently any `query_string` query that use a wildcard field with no matching field is rewritten with the `_all` field.

For instance:
````
#creating test doc
PUT testing/t/1
{
  "test": {
    "field_one": "hello",
    "field_two": "world"
  }
}
#searching abc.* (does not exist) -> hit
GET testing/t/_search
{
  "query": {
    "query_string": {
      "fields": [
        "abc.*"
      ],
      "query": "hello"
    }
  }
}
````

This bug first appeared in 5.0 after the query refactoring and impacts only users that use `_all` as default field.
Indices created in 6.x will not have this problem since `_all` is deactivated in this version.

This change fixes this bug by returning a MatchNoDocsQuery for any term that expand to an empty list of field.
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 21, 2017
* master: (61 commits)
  Build: Move plugin cli and tests to distribution tool (elastic#24220)
  Peer Recovery: remove maxUnsafeAutoIdTimestamp hand off (elastic#24243)
  Adds version 5.3.2 and backwards compatibility indices for 5.3.1
  Add utility method to parse named XContent objects with typed prefix (elastic#24240)
  MultiBucketsAggregation.Bucket should not extend Writeable (elastic#24216)
  Don't expose cleaned-up tasks as pending in PrioritizedEsThreadPoolExecutor (elastic#24237)
  Adds declareNamedObjects methods to ConstructingObjectParser (elastic#24219)
  ESIntegTestCase.indexRandom should not introduce types. (elastic#24202)
  Tests: Extend InternalStatsTests (elastic#24212)
  IndicesQueryCache should delegate the scorerSupplier method. (elastic#24209)
  Speed up parsing of large `terms` queries. (elastic#24210)
  [TEST] make sure that the random query_string query generator defines a default_field or a list of fields
  token_count type : add an option to count tokens (fix elastic#23227) (elastic#24175)
  Query string default field (elastic#24214)
  Make Aggregations an abstract class rather than an interface (elastic#24184)
  [TEST] ensure expected sequence no and version are set when index/delete engine operation has a document failure
  Extract batch executor out of cluster service (elastic#24102)
  Add 5.3.1 to bwc versions
  Added "release-state" support to plugin docs
  Added examples to cross cluster search of using cluster settings
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories v5.4.0 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants