Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn user if regular expression is used when match_pattern defaults to simple in dynamic_template mappings #95634

Closed
quux00 opened this issue Apr 27, 2023 · 3 comments · Fixed by #95750
Assignees
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@quux00
Copy link
Contributor

quux00 commented Apr 27, 2023

Description

This was flagged by @romseygeek in #95558 (comment) to be targeted for a new issue.

In dynamic_template mappings, 'simple' match types (which is the default setting) will happily accept regexes (rather than simple wildcards) without error or warning, but then they will not work as the user likely intended. We should emit a warning if the user provides a pattern that 'looks like' a regex, but haven't explicitly set match_type.


Deeper context:

The validate method in DynamicTemplate.parse tries to catch these, but it only catches the invalid pattern when match_pattern=regex. For example, see the test: https://github.com/elastic/elasticsearch/blob/main/server/src/test/java/org/elasticsearch/index/mapper/DynamicTemplateParseTests.java#L132

Current behavior is:

  "mappings": {
    "dynamic_templates": [
      {
        "mytemplate": {
          "match_pattern": "regex",
          "match": "*name",
          "unmatch": "*middle",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }

fails with:

    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping: Pattern [*name] of type [regex] is invalid. Cannot create dynamic template [mytemplate].",\

but:

      {
        "mytemplate": {
          "match": ".*name$",
          "unmatch": "^.*middle.*",
          "mapping": {
            "type": "keyword"
          }
        }
      }

(where "match_pattern": "simple" is implied) works with no problems (until you try to actually put documents in with the mapping, of course).

We could probably add a method to Regex.java. There is already this function

   public static boolean isSimpleMatchPattern(String str) {
        return str.indexOf('*') != -1;
    }

but that's not enough for this scenario.

This code indicates that the only allowed patterns for "simple" are "xxx*", "xxx", "xxx" and "xxxyyy", so we could check for things like .* and ^xxx and yyy$ and chars like [, ] and | and maybe parens and then run it through REGEX.matches() and if that does NOT fail, then it appears to be a valid actual regex, not a simple pattern and then give a warning to users.

@quux00 quux00 added >enhancement needs:triage Requires assignment of a team area label :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team labels Apr 27, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Apr 27, 2023
@quux00 quux00 changed the title Warn user if regular expression is used when match_pattern is set to simple in dynamic_template mappings Warn user if regular expression is used when match_pattern defaults to simple in dynamic_template mappings Apr 27, 2023
@quux00 quux00 self-assigned this Apr 28, 2023
@quux00
Copy link
Contributor Author

quux00 commented Apr 28, 2023

Under what conditions should we put a warning header about the pattern_match="simple" case.

I can think of two cases:

  1. the pattern contains no wildcard (*) at all (e.g., "match":"first_name". Is this likely an error or is this is a valid use of a dynamic_template match?

  2. the pattern appears to be a full regular expression.

The latter is the goal of this ticket. Should we also warn on case 1?

@quux00
Copy link
Contributor Author

quux00 commented Apr 30, 2023

Additional problem. ES seems to allow most regex "meta-characters" in field names.

Based on my testing, these characters are allowed in ES field names and can be queried.

^
$
?
+
|
[]
()
{}
\d
\D
\w
\W
\s
\S

And .* is allowed in a simple pattern match anywhere except at the start of a field name, due to dotted field names, such as "user.name" - the simple pattern for it would "user.*".

So the only thing we can test for that is definitely not allowed is a match pattern that starts with .*. All the rest appear to be valid in ES field names.

Example:

PUT my-index-000001
{
  "mappings": {
    "dynamic_templates": [
      {
        "mytemplate": {
          "match_pattern": "simple",
          "match": "*[foo]",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}


GET my-index-000001/_mapping

PUT my-index-000001/_doc/1
{
  "employee": "fred",
  "my[foo]": "mymetadata goes here"
}

GET my-index-000001/_search 
{
  "query": {
    "match": {
      "my[foo]": "mymetadata goes here"
    }
  }
}

// the _search query above works:
{
  "took": 4,
  ...
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my-index-000001",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "employee": "fred",
          "my[foo]": "mymetadata goes here"
        }
      }
    ]
  }
}

quux00 added a commit to quux00/elasticsearch that referenced this issue May 3, 2023
…o simple in dynamic_template mappings

In a dynamic_templates setting, if match_phrase=simple (explicitly or by default) and the pattern
passed into match, unmatch, path_match or path_unmatch appears to be a regular expression instead
a warning is added to the HTTP response header.

Closes elastic#95634
quux00 added a commit that referenced this issue May 4, 2023
…o simple in dynamic_template mappings (#95750)

* Warn user if regular expression is used when match_pattern defaults to simple in dynamic_template mappings

If a user does not explicitly set match_pattern in a dynamic template mapping, it defaults to 'simple' wildcards, but the user might supply a regex rather than simple glob-style wildcard for one of the "match" fields. In that case, the mapping will be saved without error or warning, but at indexing time the mappings will not work as the user intended and it may not be clear why.

This change now sets a warning in that context - when match_pattern was not defined and the user appears to be specifying a regex, we provide an HTTP header warning for all the regex patterns we detect. Because ES fields allow most of the standard regex meta-characters (such as |, ^, $ and paired braces and brackets), our warning may be unwarranted, so the user can make the warning go away by setting match_pattern=simple in the dynamic template.

Closes #95634
@javanna javanna added Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch
Projects
None yet
3 participants