Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple date formats in date_histogram aggregation #39916

Closed
Mpdreamz opened this issue Mar 11, 2019 · 7 comments · Fixed by #40100
Closed

Multiple date formats in date_histogram aggregation #39916

Mpdreamz opened this issue Mar 11, 2019 · 7 comments · Fixed by #40100

Comments

@Mpdreamz
Copy link
Member

The following worked in 6.x

"date_histogram": {
    "extended_bounds": {
        "max": "2016-06-06T12:01:02.123",
        "min": "2014-06-06T12:01:02.123"
    },
    "field": "startedOn",
    "format": "yyyy-MM-dd'T'HH:mm:ss||date_optional_time",
    "interval": "month",
    "min_doc_count": 2,
    "missing": "2015-06-06T12:01:02.123",
    "order": {
         "_count": "asc"
    }
}

In 7.0.0-beta1 it fails with the following:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parse_exception",
        "reason" : "failed to parse date field [2014-06-06T12:01:02.123] with format [yyyy-MM-dd'T'HH:mm:ss||date_optional_time]: [Text '2014-06-06T12:01:02.123' could not be parsed, unparsed text found at index 19]",
        "stack_trace" : "ElasticsearchParseException[failed to parse date field [2014-06-06T12:01:02.123] with format [yyyy-MM-dd'T'HH:mm:ss||date_optional_time]: [Text '2014-06-06T12:01:02.123' could not be parsed, unparsed text found at index 19]]; nested: DateTimeParseException[Text '2014-06-06T12:01:02.123' could not be parsed, unparsed text found at index 19];\r\n\tat org.elasticsearch.common.time.JavaDateMathParser.parseDateTime(JavaDateMathParser.java:233)\r\n\tat org.elasticsearch.common.time.JavaDateMathParser.parse(JavaDateMathParser.java:75)\r\n\tat org.elasticsearch.search.DocValueFormat$DateTime.parseLong(DocValueFormat.java:229)\r\n\tat org.elasticsearch.search.aggregations.bucket.histogram.ExtendedBounds.parseAndValidate(ExtendedBounds.java:156)\r\n\tat org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregationBuilder.innerBuild(DateHistogramAggregationBuilder.java:455)\r\n\tat org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.doBuild(ValuesSourceAggregationBuilder.java:315)\r\n\tat org.elasticsearch.search.aggregations.support.ValuesSourceAggregationBuilder.doBuild(ValuesSourceAggregationBuilder.java:39)\r\n\tat org.elasticsearch.search.aggregations.AbstractAggregationBuilder.build(AbstractAggregationBuilder.java:139)\r\n\tat org.elasticsearch.search.aggregations.AggregatorFactories$Builder.build(AggregatorFactories.java:333)\r\n\tat org.elasticsearch.search.SearchService.parseSource(SearchService.java:800)\r\n\tat org.elasticsearch.search.SearchService.createContext(SearchService.java:607)\r\n\tat org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:582)\r\n\tat org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:385)\r\n\tat org.elasticsearch.search.SearchService.access$100(SearchService.java:123)\r\n\tat org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:357)\r\n\tat org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:353)\r\n\tat org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1068)\r\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\r\n\tat org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)\r\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)\r\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\r\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\r\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\r\n\tat java.base/java.lang.Thread.run(Thread.java:834)\r\nCaused by: java.time.format.DateTimeParseException: Text '2014-06-06T12:01:02.123' could not be parsed, unparsed text found at index 19\r\n\tat java.base/java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:2049)\r\n\tat java.base/java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1874)\r\n\tat org.elasticsearch.common.time.JavaDateMathParser.parseDateTime(JavaDateMathParser.java:223)\r\n\t... 23 more\r\n"
      }
    ],

Specifying only

    "format": "date_optional_time",

Results in no error however.

@jimczi jimczi added the :Analytics/Aggregations Aggregations label Mar 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo

@polyfractal
Copy link
Contributor

This appears to be a general problem with multiple formats, likely introduced/broken in #37222

We're using appendOptional() to append multiple patterns, but I don't think that's the correct method. My understanding is that appendOptional is for truly optional components in the chain. E.g. 2016-06-06 is mandatory, but 12:01:02.123 could be optional. Not for entirely different patterns.

I traced through the code and this does indeed appear to be the case. When DateTimeFormatterBuilder#parse() starts matching the input text, it matches the first pattern (yyyy-MM-dd'T'HH:mm:ss) up to position 19, which is the milliseconds portion. Then it tries the next pattern (date_optional_time) starting at position 19, which does not match so it aborts with "unfinished" characters that were not parsed.

@pgomulka @rjernst any ideas? This is hitting the edge of my java-time knowledge. I skimmed the API and I'm thinking we may need to drop back to an array of parsers, and check them individually with parseUnresolved() similar to this: https://stackoverflow.com/a/44625276?

@polyfractal
Copy link
Contributor

polyfractal commented Mar 14, 2019

It seems this affects anything using multiple patterns, most notably it breaks indexing:

PUT test
{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss||date_optional_time"
      }
    }
  }
}

POST /test/_doc/
{
  "date": "2014-06-06T12:01:02.123"
}
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [date] of type [date] in document with id 'wcHYfWkBi3kUMDzPydwQ'"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse field [date] of type [date] in document with id 'wcHYfWkBi3kUMDzPydwQ'",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to parse date field [2014-06-06T12:01:02.123] with format [yyyy-MM-dd'T'HH:mm:ss||date_optional_time]",
      "caused_by": {
        "type": "date_time_parse_exception",
        "reason": "Text '2014-06-06T12:01:02.123' could not be parsed, unparsed text found at index 19"
      }
    }
  },
  "status": 400
}

("2014-06-06T12:01:02.123" indexes correctly if just date_optional_time is specified)

EDIT:

Doesn't appear as bad as I thought. yyyy-MM-dd'T'HH:mm:ss||epoch_millis and similar patterns seem to work fine. But yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS for example fails, so there's something there about partially-matched patterns messing up the next pattern.

@pgomulka
Copy link
Contributor

pgomulka commented Mar 15, 2019

I think @polyfractal suggestion would work best.
I noticed that these would work (use it in JavaJodaTimeDuellingTests)

assertSameDate("2014-06-06T12:01:02.123", "strictDateTimeNoMillis||yyyy-MM-dd'T'HH:mm:ss.SSS")
 assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS");

Couldn't find this in javadocs, but looks like DateTimeFormatter's patterns with optional depend on ordering. Someone raised a JDK bug, but it was closed as a non issue with comment:

The optional formats should be supplied longest to shortest, especially if the shorter format would match a prefix of the longer format. 

https://bugs.openjdk.java.net/browse/JDK-8188771

I don't think this is ideal, but we can workaround this with an approach from stackoverflow comment linked by @polyfractal

pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Mar 15, 2019
Java-time fails parsing composite patterns when first pattern matches
only the prefix of the input. It expects pattern in longest-shortest
order.
Joda does not suffer from this

closes elastic#39916
@pgomulka pgomulka added the :Core/Infra/Core Core issues without another label label Mar 15, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@pgomulka
Copy link
Contributor

the alternative to the PR above is just to state in a documentation that patterns have to be in order (longest to shortest / most strict/least strict)

@danielmitterdorfer
Copy link
Member

the alternative to the PR above is just to state in a documentation that patterns have to be in order (longest to shortest / most strict/least strict)

You're right but I think that's too trappy.

pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Mar 25, 2019
Java-time fails parsing composite patterns when first pattern matches
only the prefix of the input. It expects pattern in longest-shortest
order.
Joda does not suffer from this

closes elastic#39916
pgomulka added a commit that referenced this issue Mar 27, 2019
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes #39916
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Mar 27, 2019
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes elastic#39916
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Mar 27, 2019
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes elastic#39916
pgomulka added a commit to pgomulka/elasticsearch that referenced this issue Mar 27, 2019
Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.

closes elastic#39916
pgomulka added a commit that referenced this issue Mar 27, 2019
) (#40501)

Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes #39916
backport #40100
pgomulka added a commit that referenced this issue Mar 27, 2019
) (#40502)

Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.
 
closes #39916
backport #40100
@jakelandis jakelandis removed the v7.0.0 label Apr 3, 2019
pgomulka added a commit that referenced this issue Apr 4, 2019
) (#40503)

Java-time fails parsing composite patterns when first pattern matches only the prefix of the input. It expects pattern in longest to shortest order. Because of this constructing just one DateTimeFormatter with appendOptional is not sufficient. Parsers have to be iterated and if the parsing fails, the next one in order should be used. In order to not degrade performance parsing should not be throw exceptions on failure. Format.parseObject was used as it only returns null when parsing failed and allows to check if full input was read.

closes #39916
backport #40100
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants