Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse composite patterns using ClassicFormat.parseObject #40100

Merged
merged 24 commits into from
Mar 27, 2019
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1585,7 +1585,7 @@ static JavaDateFormatter merge(String pattern, List<DateFormatter> formatters) {
if (printer == null) {
printer = javaDateFormatter.getPrinter();
}
dateTimeFormatters.add(javaDateFormatter.getParser());
dateTimeFormatters.addAll(javaDateFormatter.getParsers());
Copy link
Contributor Author

@pgomulka pgomulka Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.
Possibly the roundUpBuilder should also be used inside JavaDateFormatter constructor?

also I suspect that roundUpBuilder will suffer from the same problem when it is a composite?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of this merge method. It extracts relatively low level java.time.DateTimeFormatter when it should stick with org.elasticsearch.common.time.DateFormatter abstraction as long as possible.

Maybe we can refactor this in a separate PR then? Also, this code should only be there temporary? As soon as we get rid of Joda time in the code base, I expect that we can get rid of quite a few abstractions.

I suspect that roundUpBuilder will suffer from the same problem when it is a composite?

Is it intended to be used as a composite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it intended to be used as a composite?

It is used as a composite in the merge method when constructing the roundUpParser with the appendOptional.
I can imagine we can have the same pattern as on the issue (yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS) but used with this parser. Will try to come up with a testcase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed that with @spinscale and there is no way we could create a pattern that would suffer from the same problem.
For index name calculations like prefix-{2010-01-01/d{yyyy-MM-dd||yyyy-MM-ddTHH}} it would fail parsing, as the only thing expected after | is the timezone (it would fail saying unexpected |)

roundupBuilder.appendOptional(javaDateFormatter.getRoundupParser());
}
DateTimeFormatter roundUpParser = roundupBuilder.toFormatter(Locale.ROOT);
Expand Down Expand Up @@ -1632,7 +1632,7 @@ public static ZonedDateTime from(TemporalAccessor accessor) {
if (zoneId == null) {
zoneId = ZoneOffset.UTC;
}

LocalDate localDate = accessor.query(TemporalQueries.localDate());
LocalTime localTime = accessor.query(TemporalQueries.localTime());
boolean isLocalDateSet = localDate != null;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@

import org.elasticsearch.common.Strings;

import java.text.ParsePosition;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
Expand All @@ -29,7 +30,9 @@
import java.time.temporal.TemporalAccessor;
import java.time.temporal.TemporalField;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
Expand All @@ -50,16 +53,9 @@ class JavaDateFormatter implements DateFormatter {

private final String format;
private final DateTimeFormatter printer;
private final DateTimeFormatter parser;
private final List<DateTimeFormatter> parsers;
private final DateTimeFormatter roundupParser;

private JavaDateFormatter(String format, DateTimeFormatter printer, DateTimeFormatter roundupParser, DateTimeFormatter parser) {
this.format = format;
this.printer = printer;
this.roundupParser = roundupParser;
this.parser = parser;
}

JavaDateFormatter(String format, DateTimeFormatter printer, DateTimeFormatter... parsers) {
this(format, printer, builder -> ROUND_UP_BASE_FIELDS.forEach(builder::parseDefaulting), parsers);
}
Expand All @@ -79,36 +75,31 @@ private JavaDateFormatter(String format, DateTimeFormatter printer, DateTimeForm
}
this.printer = printer;
this.format = format;

if (parsers.length == 0) {
this.parser = printer;
} else if (parsers.length == 1) {
this.parser = parsers[0];
this.parsers = Arrays.asList(printer);
pgomulka marked this conversation as resolved.
Show resolved Hide resolved
} else {
DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
for (DateTimeFormatter parser : parsers) {
builder.appendOptional(parser);
}
this.parser = builder.toFormatter(Locale.ROOT);
this.parsers = Arrays.asList(parsers);
}

DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
if (format.contains("||") == false) {
builder.append(this.parser);
builder.append(firstParser());
}
roundupParserConsumer.accept(builder);
DateTimeFormatter roundupFormatter = builder.toFormatter(parser.getLocale());
DateTimeFormatter roundupFormatter = builder.toFormatter(firstParser().getLocale());
if (printer.getZone() != null) {
roundupFormatter = roundupFormatter.withZone(printer.getZone());
}
this.roundupParser = roundupFormatter;
}

DateTimeFormatter getRoundupParser() {
return roundupParser;
private DateTimeFormatter firstParser() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal opinion: I dislike this method somehow. There are alternatives whenever this method is called (using zone() or locale() directly, and calling this.parsers.get(0) directly for the rest.

Copy link
Contributor Author

@pgomulka pgomulka Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my intention was to avoid having repetitive calls of this.parsers.get(0).
But I see your point that we could use alternatives. Does this mean that zone() and locale() is always guaranteed to be the same across printer and all parsers?
This would clarify the question we had here #40100 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the ctor check for distinct locale and distinct zone

return this.parsers.get(0);
}

DateTimeFormatter getParser() {
return parser;
DateTimeFormatter getRoundupParser() {
return roundupParser;
}

DateTimeFormatter getPrinter() {
Expand All @@ -122,30 +113,64 @@ public TemporalAccessor parse(String input) {
}

try {
return parser.parse(input);
return doParse(input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a huge comment here would be warranted what happens here and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely, will add

} catch (DateTimeParseException e) {
throw new IllegalArgumentException("failed to parse date field [" + input + "] with format [" + format + "]", e);
}
}

private TemporalAccessor doParse(String input) {
if (parsers.size() > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be statically initialized in the ctor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result of parsers.size() > 1 ? like a flag isCompositeParser ? I think this is not needed (probably won't give any performance gain) maybe just a method for this expression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the complexity of the size() is expected to be constant in the implementation used here, I will leave this as it is.

for (DateTimeFormatter formatter : parsers) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not do this. It is exactly what we got rid of that caused massive performance degradation when using java time vs joda. I think we should fail when parsing the format if there is ambiguity.

Copy link
Contributor Author

@pgomulka pgomulka Mar 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if there really is ambiguity. If there is a composite pattern where the first pattern is a prefix of the second pattern (as in the example from the issue) then only the second one is really matching the date format.
This is the way it used to work in joda and we could support this.

It would be hard to explain in our doc why we forbid patterns like the one from the issue. Java-time javadoc does not mention this limitation.

There is obviously a performance drop (~14%), not sure if this is still acceptable. Will run Rally benchmarks later.

@polyfractal @Mpdreamz what are your views on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real opinion on implementation details, but I do think it's a huge breaking change if we can't support the old multi-format behavior. Even ignoring what we've supported in the past, yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS doesn't seem ambiguous/wrong/incorrect if you have both formats in your data.

And situations like yyyy-MM-dd'T'HH:mm:ss||date_optional_time might be technically redundant but due to our naming scheme it's not clear to the user what's going on.

And I'm sure there are thousands of legacy templates with patterns like those that will break on upgrade.

Wasn't the original performance issues due to throwing and catching exceptions as control-flow, rather than trying multiple parsers?

Copy link
Member

@danielmitterdorfer danielmitterdorfer Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @polyfractal: The root cause of the performance regression (#36602) was not that multiple parsers have been applied but rather that each of them has thrown an IllegalArgumentException on mismatch. I think dropping support for composite patterns is not an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored this method to use DateTimeFormatter.format.parseObject. The implementation in DateTimeFormatter::ClassicFormat is returning null when parsing failed. This has the same complexity as the approach currently in master. DateTimeFormatterBuilder:2349 is also iterating within enclosed optionals in a loop (but has this downside of returning when input is fully consumed)
see the results of Rally and microbenchmark on this PR #40364. These look very promising to me.

if (tryParseUnresolved(formatter, input) == true) {
return formatter.parse(input);
}
}
}
return firstParser().parse(input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this doing a second parse attempt with the first parser, if going through all the parsers did not work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I guess we could throw an exception after the loop. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just be sure via ctor check, that the size is never 0? maybe just use an array instead of an array list, than the below check would always be a fixed size check as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never be 0, if no parsers are provided, the printer will be used.
The only reason for this >1 check is to at least when there is only one parser (the regular, not composite pattern) it is not parsing the input twice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, will add an exception indicating that all parsers has failed. This won't mislead that the first parser failed (as it is at the moment)

}

/**
* Attempt parsing the input without throwing exception. This is needed because java-time requires ordering on optional (composite)
* patterns. Joda does not suffer from this.
* https://bugs.openjdk.java.net/browse/JDK-8188771
*
* @param input An arbitrary string resembling the string representation of a date or time
* @return true if parsing was successful, false if parsing failed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is not returning a boolean

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes sorry, that was a copypaste from previous rev. Fixed with more description

*/
private boolean tryParseUnresolved(DateTimeFormatter formatter, String input) {
try {
ParsePosition pp = new ParsePosition(0);
formatter.parseUnresolved(input, pp);
int len = input.length();
pgomulka marked this conversation as resolved.
Show resolved Hide resolved
if (pp.getErrorIndex() == -1 && pp.getIndex() == len) {
return true;
}
} catch (RuntimeException ex) {
// should not happen, but ignore if it does
pgomulka marked this conversation as resolved.
Show resolved Hide resolved
}
return false;
}

@Override
public DateFormatter withZone(ZoneId zoneId) {
// shortcurt to not create new objects unnecessarily
if (zoneId.equals(parser.getZone())) {
if (zoneId.equals(firstParser().getZone())) {
return this;
}

return new JavaDateFormatter(format, printer.withZone(zoneId), roundupParser.withZone(zoneId), parser.withZone(zoneId));
return new JavaDateFormatter(format, printer.withZone(zoneId), roundupParser.withZone(zoneId), firstParser().withZone(zoneId));
}

@Override
public DateFormatter withLocale(Locale locale) {
// shortcurt to not create new objects unnecessarily
if (locale.equals(parser.getLocale())) {
if (locale.equals(firstParser().getLocale())) {
return this;
}

return new JavaDateFormatter(format, printer.withLocale(locale), roundupParser.withLocale(locale), parser.withLocale(locale));
return new JavaDateFormatter(format, printer.withLocale(locale), roundupParser.withLocale(locale),
firstParser().withLocale(locale));
pgomulka marked this conversation as resolved.
Show resolved Hide resolved
}

@Override
Expand All @@ -170,7 +195,7 @@ public ZoneId zone() {

@Override
public DateMathParser toDateMathParser() {
return new JavaDateMathParser(format, parser, roundupParser);
return new JavaDateMathParser(format, this, getRoundupParser());
}

@Override
Expand All @@ -194,4 +219,8 @@ public boolean equals(Object obj) {
public String toString() {
return String.format(Locale.ROOT, "format[%s] locale[%s]", format, locale());
}

public Collection<DateTimeFormatter> getParsers() {
pgomulka marked this conversation as resolved.
Show resolved Hide resolved
return parsers;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
import java.time.temporal.TemporalAdjusters;
import java.time.temporal.TemporalQueries;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
Expand All @@ -46,11 +47,11 @@
*/
public class JavaDateMathParser implements DateMathParser {

private final DateTimeFormatter formatter;
private final DateTimeFormatter roundUpFormatter;
private final String format;
private JavaDateFormatter formatter;
pgomulka marked this conversation as resolved.
Show resolved Hide resolved
private DateTimeFormatter roundUpFormatter;
private String format;

JavaDateMathParser(String format, DateTimeFormatter formatter, DateTimeFormatter roundUpFormatter) {
JavaDateMathParser(String format, JavaDateFormatter formatter, DateTimeFormatter roundUpFormatter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the intention was to allow alternatives in patterns for date math calculations.
We already discussed this and we do not need this.
I think that in this case I will revert this change and create with JavaDateMathParser(firstParser,getRoundUpFormatter)

Copy link
Contributor Author

@pgomulka pgomulka Mar 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this should stay. Plenty of tests started to fail now because of that.
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/10768/
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-2/10640/
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-bwc/2575/
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request+oss-distro-docs/13976/
Plenty of usecases of JavaDateMathParser using composite patterns. For instance RangeQueryBuilderTests.testDateRangeQueryFormat where it uses the same parser for dates in two different formats

{
    "range" : {
       "mapped_date"  : {
            "gte": "01/01/2012",
            "lt": "2030",
            "format": "dd/MM/yyyy||yyyy"
        }
    }
}

So if we want to have an efficient parsing of composite patterns in JavaDateMathParser we should use JavaDateTimeFormat with the newly implemented parse method. That is when the roundUp is false.
I still can't find test cases or usages where roundUp would be true and composite pattern was used. Like this testcase which would fail at the moment

DateFormatter formatter = DateFormatter.forPattern("epoch_millis||HH:mm:ss");
        DateMathParser parser = formatter.toDateMathParser();
        parser.parse("04:52:20", () -> 0, true, (ZoneId) null)

this.format = format;
Objects.requireNonNull(formatter);
this.formatter = formatter;
Expand Down Expand Up @@ -215,20 +216,20 @@ private Instant parseDateTime(String value, ZoneId timeZone, boolean roundUpIfNo
throw new ElasticsearchParseException("cannot parse empty date");
}

DateTimeFormatter formatter = roundUpIfNoTime ? this.roundUpFormatter : this.formatter;
Function<String,TemporalAccessor> formatter = roundUpIfNoTime ? this.roundUpFormatter::parse : this.formatter::parse;
try {
if (timeZone == null) {
return DateFormatters.from(formatter.parse(value)).toInstant();
return DateFormatters.from(formatter.apply(value)).toInstant();
} else {
TemporalAccessor accessor = formatter.parse(value);
TemporalAccessor accessor = formatter.apply(value);
ZoneId zoneId = TemporalQueries.zone().queryFrom(accessor);
if (zoneId != null) {
timeZone = zoneId;
}

return DateFormatters.from(accessor).withZoneSameLocal(timeZone).toInstant();
}
} catch (DateTimeParseException e) {
} catch (IllegalArgumentException | DateTimeParseException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed as will revert back this class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually needed - see this #40100 (comment)

throw new ElasticsearchParseException("failed to parse date field [{}] with format [{}]: [{}]",
e, value, format, e.getMessage());
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -343,6 +343,13 @@ public void testDuellingFormatsValidParsing() {
assertSameDate("2012-W1-1", "weekyear_week_day");
}

public void testCompositeParsing(){
//in all these examples the second pattern will be used
assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some more tests:

  • 3 patterns
  • random ordering of many patterns

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to add more test cases. Do you have anything specific in mind?
It won't harm adding testcases with 3 patterns, but I am not convinced we need a random ordering here. What scenario would it cover?

assertSameDate("2014-06-06T12:01:02.123", "strictDateTimeNoMillis||yyyy-MM-dd'T'HH:mm:ss.SSS");
assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss+HH:MM||yyyy-MM-dd'T'HH:mm:ss.SSS");
}

public void testDuelingStrictParsing() {
assertSameDate("2018W313", "strict_basic_week_date");
assertParseException("18W313", "strict_basic_week_date");
Expand Down