Parse composite patterns using ClassicFormat.parseObject #40100
Changes from 10 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,7 @@ | |
|
||
import org.elasticsearch.common.Strings; | ||
|
||
import java.text.ParsePosition; | ||
import java.time.ZoneId; | ||
import java.time.format.DateTimeFormatter; | ||
import java.time.format.DateTimeFormatterBuilder; | ||
|
@@ -29,7 +30,9 @@ | |
import java.time.temporal.TemporalAccessor; | ||
import java.time.temporal.TemporalField; | ||
import java.util.Arrays; | ||
import java.util.Collection; | ||
import java.util.HashMap; | ||
import java.util.List; | ||
import java.util.Locale; | ||
import java.util.Map; | ||
import java.util.Objects; | ||
|
@@ -50,16 +53,9 @@ class JavaDateFormatter implements DateFormatter { | |
|
||
private final String format; | ||
private final DateTimeFormatter printer; | ||
private final DateTimeFormatter parser; | ||
private final List<DateTimeFormatter> parsers; | ||
private final DateTimeFormatter roundupParser; | ||
|
||
private JavaDateFormatter(String format, DateTimeFormatter printer, DateTimeFormatter roundupParser, DateTimeFormatter parser) { | ||
this.format = format; | ||
this.printer = printer; | ||
this.roundupParser = roundupParser; | ||
this.parser = parser; | ||
} | ||
|
||
JavaDateFormatter(String format, DateTimeFormatter printer, DateTimeFormatter... parsers) { | ||
this(format, printer, builder -> ROUND_UP_BASE_FIELDS.forEach(builder::parseDefaulting), parsers); | ||
} | ||
|
@@ -79,36 +75,31 @@ private JavaDateFormatter(String format, DateTimeFormatter printer, DateTimeForm | |
} | ||
this.printer = printer; | ||
this.format = format; | ||
|
||
if (parsers.length == 0) { | ||
this.parser = printer; | ||
} else if (parsers.length == 1) { | ||
this.parser = parsers[0]; | ||
this.parsers = Arrays.asList(printer); | ||
|
||
} else { | ||
DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder(); | ||
for (DateTimeFormatter parser : parsers) { | ||
builder.appendOptional(parser); | ||
} | ||
this.parser = builder.toFormatter(Locale.ROOT); | ||
this.parsers = Arrays.asList(parsers); | ||
} | ||
|
||
DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder(); | ||
if (format.contains("||") == false) { | ||
builder.append(this.parser); | ||
builder.append(firstParser()); | ||
} | ||
roundupParserConsumer.accept(builder); | ||
DateTimeFormatter roundupFormatter = builder.toFormatter(parser.getLocale()); | ||
DateTimeFormatter roundupFormatter = builder.toFormatter(firstParser().getLocale()); | ||
if (printer.getZone() != null) { | ||
roundupFormatter = roundupFormatter.withZone(printer.getZone()); | ||
} | ||
this.roundupParser = roundupFormatter; | ||
} | ||
|
||
DateTimeFormatter getRoundupParser() { | ||
return roundupParser; | ||
private DateTimeFormatter firstParser() { | ||
Review comment: Personal opinion: I dislike this method somehow. There are alternatives whenever this method is called (using
Reply: My intention was to avoid having repetitive calls of
Reply: See the ctor check for distinct locale and distinct zone.
||
return this.parsers.get(0); | ||
} | ||
|
||
DateTimeFormatter getParser() { | ||
return parser; | ||
DateTimeFormatter getRoundupParser() { | ||
return roundupParser; | ||
} | ||
|
||
DateTimeFormatter getPrinter() { | ||
|
@@ -122,30 +113,64 @@ public TemporalAccessor parse(String input) { | |
} | ||
|
||
try { | ||
return parser.parse(input); | ||
return doParse(input); | ||
Review comment: I think a detailed comment is warranted here, explaining what happens and why.
Reply: Absolutely, will add.
||
} catch (DateTimeParseException e) { | ||
throw new IllegalArgumentException("failed to parse date field [" + input + "] with format [" + format + "]", e); | ||
} | ||
} | ||
|
||
private TemporalAccessor doParse(String input) { | ||
if (parsers.size() > 1) { | ||
Review comment: This could be statically initialized in the ctor?
Reply: the result of
Reply: Since the complexity of size() is expected to be constant in the implementation used here, I will leave this as it is.
||
for (DateTimeFormatter formatter : parsers) { | ||
Review comment: We should not do this. It is exactly what we got rid of that caused massive performance degradation when using java time vs joda. I think we should fail when parsing the format if there is ambiguity.
Reply: I am not sure if there really is ambiguity. If there is a composite pattern where the first pattern is a prefix of the second pattern (as in the example from the issue), then only the second one really matches the date format. It would be hard to explain in our docs why we forbid patterns like the one from the issue. The java-time javadoc does not mention this limitation. There is obviously a performance drop (~14%); not sure if this is still acceptable. Will run Rally benchmarks later. @polyfractal @Mpdreamz what are your views on this?
Reply: No real opinion on implementation details, but I do think it's a huge breaking change if we can't support the old multi-format behavior. Even ignoring what we've supported in the past, And situations like And I'm sure there are thousands of legacy templates with patterns like those that will break on upgrade. Weren't the original performance issues due to throwing and catching exceptions as control flow, rather than trying multiple parsers?
Reply: I agree with @polyfractal: The root cause of the performance regression (#36602) was not that multiple parsers have been applied but rather that each of them has thrown an
Reply: I have refactored this method to use
||
if (tryParseUnresolved(formatter, input) == true) { | ||
return formatter.parse(input); | ||
} | ||
} | ||
} | ||
return firstParser().parse(input); | ||
Review comment: Is this doing a second parse attempt with the first parser, if going through all the parsers did not work?
Reply: Yes, I guess we could throw an exception after the loop. WDYT?
Reply: or just be sure via ctor check, that the size is never
Reply: It should never be 0; if no parsers are provided, the printer will be used.
Reply: As discussed, will add an exception indicating that all parsers have failed. That avoids the misleading impression that only the first parser failed (as is the case at the moment).
||
} | ||
|
||
/** | ||
* Attempt parsing the input without throwing exception. This is needed because java-time requires ordering on optional (composite) | ||
* patterns. Joda does not suffer from this. | ||
* https://bugs.openjdk.java.net/browse/JDK-8188771 | ||
* | ||
* @param input An arbitrary string resembling the string representation of a date or time | ||
* @return true if parsing was successful, false if parsing failed | ||
Review comment: This method is not returning a boolean.
Reply: Yes, sorry, that was a copy-paste from the previous revision. Fixed with more description.
||
*/ | ||
private boolean tryParseUnresolved(DateTimeFormatter formatter, String input) { | ||
try { | ||
ParsePosition pp = new ParsePosition(0); | ||
formatter.parseUnresolved(input, pp); | ||
int len = input.length(); | ||
|
||
if (pp.getErrorIndex() == -1 && pp.getIndex() == len) { | ||
return true; | ||
} | ||
} catch (RuntimeException ex) { | ||
// should not happen, but ignore if it does | ||
|
||
} | ||
return false; | ||
} | ||
|
||
@Override | ||
public DateFormatter withZone(ZoneId zoneId) { | ||
// shortcut to not create new objects unnecessarily | ||
if (zoneId.equals(parser.getZone())) { | ||
if (zoneId.equals(firstParser().getZone())) { | ||
return this; | ||
} | ||
|
||
return new JavaDateFormatter(format, printer.withZone(zoneId), roundupParser.withZone(zoneId), parser.withZone(zoneId)); | ||
return new JavaDateFormatter(format, printer.withZone(zoneId), roundupParser.withZone(zoneId), firstParser().withZone(zoneId)); | ||
} | ||
|
||
@Override | ||
public DateFormatter withLocale(Locale locale) { | ||
// shortcut to not create new objects unnecessarily | ||
if (locale.equals(parser.getLocale())) { | ||
if (locale.equals(firstParser().getLocale())) { | ||
return this; | ||
} | ||
|
||
return new JavaDateFormatter(format, printer.withLocale(locale), roundupParser.withLocale(locale), parser.withLocale(locale)); | ||
return new JavaDateFormatter(format, printer.withLocale(locale), roundupParser.withLocale(locale), | ||
firstParser().withLocale(locale)); | ||
|
||
} | ||
|
||
@Override | ||
|
@@ -170,7 +195,7 @@ public ZoneId zone() { | |
|
||
@Override | ||
public DateMathParser toDateMathParser() { | ||
return new JavaDateMathParser(format, parser, roundupParser); | ||
return new JavaDateMathParser(format, this, getRoundupParser()); | ||
} | ||
|
||
@Override | ||
|
@@ -194,4 +219,8 @@ public boolean equals(Object obj) { | |
public String toString() { | ||
return String.format(Locale.ROOT, "format[%s] locale[%s]", format, locale()); | ||
} | ||
|
||
public Collection<DateTimeFormatter> getParsers() { | ||
|
||
return parsers; | ||
} | ||
} |
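For readers following the thread on doParse and tryParseUnresolved: a minimal, self-contained sketch of the idea under discussion, probing each parser with parseUnresolved and a ParsePosition so that a non-matching alternative is rejected without an exception being thrown. This is not the PR's code. The names CompositeParseSketch, consumesWholeInput and parseWithFirstMatch are hypothetical, and the explicit IllegalArgumentException after the loop follows the reviewers' suggestion rather than the diff as shown.

```java
import java.text.ParsePosition;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of the PR.
final class CompositeParseSketch {

    // True only if the formatter consumes the whole input without reporting an error.
    // parseUnresolved signals a mismatch through the ParsePosition error index
    // instead of throwing DateTimeParseException.
    static boolean consumesWholeInput(DateTimeFormatter formatter, String input) {
        ParsePosition position = new ParsePosition(0);
        try {
            formatter.parseUnresolved(input, position);
        } catch (RuntimeException e) {
            // Not expected for ordinary input; treated as "no match".
            return false;
        }
        return position.getErrorIndex() == -1 && position.getIndex() == input.length();
    }

    // Tries the parsers in order and fully parses with the first one that matches.
    // The full parse can still throw if resolving fails (for example an invalid day of month).
    static TemporalAccessor parseWithFirstMatch(List<DateTimeFormatter> parsers, String input) {
        for (DateTimeFormatter parser : parsers) {
            if (consumesWholeInput(parser, input)) {
                return parser.parse(input);
            }
        }
        throw new IllegalArgumentException(
            "failed to parse [" + input + "] with any of the " + parsers.size() + " parsers");
    }

    public static void main(String[] args) {
        List<DateTimeFormatter> parsers = Arrays.asList(
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT),
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT));
        TemporalAccessor parsed = parseWithFirstMatch(parsers, "2014-06-06T12:01:02.123");
        System.out.println(LocalDateTime.from(parsed));
    }
}
```

The trade-off in this sketch is that a matching input is scanned twice (once unresolved, once resolved), in exchange for never using exceptions as control flow for the alternatives that do not match.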
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,6 +35,7 @@ | |
import java.time.temporal.TemporalAdjusters; | ||
import java.time.temporal.TemporalQueries; | ||
import java.util.Objects; | ||
import java.util.function.Function; | ||
import java.util.function.LongSupplier; | ||
|
||
/** | ||
|
@@ -46,11 +47,11 @@ | |
*/ | ||
public class JavaDateMathParser implements DateMathParser { | ||
|
||
private final DateTimeFormatter formatter; | ||
private final DateTimeFormatter roundUpFormatter; | ||
private final String format; | ||
private JavaDateFormatter formatter; | ||
|
||
private DateTimeFormatter roundUpFormatter; | ||
private String format; | ||
|
||
JavaDateMathParser(String format, DateTimeFormatter formatter, DateTimeFormatter roundUpFormatter) { | ||
JavaDateMathParser(String format, JavaDateFormatter formatter, DateTimeFormatter roundUpFormatter) { | ||
Review comment: Why this change?
Reply: The intention was to allow alternatives in patterns for date math calculations.
Reply: Actually this should stay. Plenty of tests started to fail now because of that.
So if we want to have an efficient parsing of composite patterns in
|
||
this.format = format; | ||
Objects.requireNonNull(formatter); | ||
this.formatter = formatter; | ||
|
@@ -215,20 +216,20 @@ private Instant parseDateTime(String value, ZoneId timeZone, boolean roundUpIfNo | |
throw new ElasticsearchParseException("cannot parse empty date"); | ||
} | ||
|
||
DateTimeFormatter formatter = roundUpIfNoTime ? this.roundUpFormatter : this.formatter; | ||
Function<String,TemporalAccessor> formatter = roundUpIfNoTime ? this.roundUpFormatter::parse : this.formatter::parse; | ||
try { | ||
if (timeZone == null) { | ||
return DateFormatters.from(formatter.parse(value)).toInstant(); | ||
return DateFormatters.from(formatter.apply(value)).toInstant(); | ||
} else { | ||
TemporalAccessor accessor = formatter.parse(value); | ||
TemporalAccessor accessor = formatter.apply(value); | ||
ZoneId zoneId = TemporalQueries.zone().queryFrom(accessor); | ||
if (zoneId != null) { | ||
timeZone = zoneId; | ||
} | ||
|
||
return DateFormatters.from(accessor).withZoneSameLocal(timeZone).toInstant(); | ||
} | ||
} catch (DateTimeParseException e) { | ||
} catch (IllegalArgumentException | DateTimeParseException e) { | ||
Review comment: Is this still needed?
Reply: Not needed, as I will revert this class.
Reply: Actually needed - see this #40100 (comment)
||
throw new ElasticsearchParseException("failed to parse date field [{}] with format [{}]: [{}]", | ||
e, value, format, e.getMessage()); | ||
} | ||
|
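A rough sketch of why the catch clause in parseDateTime now lists both exception types: with Function<String, TemporalAccessor>, the same call site can reach either a plain DateTimeFormatter, which throws DateTimeParseException, or the JavaDateFormatter wrapper, whose parse method rethrows it as IllegalArgumentException. The class MathParserSketch and the helpers wrappedParse and describe are hypothetical stand-ins, not code from the PR.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.function.Function;

// Hypothetical illustration class; not part of the PR.
final class MathParserSketch {

    // Stand-in for JavaDateFormatter.parse: wraps the java.time exception.
    static TemporalAccessor wrappedParse(DateTimeFormatter formatter, String input) {
        try {
            return formatter.parse(input);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("failed to parse date field [" + input + "]", e);
        }
    }

    // One call site for both parser kinds, so both exception types must be handled.
    static String describe(Function<String, TemporalAccessor> parser, String value) {
        try {
            parser.apply(value);
            return "ok";
        } catch (IllegalArgumentException | DateTimeParseException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        DateTimeFormatter iso = DateTimeFormatter.ISO_LOCAL_DATE;
        System.out.println(describe(iso::parse, "2014-06-06"));
        System.out.println(describe(iso::parse, "garbage"));
        System.out.println(describe(s -> wrappedParse(iso, s), "garbage"));
    }
}
```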
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -343,6 +343,13 @@ public void testDuellingFormatsValidParsing() { | |
assertSameDate("2012-W1-1", "weekyear_week_day"); | ||
} | ||
|
||
public void testCompositeParsing(){ | ||
//in all these examples the second pattern will be used | ||
assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS"); | ||
Review comment: I think we need some more tests:
Reply: Happy to add more test cases. Do you have anything specific in mind?
||
assertSameDate("2014-06-06T12:01:02.123", "strictDateTimeNoMillis||yyyy-MM-dd'T'HH:mm:ss.SSS"); | ||
assertSameDate("2014-06-06T12:01:02.123", "yyyy-MM-dd'T'HH:mm:ss+HH:MM||yyyy-MM-dd'T'HH:mm:ss.SSS"); | ||
} | ||
|
||
public void testDuelingStrictParsing() { | ||
assertSameDate("2018W313", "strict_basic_week_date"); | ||
assertParseException("18W313", "strict_basic_week_date"); | ||
|
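In the patterns used by testCompositeParsing, the first alternative is a prefix of the second. A small sketch of why that ordering matters when alternatives are chained with DateTimeFormatterBuilder.appendOptional, as the removed constructor code did (see JDK-8188771): the shorter pattern matches first, the second optional section finds nothing it can parse at the remaining ".123", and the overall parse fails on the unparsed text. The class OptionalOrderingSketch and its helper names are hypothetical.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical illustration class; not part of the PR.
final class OptionalOrderingSketch {

    // Chains the patterns as optional sections of one formatter, in the given order.
    static DateTimeFormatter chain(String... patterns) {
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder();
        for (String pattern : patterns) {
            builder.appendOptional(DateTimeFormatter.ofPattern(pattern, Locale.ROOT));
        }
        return builder.toFormatter(Locale.ROOT);
    }

    // Uses an exception only to report the outcome of this demonstration.
    static boolean parses(DateTimeFormatter formatter, String input) {
        try {
            formatter.parse(input);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String shortPattern = "yyyy-MM-dd'T'HH:mm:ss";
        String longPattern = "yyyy-MM-dd'T'HH:mm:ss.SSS";
        String input = "2014-06-06T12:01:02.123";
        System.out.println("short first: " + parses(chain(shortPattern, longPattern), input));
        System.out.println("long first:  " + parses(chain(longPattern, shortPattern), input));
    }
}
```

Putting the longer pattern first sidesteps the problem in this sketch, which is the ordering requirement the javadoc on tryParseUnresolved refers to; trying each parser separately removes that requirement for users.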
Review comment: Not a fan of this merge method. It extracts the relatively low-level java.time.DateTimeFormatter when it should stick with the org.elasticsearch.common.time.DateFormatter abstraction as long as possible.
Possibly the roundUpBuilder should also be used inside the JavaDateFormatter constructor?
Also, I suspect that roundUpBuilder will suffer from the same problem when it is a composite?
Reply: Maybe we can refactor this in a separate PR then? Also, this code should only be there temporarily? As soon as we get rid of Joda time in the code base, I expect that we can get rid of quite a few abstractions.
Is it intended to be used as a composite?
Reply: It is used as a composite in the merge method when constructing the roundUpParser with appendOptional.
I can imagine we can have the same pattern as in the issue (yyyy-MM-dd'T'HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS) but used with this parser. Will try to come up with a test case.
Reply: I discussed that with @spinscale and there is no way we could create a pattern that would suffer from the same problem.
For index name calculations like prefix-{2010-01-01/d{yyyy-MM-dd||yyyy-MM-ddTHH}} it would fail parsing, as the only thing expected after | is the timezone (it would fail saying unexpected |).