-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Rollup] Validate timezones based on rules not string comparision #36237
Conversation
The date_histogram internally converts obsolete timezones (such as "Canada/Mountain") into their modern equivalent ("America/Edmonton"). But rollup just stored the TZ as provided by the user. When checking the TZ for query validation we used a string comparison, which would fail due to the date_histo's upgrading behavior. Instead, we should convert both to a TimeZone object and check if their rules are compatible. This commit also proactively upgrades the config's TZ to the modern equivalent for good measure, although this isn't strictly necessary and old rollup indices will be fixed just by the query validation tweak. This has two side-effects: - We should not be adding a term filter on `time_zone` to the query. This was obsolete anyway due to filtering on job ID + individual msearch - Without the need for term filter on `time_zone`, there's no need for the request translator to add filtering clauses at all. This functionality was removed (although the filtered agg structure remains, can be cleaned up in a followup PR) It also exposed a bug: - After verifying the timezones are the same, we need to set the timezone on the rewritten date_histo otherwise it will treat the data as UTC and shift results.
Pinging @elastic/es-analytics-geo |
Forgot to update these. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Jenkins, run the gradle build tests 2 |
@@ -97,11 +98,13 @@ private static void checkDateHisto(DateHistogramAggregationBuilder source, List< | |||
if (agg.get(RollupField.AGG).equals(DateHistogramAggregationBuilder.NAME)) { | |||
DateHistogramInterval interval = new DateHistogramInterval((String)agg.get(RollupField.INTERVAL)); | |||
|
|||
String thisTimezone = (String)agg.get(DateHistogramGroupConfig.TIME_ZONE); | |||
String sourceTimeZone = source.timeZone() == null ? DateTimeZone.UTC.toString() : source.timeZone().toString(); | |||
TimeZone thisTimezone = DateTimeZone.forID((String)agg.get(DateHistogramGroupConfig.TIME_ZONE)).toTimeZone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should use TimeZone, it is a legacy (ish) class in the jdk. Use ZoneId instead? If there are places needing DateTimeZone from joda, there is a conversion method in DateUtils.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, will do. Thanks for the heads up
Just a note for posterity: |
Should we log a deprecation warning to make the user change to the non-obsolete name of the timezone? |
Do you know if there's a way to get that information from I don't know the new date stuff well (or even the old stuff well), so I may have overlooked it. I suppose we could add a static list somewhere ourselves, but that seems less good :) |
That's what I was thinking. This is just to help with migrating. I don't think it is anything we should maintain for a long time, so it doesn't need to be programmatic just in case more timezone names are deprecated. |
Note that I did something similar already in |
Sounds good, I can knock together something. In a followup PR I'll see about expanding the deprecation usage to other places that use Timezones too, like the date_histo agg |
Added a deprecation warning when creating a Rollup job that uses a deprecated/obsolete timezone. Notably this is only when creating the job, not anything else that touches the job, or search. I think that's a much bigger task and better suited to a separate PR. |
String sourceTimeZone = source.timeZone() == null ? DateTimeZone.UTC.toString() : source.timeZone().toString(); | ||
ZoneId thisTimezone = ZoneId.of(((String)agg.get(DateHistogramGroupConfig.TIME_ZONE)), ZoneId.SHORT_IDS); | ||
ZoneId sourceTimeZone = source.timeZone() == null | ||
? ZoneId.of(DateHistogramGroupConfig.DEFAULT_TIMEZONE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using ZoneId.of("UTC")
(as this effectively does) causes printing using formats with this ZoneId to be messed up. They will get a [UTC]
appended to the end of the time, like 1970-01-02T10:17:36.789Z[UTC]
. Instead, use ZoneOffset.UTC
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Urgh, yeah not what was intended. Fixing, thanks!
Ok, so I was becoming increasingly uncomfortable with how the timezone stuff was spreading everywhere. I think these changes are probably still needed at some point, but would rather contain that to a refactor PR. So I backed out the last ZoneId commit, then introduced a new default constant for ZoneIds that is used for comparisons, while the old String default is used for strings. This keeps the "UTC"/"Z" thing from spreading everywhere. |
Final update, yanked the last remaining pieces of Joda and tweaked the tests according to new master (using Javatime in date_histo gives slightly different results, depending on the timezone you specify). |
@spinscale @pgomulka Might I bother one of you for a review of the javatime stuff, since Ryan is away this week? The rollup bits should be good, but would appreciate someone to verify the time shenanagins and make sure I didn't do anything silly. |
We will address this in a separate bugfix pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks good in my view, but I don't know rollups all that much so probably I am not the right person to approve
.../rollup/src/main/java/org/elasticsearch/xpack/rollup/action/TransportPutRollupJobAction.java
Show resolved
Hide resolved
.../rollup/src/main/java/org/elasticsearch/xpack/rollup/action/TransportRollupSearchAction.java
Outdated
Show resolved
Hide resolved
Thanks @pgomulka! Will work on the date/time stuff. Jim previously OK'd the rollup portion, we've just been stuck on the new javatime changes since then :) |
filterConditions.add(new TermQueryBuilder(RollupField.formatFieldName(source, | ||
DateHistogramGroupConfig.TIME_ZONE), timezone)); | ||
ZoneId timeZone = source.timeZone() == null ? DateHistogramGroupConfig.DEFAULT_ZONEID_TIMEZONE : source.timeZone(); | ||
rolledDateHisto.timeZone(timeZone); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this has the change of always setting a timezone. The only practical change is that UTC will be explicitly set now instead of implicit. Every other timezone will be the same.
I figured this would make debugging a bit easier since there will always be a timezone.
@jimczi Since it's been a while, would you mind giving this one last sanity check? I don't think anything substantial has changed since you first looked at it, but there were some minor tweaks along the way as we sorted out the java-time stuff. The tl;dr: is that obsolete timezones throw a deprecation warning but are accepted, timezones are compared based on rules everywhere, and the code knows how to use obsolete or modern timezones for aggregating. And there are tests going both ways so we know if this ever breaks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for adding the tests
…6237) The date_histogram internally converts obsolete timezones (such as "Canada/Mountain") into their modern equivalent ("America/Edmonton"). But rollup just stored the TZ as provided by the user. When checking the TZ for query validation we used a string comparison, which would fail due to the date_histo's upgrading behavior. Instead, we should convert both to a TimeZone object and check if their rules are compatible.
…astic#36237) The date_histogram internally converts obsolete timezones (such as "Canada/Mountain") into their modern equivalent ("America/Edmonton"). But rollup just stored the TZ as provided by the user. When checking the TZ for query validation we used a string comparison, which would fail due to the date_histo's upgrading behavior. Instead, we should convert both to a TimeZone object and check if their rules are compatible.
The date_histogram internally converts obsolete timezones (such as
"Canada/Mountain"
) into their modern equivalent ("America/Edmonton"
). But rollup just stores the TZ as provided by the user in the config.When checking the TZ for query validation we used a simple string comparison, which would fail due to the date_histo's upgrading behavior (
"Canada/Mountain" != "America/Edmonton"
)Instead, we should convert both to a
TimeZone
object and check if their rules are compatible.This commit also proactively upgrades the config's TZ to the modern equivalent for good measure, although this isn't strictly necessary and old rollup indices will be fixed just by the query validation tweak.This has two side-effects:
time_zone
to the query. This was obsolete anyway due to filtering on job ID + individual msearchtime_zone
, there's no need for the request translator to add filtering clauses at all. This functionality was removed (although the filtered agg structure remains, can be cleaned up in a followup PR)It also exposed a bug:
I think this is going to miss the boat for6.5.3
, but I'd like to backport it to6.5.4
when the time comes. There isn't a workaround if the user indexed rollups with an "obsolete" TZ, and not setting the timezone on the query itself will mess up results.Closes #36229