[Rollup] Validate timezones based on rules not string comparison #36237

Merged
merged 26 commits into from
Apr 17, 2019

@polyfractal polyfractal commented Dec 4, 2018

The date_histogram internally converts obsolete timezones (such as "Canada/Mountain") into their modern equivalent ("America/Edmonton"). But rollup just stores the TZ as provided by the user in the config.

When checking the TZ for query validation we used a simple string comparison, which would fail due to the date_histo's upgrading behavior ("Canada/Mountain" != "America/Edmonton").

Instead, we should convert both to a TimeZone object and check if their rules are compatible. This commit also proactively upgrades the config's TZ to the modern equivalent for good measure, although this isn't strictly necessary and old rollup indices will be fixed just by the query validation tweak.

This has two side-effects:

  • We should not be adding a term filter on time_zone to the query. This was obsolete anyway due to filtering on job ID + individual msearch
  • Without the need for a term filter on time_zone, there's no need for the request translator to add filtering clauses at all. This functionality was removed (the filtered agg structure remains and can be cleaned up in a follow-up PR)

It also exposed a bug:

  • After verifying the timezones are the same, we need to set the timezone on the rewritten date_histo, otherwise it will treat the data as UTC and shift results.

I think this is going to miss the boat for 6.5.3, but I'd like to backport it to 6.5.4 when the time comes. There isn't a workaround if the user indexed rollups with an "obsolete" TZ, and not setting the timezone on the query itself will mess up results.

Closes #36229
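
To illustrate the difference between the string comparison and the rule comparison described above, here is a minimal sketch. It is not the code from this PR: the class and method names are made up for illustration, and it uses `java.time.ZoneId` (which the review further down switches to) rather than `TimeZone`. It assumes the JDK's bundled tzdata treats "Canada/Mountain" as a link to "America/Edmonton".

```java
import java.time.ZoneId;

// Hypothetical helper, for illustration only; not part of the rollup code.
final class TimeZoneRuleCheck {

    /** Returns true when both IDs resolve to zones with identical transition rules. */
    static boolean sameRules(String left, String right) {
        return ZoneId.of(left).getRules().equals(ZoneId.of(right).getRules());
    }

    public static void main(String[] args) {
        String configured = "Canada/Mountain";  // as stored in the rollup config
        String requested = "America/Edmonton";  // as upgraded by date_histogram

        // Plain string comparison: the two names differ, so validation would reject the query.
        System.out.println("strings equal: " + configured.equals(requested));

        // Rule comparison: both names describe the same offsets and DST transitions.
        System.out.println("rules equal:   " + sameRules(configured, requested));
    }
}
```

Comparing rules rather than names means existing rollup indices that stored an obsolete name can still be matched without reindexing.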

@polyfractal polyfractal added the >bug, v7.0.0, :StorageEngine/Rollup, and v6.6.0 labels Dec 4, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo

Forgot to update these. :)
Contributor

@jimczi jimczi left a comment

LGTM

@polyfractal
Contributor Author

Jenkins, run the gradle build tests 2

@@ -97,11 +98,13 @@ private static void checkDateHisto(DateHistogramAggregationBuilder source, List<
     if (agg.get(RollupField.AGG).equals(DateHistogramAggregationBuilder.NAME)) {
         DateHistogramInterval interval = new DateHistogramInterval((String)agg.get(RollupField.INTERVAL));

-        String thisTimezone = (String)agg.get(DateHistogramGroupConfig.TIME_ZONE);
-        String sourceTimeZone = source.timeZone() == null ? DateTimeZone.UTC.toString() : source.timeZone().toString();
+        TimeZone thisTimezone = DateTimeZone.forID((String)agg.get(DateHistogramGroupConfig.TIME_ZONE)).toTimeZone();
Member

I don't think we should use TimeZone, it is a legacy (ish) class in the jdk. Use ZoneId instead? If there are places needing DateTimeZone from joda, there is a conversion method in DateUtils.

Contributor Author

Ah ok, will do. Thanks for the heads up

@polyfractal
Contributor Author

Just a note for posterity: ZoneId doesn't "convert" the string representation of an obsolete TZ into the new version (like TimeZone did). So Rollup will continue to store whatever the user gives us (provided it is a valid ZoneId), and we'll do the rule comparison the same way we did with TimeZone.
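
A small sketch of what that note means in practice, using only `java.time.ZoneId`. The class and method names are hypothetical and this is not the rollup code itself; it only shows that the ID is kept as supplied while the rules still match.

```java
import java.time.ZoneId;

// Hypothetical demo class, for illustration only.
final class ZoneIdKeepsName {

    /** The ID a config would end up storing if it simply round-trips through ZoneId. */
    static String storedId(String userSupplied) {
        return ZoneId.of(userSupplied).getId();
    }

    /** ZoneId equality is based on the ID, so linked zones are not equal to each other. */
    static boolean idsEqual(String left, String right) {
        return ZoneId.of(left).equals(ZoneId.of(right));
    }

    public static void main(String[] args) {
        // The obsolete name is preserved, not rewritten to the modern one.
        System.out.println(storedId("Canada/Mountain"));

        // equals() on the ZoneId objects still sees two different zones ...
        System.out.println(idsEqual("Canada/Mountain", "America/Edmonton"));

        // ... so validation has to look at the rules instead.
        System.out.println(ZoneId.of("Canada/Mountain").getRules()
            .equals(ZoneId.of("America/Edmonton").getRules()));
    }
}
```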

@rjernst
Member

rjernst commented Dec 10, 2018

Should we log a deprecation warning to make the user change to the non-obsolete name of the timezone?

@polyfractal
Contributor Author

Do you know if there's a way to get that information from ZoneId or the other new date/time classes? I looked around and couldn't find anything that would tell me whether a particular TZ is deprecated, short of using the older TimeZone class. ZoneId does have a map of "short name" aliases ("EST", etc.) but not the longer deprecated forms like "Japan".

I don't know the new date stuff well (or even the old stuff well), so I may have overlooked it.

I suppose we could add a static list somewhere ourselves, but that seems less good :)
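
For reference, a quick sketch of what the "short name" alias map does and does not cover. `ZoneId.SHORT_IDS` and the two-argument `ZoneId.of` are standard JDK API; the wrapper class and method names are made up for illustration.

```java
import java.time.ZoneId;

// Hypothetical demo class, for illustration only.
final class ShortIdLookup {

    /** True when the JDK's alias map has an entry for this ID. */
    static boolean isShortAlias(String id) {
        return ZoneId.SHORT_IDS.containsKey(id);
    }

    /** Resolves an ID, applying the short-name alias map first. */
    static String resolve(String id) {
        return ZoneId.of(id, ZoneId.SHORT_IDS).getId();
    }

    public static void main(String[] args) {
        System.out.println(isShortAlias("EST"));    // a three-letter alias is in the map
        System.out.println(isShortAlias("Japan"));  // the long obsolete form is not
        System.out.println(resolve("JST"));         // alias is replaced by its target
        System.out.println(resolve("Japan"));       // still a valid region ID, kept unchanged
    }
}
```

So the alias map can flag the three-letter names, but long obsolete names pass through as ordinary valid IDs.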

@rjernst
Member

rjernst commented Dec 11, 2018

I suppose we could add a static list somewhere ourselves

That's what I was thinking. This is just to help with migrating. I don't think it is anything we should maintain for a long time, so it doesn't need to be programmatic just in case more timezone names are deprecated.

@rjernst
Member

rjernst commented Dec 11, 2018

Note that I did something similar already in DateUtils.DEPRECATED_SHORT_TZ_IDS

@polyfractal
Contributor Author

Sounds good, I can knock together something. In a followup PR I'll see about expanding the deprecation usage to other places that use Timezones too, like the date_histo agg

@polyfractal
Contributor Author

Added a deprecation warning when creating a Rollup job that uses a deprecated/obsolete timezone. Notably this is only when creating the job, not anything else that touches the job, or search. I think that's a much bigger task and better suited to a separate PR.
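
Roughly, the static-list approach discussed above could look like the sketch below. Everything here is an assumption for illustration: the class name, the map name, the two sample entries, and the message wording are not taken from the PR, and the real code would route the message through the deprecation logger rather than return it. `Map.of` needs Java 9 or later.

```java
import java.time.ZoneId;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of a static obsolete-name table; not the actual Elasticsearch code.
final class ObsoleteTimeZones {

    // Deliberately tiny sample table mapping obsolete names to their modern equivalents.
    static final Map<String, String> OBSOLETE_TO_MODERN = Map.of(
        "Canada/Mountain", "America/Edmonton",
        "Japan", "Asia/Tokyo");

    /**
     * Validates the configured ID and returns a warning message if it is a known obsolete name.
     * The ID itself is still accepted, so job creation would proceed either way.
     */
    static Optional<String> deprecationWarning(String configuredId) {
        ZoneId.of(configuredId); // throws DateTimeException if the ID is not valid at all
        String modern = OBSOLETE_TO_MODERN.get(configuredId);
        if (modern == null) {
            return Optional.empty();
        }
        return Optional.of("time zone [" + configuredId + "] is deprecated, use [" + modern + "] instead");
    }

    public static void main(String[] args) {
        System.out.println(deprecationWarning("Canada/Mountain").orElse("no warning"));
        System.out.println(deprecationWarning("America/Edmonton").orElse("no warning"));
    }
}
```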

-String sourceTimeZone = source.timeZone() == null ? DateTimeZone.UTC.toString() : source.timeZone().toString();
+ZoneId thisTimezone = ZoneId.of(((String)agg.get(DateHistogramGroupConfig.TIME_ZONE)), ZoneId.SHORT_IDS);
+ZoneId sourceTimeZone = source.timeZone() == null
+    ? ZoneId.of(DateHistogramGroupConfig.DEFAULT_TIMEZONE)
Member

Using ZoneId.of("UTC") (as this effectively does) causes printing using formats with this ZoneId to be messed up. They will get a [UTC] appended to the end of the time, like 1970-01-02T10:17:36.789Z[UTC]. Instead, use ZoneOffset.UTC.

Contributor Author

Urgh, yeah not what was intended. Fixing, thanks!
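
The printing difference mentioned in the review comment can be reproduced with plain `java.time`. This is only a sketch with made-up class and method names; it uses `ZonedDateTime.toString()` rather than Elasticsearch's own formatters, and the same instant as in the example above.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical demo class, for illustration only.
final class UtcPrinting {

    private static String print(long epochMillis, ZoneId zone) {
        return ZonedDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone).toString();
    }

    /** Prints using the region-style zone obtained from ZoneId.of("UTC"). */
    static String withRegionUtc(long epochMillis) {
        return print(epochMillis, ZoneId.of("UTC"));
    }

    /** Prints using the fixed offset ZoneOffset.UTC. */
    static String withOffsetUtc(long epochMillis) {
        return print(epochMillis, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        long millis = 123456789L;
        System.out.println(withRegionUtc(millis)); // 1970-01-02T10:17:36.789Z[UTC]
        System.out.println(withOffsetUtc(millis)); // 1970-01-02T10:17:36.789Z
    }
}
```

Both zones describe the same offset; only the region-style one carries an ID that gets appended in brackets.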

@polyfractal
Contributor Author

Ok, so I was becoming increasingly uncomfortable with how the timezone stuff was spreading everywhere. I think these changes are probably still needed at some point, but would rather contain that to a refactor PR.

So I backed out the last ZoneId commit, then introduced a new default constant for ZoneIds that is used for comparisons, while the old String default is used for strings. This keeps the "UTC"/"Z" thing from spreading everywhere.

@polyfractal
Contributor Author

Final update, yanked the last remaining pieces of Joda and tweaked the tests according to new master (using Javatime in date_histo gives slightly different results, depending on the timezone you specify).

@polyfractal
Contributor Author

@spinscale @pgomulka Might I bother one of you for a review of the javatime stuff, since Ryan is away this week? The rollup bits should be good, but I would appreciate someone verifying the time shenanigans and making sure I didn't do anything silly.

@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
We will address this in a separate bugfix pr
Contributor

@pgomulka pgomulka left a comment

that looks good in my view, but I don't know rollups all that much so probably I am not the right person to approve

@polyfractal
Contributor Author

Thanks @pgomulka! Will work on the date/time stuff. Jim previously OK'd the rollup portion, we've just been stuck on the new javatime changes since then :)

-filterConditions.add(new TermQueryBuilder(RollupField.formatFieldName(source,
-    DateHistogramGroupConfig.TIME_ZONE), timezone));
+ZoneId timeZone = source.timeZone() == null ? DateHistogramGroupConfig.DEFAULT_ZONEID_TIMEZONE : source.timeZone();
+rolledDateHisto.timeZone(timeZone);
Contributor Author

Note: with this change a timezone is always set. The only practical difference is that UTC is now set explicitly instead of implicitly. Every other timezone behaves the same as before.

I figured this would make debugging a bit easier since there will always be a timezone.
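
To show why the timezone has to be carried onto the rewritten aggregation, here is a simplified sketch of daily bucketing with a null-means-UTC default. It is not how date_histogram rounds internally; the class, constant, and method names are hypothetical, and only the defaulting pattern mirrors the snippet above.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of timezone-aware daily bucketing; not the Elasticsearch rounding code.
final class DailyBucket {

    // Assumed default, standing in for the ZoneId default constant discussed in this thread.
    static final ZoneId DEFAULT_ZONE = ZoneOffset.UTC;

    /** Start of the local day containing the timestamp, falling back to UTC when no zone is given. */
    static Instant bucketStart(Instant timestamp, ZoneId requested) {
        ZoneId zone = requested == null ? DEFAULT_ZONE : requested;
        return timestamp.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    /** String-based convenience wrapper; pass null as the zone to use the default. */
    static String bucketStart(String isoInstant, String zoneIdOrNull) {
        ZoneId zone = zoneIdOrNull == null ? null : ZoneId.of(zoneIdOrNull);
        return bucketStart(Instant.parse(isoInstant), zone).toString();
    }

    public static void main(String[] args) {
        String timestamp = "2018-12-04T03:00:00Z";
        // Treated as UTC, the document lands in the Dec 4 bucket.
        System.out.println(bucketStart(timestamp, null));
        // In Edmonton it is still the evening of Dec 3, so the bucket starts a day earlier.
        System.out.println(bucketStart(timestamp, "America/Edmonton"));
        // The obsolete name produces the same bucket because the rules are the same.
        System.out.println(bucketStart(timestamp, "Canada/Mountain"));
    }
}
```

Dropping the zone silently moves this document into a different bucket, which is the result shift described in the PR description.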

@polyfractal
Contributor Author

@jimczi Since it's been a while, would you mind giving this one last sanity check? I don't think anything substantial has changed since you first looked at it, but there were some minor tweaks along the way as we sorted out the java-time stuff.

The tl;dr is that obsolete timezones trigger a deprecation warning but are still accepted, timezones are compared based on rules everywhere, and the code knows how to use obsolete or modern timezones for aggregating. And there are tests going both ways so we know if this ever breaks.

Contributor

@jimczi jimczi left a comment

LGTM, thanks for adding the tests

@polyfractal polyfractal merged commit 1f51f20 into elastic:master Apr 17, 2019
polyfractal added a commit that referenced this pull request Apr 17, 2019
…6237)

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
…astic#36237)

Labels
>bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v7.2.0 v8.0.0-alpha1
Development

Successfully merging this pull request may close these issues.

Rollup searches based on "obsolete" timezones fail
8 participants