Simplify predicates involving year function #16106

Merged
merged 1 commit into from
Mar 17, 2023

Conversation

ebyhr
Member

@ebyhr ebyhr commented Feb 14, 2023

Description

Simplify predicates involving the year function, similar to UnwrapDateTruncInComparison.
Continuation of #15306
Fixes #14078

Release notes

(x) Release notes are required, with the following suggested text:

# General
* Improve performance of queries that contain predicates involving `year` function. ({issue}`14078`)

@martint
Member

martint commented Feb 14, 2023

Rewrite year function to date_trunc for improving performance.

Why would this rewrite improve performance?

@ebyhr
Member Author

ebyhr commented Feb 14, 2023

@martint We have UnwrapDateTruncInComparison. Updated the PR description.

@ebyhr ebyhr requested a review from martint February 14, 2023 08:59
Member

@martint martint left a comment

That's a roundabout way of improving this specific type of expression. The optimization should go the other way -- an expression like year(date_time) = t is inherently cheaper than date_trunc('year', date_time) = ..., since the date_trunc function is more generic and expensive to compute. In general, optimizations should stand on their own (i.e., not depend on other rules to be successful). They should produce a better plan regardless of whether another optimization kicks in later.
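
As an illustration of a rewrite that stands on its own, here is a minimal sketch that turns year(d) <op> y directly into a predicate on the column, with no intermediate date_trunc. It assumes a DATE column stored as epoch days; the class YearPredicateSketch and its methods are hypothetical names for illustration and are not the code in this PR.

```java
import java.time.LocalDate;

// Illustrative sketch only; none of these names come from the PR.
final class YearPredicateSketch
{
    private YearPredicateSketch() {}

    // First epoch day of the given year (inclusive lower bound).
    static long yearStartEpochDay(int year)
    {
        return LocalDate.of(year, 1, 1).toEpochDay();
    }

    // Last epoch day of the given year (inclusive upper bound).
    static long yearEndEpochDay(int year)
    {
        return LocalDate.of(year, 12, 31).toEpochDay();
    }

    // Rewrites year(column) <operator> year as a predicate on the column itself.
    // Assumption: the column holds epoch days, so plain long comparison applies.
    static String rewrite(String column, String operator, int year)
    {
        long start = yearStartEpochDay(year);
        long end = yearEndEpochDay(year);
        switch (operator) {
            case "=":
                return column + " BETWEEN " + start + " AND " + end;
            case "<>":
                return column + " < " + start + " OR " + column + " > " + end;
            case "<":
                return column + " < " + start;
            case "<=":
                return column + " <= " + end;
            case ">":
                return column + " > " + end;
            case ">=":
                return column + " >= " + start;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(rewrite("d", "=", 2005));
        System.out.println(rewrite("d", ">", 2005));
    }
}
```

The resulting range predicate no longer calls any function on the column, so it can be evaluated or pushed down without relying on a later rule.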

@ebyhr ebyhr force-pushed the ebi/iceberg-year-function branch from 1b64348 to b9bb1ac Compare February 15, 2023 01:00
@ebyhr ebyhr changed the title Rewrite year function to date_trunc for improving performance Simplify predicates involving year function Feb 15, 2023
@ebyhr ebyhr force-pushed the ebi/iceberg-year-function branch from b9bb1ac to 38338cb Compare February 15, 2023 01:09
@ebyhr
Member Author

ebyhr commented Feb 15, 2023

@martint Changed the implementation not to depend on UnwrapDateTruncInComparison.

@ebyhr ebyhr requested a review from martint February 15, 2023 08:57
@findinpath findinpath self-requested a review February 20, 2023 06:25
}
if (type instanceof TimestampType timestampType) {
    if (timestampType.isShort()) {
        LocalDateTime dateTime = LocalDateTime.MAX.withYear(year).with(lastDayOfYear());
Contributor

Is .with(lastDayOfYear()) still necessary in the context of using LocalDateTime.MAX ?

}
if (type instanceof TimestampType timestampType) {
    if (timestampType.isShort()) {
        LocalDateTime dateTime = LocalDateTime.of(year, 1, 1, 0, 0);
Member

It's the same for short and long timestamp types, so move it before the if (timestampType.isShort()) check:

long yearStartEpochSecond = LocalDateTime.of(year, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
long yearStartEpochMicros = multiplyExact(yearStartEpochSecond, MICROSECONDS_PER_SECOND);
if (timestampType.isShort()) {
    return yearStartEpochMicros;
}
return new LongTimestamp(yearStartEpochMicros, 0);
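
A self-contained sketch of the suggestion above, so the arithmetic can be checked in isolation. The nested LongTimestamp class is a hypothetical stand-in for the engine's type, the boolean flag replaces timestampType.isShort(), and the class and method names are invented for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

import static java.lang.Math.multiplyExact;

// Illustrative sketch only; not the code merged in this PR.
final class YearStartSketch
{
    static final long MICROSECONDS_PER_SECOND = 1_000_000L;

    private YearStartSketch() {}

    // Hypothetical stand-in for the engine's long timestamp representation.
    static final class LongTimestamp
    {
        final long epochMicros;
        final int picosOfMicro;

        LongTimestamp(long epochMicros, int picosOfMicro)
        {
            this.epochMicros = epochMicros;
            this.picosOfMicro = picosOfMicro;
        }
    }

    // Midnight on January 1st of the year, in microseconds since the epoch.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long yearStartEpochMicros(int year)
    {
        long yearStartEpochSecond = LocalDateTime.of(year, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
        return multiplyExact(yearStartEpochSecond, MICROSECONDS_PER_SECOND);
    }

    // The value is computed once; only the wrapping differs between the two layouts.
    static Object rangeStartInclusive(int year, boolean shortTimestamp)
    {
        long yearStartEpochMicros = yearStartEpochMicros(year);
        if (shortTimestamp) {
            return yearStartEpochMicros;
        }
        return new LongTimestamp(yearStartEpochMicros, 0);
    }

    public static void main(String[] args)
    {
        System.out.println(rangeStartInclusive(2005, true));
    }
}
```

Because the start of a year has no sub-second component, the fractional part is zero in both layouts, which is why the getNano() term can be dropped.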

if (type instanceof TimestampType timestampType) {
    if (timestampType.isShort()) {
        LocalDateTime dateTime = LocalDateTime.of(year, 1, 1, 0, 0);
        return dateTime.toEpochSecond(ZoneOffset.UTC) * MICROSECONDS_PER_SECOND + LongMath.divide(dateTime.getNano(), NANOSECONDS_PER_MICROSECOND, UNNECESSARY);
Member

dateTime.getNano() is known to be 0

if (type instanceof TimestampType timestampType) {
    if (timestampType.isShort()) {
        LocalDateTime dateTime = LocalDateTime.of(year, 1, 1, 0, 0);
        return dateTime.toEpochSecond(ZoneOffset.UTC) * MICROSECONDS_PER_SECOND + LongMath.divide(dateTime.getNano(), NANOSECONDS_PER_MICROSECOND, UNNECESSARY);
Member

we probably should use multiplyExact

    return dateTime.toEpochSecond(ZoneOffset.UTC) * MICROSECONDS_PER_SECOND + LongMath.divide(dateTime.getNano(), NANOSECONDS_PER_MICROSECOND, UNNECESSARY);
}
LocalDateTime dateTime = LocalDateTime.of(year, 1, 1, 0, 0);
long endInclusiveMicros = dateTime.toEpochSecond(ZoneOffset.UTC) * MICROSECONDS_PER_SECOND
Member

endInclusiveMicros wrong var name?

}
if (type instanceof TimestampType timestampType) {
    if (timestampType.isShort()) {
        LocalDateTime dateTime = LocalDateTime.MAX.withYear(year).with(lastDayOfYear());
Member

Personally I find LocalDateTime.MAX harder to reason about.
I first need to assume the max value has a .999999999 second fraction (which is obvious if you know LocalDateTime's internal representation), and then I need to reason about how this behaves for our timestamps with precision > 9.

What about writing this more explicitly:

long nextYearStartEpochSecond = LocalDateTime.of(year + 1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
long nextYearStartEpochMicros = multiplyExact(nextYearStartEpochSecond, MICROSECONDS_PER_SECOND);
if (timestampType.isShort()) {
    // TODO might be off by one
    return nextYearStartEpochMicros - scaleFactor(timestampType.getPrecision(), 6);
}
int picosOfMicro = toIntExact(PICOSECONDS_PER_MICROSECOND - scaleFactor(timestampType.getPrecision(), 12));
return new LongTimestamp(nextYearStartEpochMicros - 1, picosOfMicro);

(I hope I didn't screw up the math, but I'm not sure, hence the TODO comment; please remove the TODO.)
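
One way to check the off-by-one concern is a stand-alone sketch of the same arithmetic. It assumes scaleFactor(from, to) returns 10^(to - from), that short timestamps cover precisions 0 to 6 and long timestamps 7 to 12, and it returns the long form as a two-element array rather than a real LongTimestamp. All class and method names here are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

import static java.lang.Math.multiplyExact;
import static java.lang.Math.toIntExact;

// Illustrative sketch only; helper names and behaviour are assumptions.
final class YearEndSketch
{
    static final long MICROSECONDS_PER_SECOND = 1_000_000L;
    static final long PICOSECONDS_PER_MICROSECOND = 1_000_000L;

    private YearEndSketch() {}

    // Assumed behaviour of the scaleFactor helper: 10^(toPrecision - fromPrecision).
    static long scaleFactor(int fromPrecision, int toPrecision)
    {
        if (fromPrecision > toPrecision) {
            throw new IllegalArgumentException("fromPrecision must not exceed toPrecision");
        }
        long factor = 1;
        for (int i = fromPrecision; i < toPrecision; i++) {
            factor = multiplyExact(factor, 10L);
        }
        return factor;
    }

    // Midnight on January 1st of the following year, in epoch microseconds.
    static long nextYearStartEpochMicros(int year)
    {
        long nextYearStartEpochSecond = LocalDateTime.of(year + 1, 1, 1, 0, 0).toEpochSecond(ZoneOffset.UTC);
        return multiplyExact(nextYearStartEpochSecond, MICROSECONDS_PER_SECOND);
    }

    // Precision 0..6: step back by one unit of the column's precision.
    static long shortEndInclusiveMicros(int year, int precision)
    {
        return nextYearStartEpochMicros(year) - scaleFactor(precision, 6);
    }

    // Precision 7..12: returns {epochMicros, picosOfMicro} of the last representable instant.
    static long[] longEndInclusive(int year, int precision)
    {
        int picosOfMicro = toIntExact(PICOSECONDS_PER_MICROSECOND - scaleFactor(precision, 12));
        return new long[] {nextYearStartEpochMicros(year) - 1, picosOfMicro};
    }

    public static void main(String[] args)
    {
        System.out.println(shortEndInclusiveMicros(2005, 3));
    }
}
```

Under these assumptions, adding one unit of the column's precision to the returned value lands exactly on the start of the next year, which suggests the subtraction is not off by one.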

Member

Maybe this method should be unit-tested

{
    checkArgument(constant.getValue() != null && argumentType.equals(TIMESTAMP_TZ_MICROS), "Unexpected constant: %s", constant);

    // Normalized to UTC because for comparisons the zone is irrelevant
Member

remove (copy&paste error)


private static Optional<Domain> unwrapYearInComparison(FunctionName functionName, Type argumentType, Constant constant)
{
    checkArgument(constant.getValue() != null && argumentType.equals(TIMESTAMP_TZ_MICROS), "Unexpected constant: %s", constant);
Member

we check two arguments but report only one in the message

checkArgument(constant.getValue() != null, "Unexpected constant: %s", constant);
checkArgument(type.equals(TIMESTAMP_TZ_MICROS), "Unexpected type: %s", type);

Member

@findepi findepi Mar 16, 2023

( #16594 for where this was copied from )

public void testExtractYearTimestampTzComparison()
{
    String timestampTzColumnSymbol = "timestamp_tz_symbol";
    FunctionCall truncateToYear = new FunctionCall(
Member

extractYear

List.of(new SymbolReference(timestampTzColumnSymbol)));

LocalDate someDate = LocalDate.of(2005, 9, 10);
Expression someMidnightExpression = LITERAL_ENCODER.toExpression(
Member

yearExpression?

Comment on lines 282 to 284
long startOfDateUtcEpochMillis = someDate.withDayOfYear(1).atStartOfDay().toEpochSecond(UTC) * MILLISECONDS_PER_SECOND;
LongTimestampWithTimeZone startOfDateUtc = timestampTzFromEpochMillis(startOfDateUtcEpochMillis);
LongTimestampWithTimeZone startOfNextDateUtc = timestampTzFromEpochMillis(someDate.plusYears(1).withDayOfYear(1).atStartOfDay().toEpochSecond(UTC) * MILLISECONDS_PER_SECOND);
Member

startOfDate -> startOfYear
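
For reference, a small stand-alone sketch of the two bounds the test computes, using the startOfYear naming. The class and method names are hypothetical; only the java.time calls mirror the lines under review.

```java
import java.time.LocalDate;

import static java.time.ZoneOffset.UTC;

// Illustrative sketch only; mirrors the expressions in the test, not the test itself.
final class YearBoundsInTestSketch
{
    static final long MILLISECONDS_PER_SECOND = 1_000L;

    private YearBoundsInTestSketch() {}

    // Midnight UTC on the first day of someDate's year, in epoch milliseconds.
    static long startOfYearUtcEpochMillis(LocalDate someDate)
    {
        return someDate.withDayOfYear(1).atStartOfDay().toEpochSecond(UTC) * MILLISECONDS_PER_SECOND;
    }

    // Midnight UTC on the first day of the following year, in epoch milliseconds.
    static long startOfNextYearUtcEpochMillis(LocalDate someDate)
    {
        return someDate.plusYears(1).withDayOfYear(1).atStartOfDay().toEpochSecond(UTC) * MILLISECONDS_PER_SECOND;
    }

    public static void main(String[] args)
    {
        LocalDate someDate = LocalDate.of(2005, 9, 10);
        System.out.println(startOfYearUtcEpochMillis(someDate));
        System.out.println(startOfNextYearUtcEpochMillis(someDate));
    }
}
```

The two values form the half-open range [startOfYear, startOfNextYear) that covers every instant of the year, which is what the new names make explicit.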

@ebyhr ebyhr force-pushed the ebi/iceberg-year-function branch from 38338cb to 7d5a3b5 Compare March 17, 2023 01:12
@github-actions github-actions bot added the iceberg Iceberg connector label Mar 17, 2023
@ebyhr ebyhr merged commit b8967a3 into trinodb:master Mar 17, 2023
@ebyhr ebyhr deleted the ebi/iceberg-year-function branch March 17, 2023 05:34
@github-actions github-actions bot added this to the 411 milestone Mar 17, 2023
Labels
cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

Iceberg predicate pushdown for date/time columns and year() function
4 participants