Fix DistinctCount for timestamps with time zone #10043
Preserve the original data type in the aggregation state
dt @ Decimal256(_, _) => Box::new(
    PrimitiveDistinctCountAccumulator::<Decimal256Type>::new()
        .with_data_type(dt.clone()),
),
Can you add a test for decimal too?
Sure, added 👍
I'd expect a test that shows the change is needed, i.e. one that fails if this change is removed.
The change for timestamp with time zone is clear to me, but I'm not sure about the decimal one.
I can show that for Decimal128 (not sure why it doesn't trigger for Decimal256). But it's better to be consistent: losing the data type might lead to unexpected consequences.
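To make the "unexpected consequences" concrete, here is a minimal, std-only sketch of why a decimal's raw integer is meaningless without its precision and scale. `DecimalType`, `DEFAULT_DECIMAL` and `render` are hypothetical names for illustration, not DataFusion or Arrow APIs, and the fallback of (38, 10) is an assumption of this toy model.

```rust
/// Toy stand-in for a decimal data type: (precision, scale).
#[derive(Debug, Clone, Copy, PartialEq)]
struct DecimalType {
    precision: u8,
    scale: u32,
}

/// Hypothetical fallback used when the original type is not carried along.
const DEFAULT_DECIMAL: DecimalType = DecimalType { precision: 38, scale: 10 };

/// Render a raw unscaled integer as text according to the scale of `ty`.
fn render(raw: i128, ty: DecimalType) -> String {
    if ty.scale == 0 {
        return raw.to_string();
    }
    let factor = 10i128.pow(ty.scale);
    let sign = if raw < 0 { "-" } else { "" };
    let abs = raw.abs();
    let int_part = abs / factor;
    let frac_part = abs % factor;
    // Zero-pad the fractional part to exactly `scale` digits.
    format!("{}{}.{:0width$}", sign, int_part, frac_part, width = ty.scale as usize)
}

fn main() {
    let original = DecimalType { precision: 10, scale: 2 };
    let raw = 12345_i128;
    // Same stored integer, two different logical values.
    println!("{:?}: {}", original, render(raw, original));
    println!("{:?}: {}", DEFAULT_DECIMAL, render(raw, DEFAULT_DECIMAL));
}
```

The stored integer is identical in both calls; only the type attached to it changes the value a reader sees, which is why the state should keep the original type.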
Thank you @joroKr21 and @jayzhan211
Timestamp(Second, _) => {
    Box::new(PrimitiveDistinctCountAccumulator::<TimestampSecondType>::new())
}
dt @ Timestamp(Microsecond, _) => Box::new(
I wonder if we should defensively always set with_data_type on PrimitiveDistinctCountAccumulator 🤔
I tried to come up with another DataType that doesn't have a 1:1 mapping to its ArrowPrimitiveType and I don't think there is one, but always supplying the data_type might be a more defensive pattern.
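The defensive pattern discussed here can be sketched with a toy accumulator. This is a std-only model under stated assumptions, not the DataFusion implementation: `ToyType` and `DistinctCount` are hypothetical, and the merge-time type check only stands in for wherever a real engine would detect a mismatch between the state type and the expected type.

```rust
use std::collections::HashSet;

/// Toy data type: a microsecond timestamp with an optional time zone.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    TimestampMicros(Option<String>),
}

/// Toy distinct-count accumulator that tags its state with a data type.
struct DistinctCount {
    values: HashSet<i64>,
    data_type: ToyType,
}

impl DistinctCount {
    /// Falls back to the zone-less type, mimicking a per-primitive default.
    fn new() -> Self {
        Self { values: HashSet::new(), data_type: ToyType::TimestampMicros(None) }
    }

    /// Builder that preserves the caller's original data type.
    fn with_data_type(mut self, data_type: ToyType) -> Self {
        self.data_type = data_type;
        self
    }

    fn update(&mut self, batch: &[i64]) {
        self.values.extend(batch.iter().copied());
    }

    /// Intermediate state: the distinct values tagged with their type.
    fn state(&self) -> (ToyType, Vec<i64>) {
        let mut values: Vec<i64> = self.values.iter().copied().collect();
        values.sort_unstable();
        (self.data_type.clone(), values)
    }

    /// Merge a partial state, rejecting one whose type was lost on the way.
    fn merge(&mut self, state: (ToyType, Vec<i64>)) -> Result<(), String> {
        let (ty, values) = state;
        if ty != self.data_type {
            return Err(format!("expected {:?}, found {:?}", self.data_type, ty));
        }
        self.values.extend(values);
        Ok(())
    }

    fn evaluate(&self) -> usize {
        self.values.len()
    }
}

fn main() {
    let utc = ToyType::TimestampMicros(Some("UTC".to_string()));

    // Always supplying the type keeps partial and final stages in agreement.
    let mut partial = DistinctCount::new().with_data_type(utc.clone());
    partial.update(&[1, 2, 2, 3]);
    let mut final_stage = DistinctCount::new().with_data_type(utc);
    final_stage.merge(partial.state()).expect("types match");
    println!("distinct = {}", final_stage.evaluate());
}
```

In this model, forgetting `with_data_type` on the partial stage silently drops the time zone from the state, and the final stage rejects it. Always passing the type removes the need to reason about which types happen to match their default.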
I was wondering the same. Let me know what you prefer?
How about we do it as a follow-on PR? I would like to merge this one ASAP as it fixes a regression.
Thank you @joroKr21
* Fix DistinctCount for timestamps with time zone
  Preserve the original data type in the aggregation state
* Add tests for decimal count distinct
* Fix DistinctCount for timestamps with time zone (apache#10043)
* Fix DistinctCount for timestamps with time zone
  Preserve the original data type in the aggregation state
* Add tests for decimal count distinct
* Fix with_new_children for EmptyExec
* Fix DistinctCount for timestamps with time zone
  Preserve the original data type in the aggregation state
* Add tests for decimal count distinct
Co-authored-by: Georgi Krastev <[email protected]>
Which issue does this PR close?
Closes #10042
Rationale for this change
Fixing a regression.
What changes are included in this PR?
Preserve the original data type in the aggregation state.
Are these changes tested?
Tested in SLT (sqllogictest).
Are there any user-facing changes?
Internal changes only.