fix: handle temporal columns in presto partitions #24054
I see I have |
Codecov Report
@@ Coverage Diff @@
## master #24054 +/- ##
=======================================
Coverage 68.26% 68.26%
=======================================
Files 1952 1952
Lines 75388 75388
Branches 8202 8202
=======================================
+ Hits 51462 51466 +4
+ Misses 21819 21815 -4
Partials 2107 2107
... and 1 file with indirect coverage changes
I've installed the pre-commit hook and fixed the remaining issues it flagged (an unused import and some type complaints), so this should be good now.
Functional changes LGTM, but one small comment regarding the method signature.
For other reviewers, see the related Slack discussion: https://apache-superset.slack.com/archives/C015WAZL0KH/p1683881155589549
Thanks for the PR. I added a few comments.
    col_type = column_type_by_name.get(col_name)

    if isinstance(col_type, types.DATE):
        col_type = Date()
It seems a little atypical that we're overloading col_type. Why do only DATE and TIMESTAMP get mutated? Is this because of how the types are cast from a string in SQL, i.e., DATE '2023-05-01'?
Yeah, we have a column which is a date or timestamp, but the value we've received is a string, and the underlying presto lib doesn't understand how to cast that by itself. Instead, we have some types in presto_sql_types which add that functionality, and I'm swapping in those custom types so the value renders correctly.
@villebro and I discussed that a more robust fix might be further upstream, such that we already get the right types here, but the fix proposed (changing the types around L220ish in this file) didn't fix the issue and introduced additional problems, so this looks like the most viable fix for now. It probably requires a broader look at the surrounding code to see if something was missed which would eliminate the need for this. However, you can also see in convert_dttm that other custom logic exists for these types, and other engines inherit that custom behaviour from this one too (which is why modifying it had a wider impact).
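The type swap described in this comment can be sketched roughly as follows. This is a minimal, standard-library-only illustration: BaseDate, BaseTimestamp, DateFromString, TimestampFromString, and swap_type are hypothetical stand-ins, not the real driver types or the classes in presto_sql_types.

```python
from datetime import date, datetime


class BaseDate:
    """Hypothetical stand-in for the driver's DATE type (cannot render a string value)."""


class BaseTimestamp:
    """Hypothetical stand-in for the driver's TIMESTAMP type."""


class DateFromString:
    """Hypothetical custom type that renders a string value as a DATE literal."""

    def render(self, value: str) -> str:
        date.fromisoformat(value)  # raises ValueError on malformed input
        return f"DATE '{value}'"


class TimestampFromString:
    """Hypothetical custom type that renders a string value as a TIMESTAMP literal."""

    def render(self, value: str) -> str:
        datetime.fromisoformat(value)  # raises ValueError on malformed input
        return f"TIMESTAMP '{value}'"


def swap_type(col_type):
    """Replace temporal base types with their string-aware counterparts."""
    if isinstance(col_type, BaseDate):
        return DateFromString()
    if isinstance(col_type, BaseTimestamp):
        return TimestampFromString()
    return col_type  # all other types pass through unchanged


print(swap_type(BaseDate()).render("2023-05-01"))
```

Only the two temporal types are swapped; everything else is returned untouched, which mirrors the narrow scope of the fix discussed here.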
@villebro @john-bodley comments addressed.
@giftig can you rebase this PR? A test was broken on master when you last pushed, causing an unrelated test to fail.
@villebro I was wondering about that unrelated test failure; I thought it must have been something transient. I've pushed the rebase, so let's see if that works now.
Restamping this. @john-bodley, are you OK with the latest changes? CI is green, so we're ready to go. I can do a follow-up to address the remaining comments, but I'd like to get this in, as this is a fairly critical bug affecting all Trino users.
The where_latest_partition_date method incorrectly handled column types as strings, but they're provided as SQLA types instead.
Deal with the DATE and TIMESTAMP cases, which were being incorrectly rendered in the query as a result of the above, and causing table preview queries to fail.
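To illustrate the kind of mismatch described above, here is a small sketch of why a string comparison never matches when the column type is actually a type object. The DATE, TIMESTAMP, and VARCHAR classes and both helper functions are hypothetical stand-ins for illustration, not the SQLAlchemy types or Superset's actual code.

```python
class DATE:
    """Hypothetical stand-in for a SQLAlchemy-style DATE type object."""


class TIMESTAMP:
    """Hypothetical stand-in for a SQLAlchemy-style TIMESTAMP type object."""


class VARCHAR:
    """Hypothetical stand-in for a non-temporal type object."""


def is_temporal_by_string(col_type) -> bool:
    # Assumes col_type is a string such as "DATE"; a type object never equals a string.
    return col_type in ("DATE", "TIMESTAMP")


def is_temporal_by_instance(col_type) -> bool:
    # Checks the type object itself, which is what the caller actually provides.
    return isinstance(col_type, (DATE, TIMESTAMP))


print(is_temporal_by_string(DATE()), is_temporal_by_instance(DATE()))
```

With type objects as input, the string-based check silently falls through to the non-temporal path, which is consistent with the incorrect rendering reported here.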
SUMMARY
TESTING INSTRUCTIONS
Create a trino table partitioned by DATE or TIMESTAMP and verify that the table preview works correctly.
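One possible way to set up such a table is sketched below. It assumes the Trino Hive connector, where partitioning is declared through the partitioned_by table property and partition columns come last; the catalog, schema, table, and column names are hypothetical, and the helper only builds the DDL string rather than executing it.

```python
def build_partitioned_table_ddl(
    table: str,
    partition_column: str = "ds",
    partition_type: str = "DATE",
) -> str:
    """Build a CREATE TABLE statement with a single partition column.

    Assumes the Trino Hive connector; the partition column is listed last.
    """
    return (
        f"CREATE TABLE {table} (\n"
        f"    id BIGINT,\n"
        f"    {partition_column} {partition_type}\n"
        f")\n"
        f"WITH (partitioned_by = ARRAY['{partition_column}'])"
    )


# Hypothetical fully qualified table name.
ddl = build_partitioned_table_ddl("hive.default.events_by_day")
print(ddl)
```

After running the statement against a test catalog and inserting at least one row, opening the table preview exercises the latest-partition query that this PR changes.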
ADDITIONAL INFORMATION