Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: handle temporal columns in presto partitions #24054

Merged
merged 1 commit into from
May 19, 2023

Conversation

giftig
Copy link
Contributor

@giftig giftig commented May 15, 2023

The where_latest_partition_date method incorrectly handled column types as strings, but they're provided as SQLA types instead.

Deal with the DATE and TIMESTAMP cases, which were being incorrectly rendered in the query as a result of the above, and causing table preview queries to fail.

SUMMARY

TESTING INSTRUCTIONS

Create a trino table partitioned by DATE or TIMESTAMP and verify that the table preview works correctly.

ADDITIONAL INFORMATION

  • Has associated issue: Preview table query fails for trino tables with date partition #24055
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro villebro self-requested a review May 15, 2023 12:38
@giftig giftig force-pushed the trino-preview-fix branch 2 times, most recently from 737269e to 1c71e2f Compare May 15, 2023 15:56
@giftig
Copy link
Contributor Author

giftig commented May 15, 2023

I see I have black failures so I've just run black and repushed. I also spotted some type checking warnings, which might require a bit more thought if they cause the build to fail. I'll look into them tomorrow if they're an issue.

@codecov
Copy link

codecov bot commented May 15, 2023

Codecov Report

Merging #24054 (5b7f795) into master (d583ca9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 5b7f795 differs from pull request most recent head d67b11e. Consider uploading reports for the commit d67b11e to get more accurate results

@@           Coverage Diff           @@
##           master   #24054   +/-   ##
=======================================
  Coverage   68.26%   68.26%           
=======================================
  Files        1952     1952           
  Lines       75388    75388           
  Branches     8202     8202           
=======================================
+ Hits        51462    51466    +4     
+ Misses      21819    21815    -4     
  Partials     2107     2107           
Flag Coverage Δ
hive 53.23% <0.00%> (ø)
mysql 78.94% <66.66%> (ø)
postgres 79.01% <66.66%> (ø)
presto 53.16% <0.00%> (ø)
python 82.82% <100.00%> (+0.01%) ⬆️
sqlite 77.54% <66.66%> (ø)
unit 53.22% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/db_engine_specs/base.py 90.80% <ø> (ø)
superset/db_engine_specs/hive.py 87.54% <ø> (ø)
superset/db_engine_specs/presto.py 87.91% <100.00%> (+0.41%) ⬆️

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@giftig giftig force-pushed the trino-preview-fix branch 3 times, most recently from 26a13d7 to 9e556b4 Compare May 16, 2023 09:22
@giftig
Copy link
Contributor Author

giftig commented May 16, 2023

I've installed the pre-commit hook and satisfied the remaining issues it had (an unused import and some type complaints) so should be good now.

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functional changes LGTM, but one small comment regarding the method signature.

superset/db_engine_specs/presto.py Show resolved Hide resolved
@villebro
Copy link
Member

For other reviewers, see the related Slack discussion: https://apache-superset.slack.com/archives/C015WAZL0KH/p1683881155589549

Copy link
Member

@john-bodley john-bodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I added a few comments.

tests/unit_tests/db_engine_specs/test_presto.py Outdated Show resolved Hide resolved
tests/unit_tests/db_engine_specs/test_presto.py Outdated Show resolved Hide resolved
col_type = column_type_by_name.get(col_name)

if isinstance(col_type, types.DATE):
col_type = Date()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little atypical that we’re overloading col_type.

Why do only DATE and TIMESTAMP get mutated? Is this because of how the types are cast from a string in SQL, i.e., DATE ‘2023-05-01’?

Copy link
Contributor Author

@giftig giftig May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have a column which is a date or timestamp but the value we've received is a string, and the underlying presto lib doesn't understand how to cast that by itself. Instead we have some types in presto_sql_types which add that functionality and I'm swapping out with those custom types to render it correctly.

@villebro and I discussed that a more robust fix might be further upstream, such that we actually already get the right types here, but the fix proposed (changing the types around L220ish in this file) didn't fix the issue and introduced additional problems, so this looks like the most viable fix for now. It probably requires a broader look at the surrounding code to see if something was missed which would eliminate the need for this. However you can also see in convert_dttm that other custom logic exists for these types, and other engines are also inheriting that custom behaviour from this one too (which is why modifying it had a wider impact)

@giftig giftig force-pushed the trino-preview-fix branch from 9e556b4 to 838a48a Compare May 18, 2023 08:26
@giftig
Copy link
Contributor Author

giftig commented May 18, 2023

@villebro @john-bodley comments addressed.

@villebro
Copy link
Member

@giftig can you rebase this PR? A test was broken on master when you last pushed, causing an unrelated test to fail.

The where_latest_partition_date method incorrectly handled column types
as strings, but they're provided as SQLA types instead.

Deal with the DATE and TIMESTAMP cases, which were being incorrectly
rendered in the query as a result of the above, and causing table
preview queries to fail.
@giftig giftig force-pushed the trino-preview-fix branch from 838a48a to d67b11e Compare May 19, 2023 08:21
@giftig
Copy link
Contributor Author

giftig commented May 19, 2023

@villebro I was wondering about that unrelated test failure, thought it must have just been something transient. I've pushed the rebase, let's see if that works now.

@villebro
Copy link
Member

@villebro I was wondering about that unrelated test failure, thought it must have just been something transient. I've pushed the rebase, let's see if that works now.

@giftig yeah, the unrelated broken test was fixed here: #24114

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restamping this - @john-bodley are you ok with the latest changes? CI is green so we're ready to go I can do a follow-up to address the remaining comments, but I'd like to get this in, as this is a fairly critical bug affecting all Trino users.

@villebro villebro merged commit 6159ced into apache:master May 19, 2023
@giftig giftig deleted the trino-preview-fix branch May 20, 2023 17:37
@mistercrunch mistercrunch added 🍒 2.1.1 🍒 2.1.2 🍒 2.1.3 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.1.1 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M v2.1 🍒 2.1.1 🍒 2.1.2 🍒 2.1.3 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants