-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-15A] Presto types (PoC) #7682
[SIP-15A] Presto types (PoC) #7682
Conversation
fb8d600
to
aec7994
Compare
aec7994
to
c270b96
Compare
This sounds neat, as I tend to get confused by all the different conversion functions. I will take a closer look over the weekend. Btw, not that it matters for testing, but there's a conflict that needs rebasing. |
engine = 'presto' | ||
type_map = pyhive.sqlalchemy_presto._type_map | ||
|
||
time_grain_functions = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic would probably be refactored if we rolled this our globally.
@@ -349,13 +363,127 @@ def adjust_database_uri(cls, uri, selected_schema=None): | |||
return uri | |||
|
|||
@classmethod | |||
def convert_dttm(cls, target_type, dttm): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic would probably be refactored if we rolled this our globally.
Codecov Report
@@ Coverage Diff @@
## master #7682 +/- ##
==========================================
- Coverage 65.76% 65.71% -0.06%
==========================================
Files 459 459
Lines 21909 21955 +46
Branches 2408 2408
==========================================
+ Hits 14409 14428 +19
- Misses 7379 7406 +27
Partials 121 121
Continue to review full report at Codecov.
|
return s | ||
|
||
if app.config['SIP_15_ENABLED']: | ||
raise TypeError(f"The '{type}' type is unsupported.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be {self.type}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very exciting 👍 LGTM apart from possible typo. Is the migration plan to first introduce this in Presto, and once stability has been established through real-life testing, move stuff from PrestoEngineSpec
to BaseEngineSpec
and implement accordingly in all other *Spec
s?
@@ -38,26 +44,34 @@ | |||
|
|||
|
|||
class PrestoEngineSpec(BaseEngineSpec): | |||
import pyhive.sqlalchemy_presto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is essentially the same as importing in module scope, it'll fail if pyhive
isn't installed, should import in the methods
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue |
CATEGORY
Choose one
SUMMARY
This is very much a Proof-of-Concept (PoC) but I though there was merit in sharing early my thoughts as to how I would implement aspects of SIP-15A in relation to how we could think about type conversions in the future.
The premise is Superset needs to transform either Python types (such as
datetime
) or either a SQL type (DATE
,TIMESTAMP
, etc.), Python type (str
,int
,float
, etc.), or time grain (P1D
,P1W
etc). Rather than depending on methods likeconvert_dttm
,epoch_to_dttm
, etc. I felt it would make more sense to define an engine specific graph where the nodes are the various temporal types and the edges represents the SQL expression required to convert between the types. A shortest path algorithm is used to define the path (and thus SQL) between the types.I though the merits are:
from_unixtime
UDF which actually requires that the unix time be a float rather than an integer and thus one would define two edges with minimal SQL, i.e.,which would provide these mappings; (
BigInteger
,TIMESTAMP
) and (Float
,TIMESTAMP
).Note this PR does not address several aspects of SIP-15A including type inferencing and enforcing ISO 8601 encoding. Additionally if rolled out there would be additional major refactoring of the
BaseEngineSpec
andTableColumn
classes.TEST PLAN
N/A
ADDITIONAL INFORMATION
REVIEWERS
to: @agrawaldevesh @betodealmeida @michellethomas @mistercrunch @villebro