[SIP-15A] Presto types (PoC) #7682

john-bodley
Member

CATEGORY

Choose one

  • Bug Fix
  • Enhancement (new features, refinement)
  • Refactor
  • Add tests
  • Build / Development Environment
  • Documentation

SUMMARY

This is very much a Proof-of-Concept (PoC), but I thought there was merit in sharing my thoughts early on how I would implement aspects of SIP-15A, as they relate to how we could think about type conversions in the future.

The premise is that Superset needs to transform values and columns between Python types (such as datetime), SQL types (DATE, TIMESTAMP, etc.), Python primitive types (str, int, float, etc.), and time grains (P1D, P1W, etc.). Rather than depending on methods like convert_dttm, epoch_to_dttm, etc., I felt it would make more sense to define an engine-specific graph where the nodes are the various temporal types and the edges represent the SQL expression required to convert between them. A shortest-path algorithm is then used to find the path (and thus the SQL) between any two types.
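A minimal, dependency-free sketch of the idea, assuming type names are plain strings and each edge stores a SQL template with a `{col}` placeholder (mirroring the `sql='CAST({col} AS DOUBLE)'` edges in this PR). The PR itself uses networkx, where `nx.shortest_path` would do the search; here a breadth-first search stands in, since every edge counts as one step. `add_edge`, `shortest_path`, `convert`, and the Presto-flavoured edges are hypothetical illustrations, not Superset APIs or its actual mappings.

```python
from collections import deque
from typing import Dict, List, Optional

# Adjacency map: source type -> {target type: SQL template with a {col} slot}.
Graph = Dict[str, Dict[str, str]]


def add_edge(graph: Graph, src: str, dst: str, sql: str) -> None:
    """Register a single-step conversion from `src` to `dst`."""
    graph.setdefault(src, {})[dst] = sql


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Breadth-first search: because every edge counts as one step, the
    first time `dst` is dequeued the path to it has the fewest conversions."""
    previous: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Follow predecessor links back to `src`, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in graph.get(node, {}):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def convert(graph: Graph, col: str, src: str, dst: str) -> str:
    """Compose the SQL for every edge along the shortest path."""
    path = shortest_path(graph, src, dst)
    if path is None:
        raise TypeError(f"No conversion from '{src}' to '{dst}'.")
    expr = col
    for a, b in zip(path, path[1:]):
        # Each edge wraps the expression built so far.
        expr = graph[a][b].format(col=expr)
    return expr


# Illustrative Presto-flavoured edges; a time grain is just another node.
presto_graph: Graph = {}
add_edge(presto_graph, "VARCHAR", "TIMESTAMP", "from_iso8601_timestamp({col})")
add_edge(presto_graph, "BIGINT", "TIMESTAMP", "from_unixtime({col})")
add_edge(presto_graph, "TIMESTAMP", "P1D", "date_trunc('day', {col})")

print(convert(presto_graph, "ds", "VARCHAR", "P1D"))
# date_trunc('day', from_iso8601_timestamp(ds))
```

Treating time grains as ordinary nodes means truncation to P1D is reachable from any type that can reach TIMESTAMP, without a separate per-type function.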

I thought the merits are:

  • Helps to ensure all the mappings reside in the same place. Note that it's still somewhat fragmented due to the constraints of the existing API.
  • Removes redundancy. For example, in Hive there's a bug in the from_unixtime UDF which actually requires the Unix time to be a float rather than an integer, so one would define two edges, each with minimal SQL:
```python
import networkx as nx
# Assumes SQLAlchemy's generic types are used as the graph nodes.
from sqlalchemy.types import TIMESTAMP, BigInteger, Float

graph = nx.DiGraph()
graph.add_edge(BigInteger, Float, sql='CAST({col} AS DOUBLE)')
graph.add_edge(Float, TIMESTAMP, sql='CAST({col} AS TIMESTAMP)')
```

which would provide these mappings: (BigInteger, TIMESTAMP) and (Float, TIMESTAMP).
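To make the derived mappings concrete, here is a sketch that stores the edges as a plain dict of SQL templates instead of a networkx graph and computes every conversion the declared edges imply. `derive_mappings` is a hypothetical helper, and composing by substituting one template into another is this sketch's own choice, not necessarily how the PR builds the SQL.

```python
from typing import Dict, Tuple

# (source type, target type) -> SQL template with a {col} placeholder.
Mappings = Dict[Tuple[str, str], str]


def derive_mappings(edges: Mappings) -> Mappings:
    """Return the declared edges plus every multi-step conversion they imply.

    Each pass extends the pairs known at the start of the pass by one edge,
    so every pair is first recorded via a path with the fewest hops.
    Assumes templates contain no literal braces besides {col}.
    """
    mappings = dict(edges)
    changed = True
    while changed:
        changed = False
        for (src, mid), inner in list(mappings.items()):
            for (start, dst), outer in edges.items():
                if start == mid and src != dst and (src, dst) not in mappings:
                    # Substituting one template into another yields a new
                    # template that still contains the {col} placeholder.
                    mappings[(src, dst)] = outer.format(col=inner)
                    changed = True
    return mappings


hive_edges: Mappings = {
    ("BigInteger", "Float"): "CAST({col} AS DOUBLE)",
    ("Float", "TIMESTAMP"): "CAST({col} AS TIMESTAMP)",
}
hive_mappings = derive_mappings(hive_edges)
print(hive_mappings[("BigInteger", "TIMESTAMP")].format(col="ts"))
# CAST(CAST(ts AS DOUBLE) AS TIMESTAMP)
```

Only two edges are declared, yet (BigInteger, TIMESTAMP) falls out automatically, which is the redundancy the bullet describes.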

Note that this PR does not address several aspects of SIP-15A, including type inference and enforcing ISO 8601 encoding. Additionally, if this were rolled out, the BaseEngineSpec and TableColumn classes would need further major refactoring.

TEST PLAN

N/A

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

REVIEWERS

to: @agrawaldevesh @betodealmeida @michellethomas @mistercrunch @villebro

@john-bodley john-bodley force-pushed the john-bodley--sip-15-poc-presto-types branch from fb8d600 to aec7994 June 10, 2019 23:17
@john-bodley john-bodley force-pushed the john-bodley--sip-15-poc-presto-types branch from aec7994 to c270b96 June 10, 2019 23:30
@villebro
Member

This sounds neat, as I tend to get confused by all the different conversion functions. I will take a closer look over the weekend. Btw, not that it matters for testing, but there's a conflict that needs rebasing.

engine = 'presto'
type_map = pyhive.sqlalchemy_presto._type_map

time_grain_functions = {
Member Author

This logic would probably be refactored if we rolled this out globally.

@@ -349,13 +363,127 @@ def adjust_database_uri(cls, uri, selected_schema=None):
return uri

@classmethod
def convert_dttm(cls, target_type, dttm):
Member Author

This logic would probably be refactored if we rolled this out globally.

@codecov-io

codecov-io commented Jun 14, 2019

Codecov Report

Merging #7682 into master will decrease coverage by 0.05%.
The diff coverage is 40.35%.

```diff
@@            Coverage Diff             @@
##           master    #7682      +/-   ##
==========================================
- Coverage   65.76%   65.71%   -0.06%     
==========================================
  Files         459      459              
  Lines       21909    21955      +46     
  Branches     2408     2408              
==========================================
+ Hits        14409    14428      +19     
- Misses       7379     7406      +27     
  Partials      121      121
```

| Impacted Files | Coverage Δ |
|---|---|
| superset/config.py | 94.04% <100%> (+0.03%) ⬆️ |
| superset/connectors/sqla/models.py | 81.98% <33.33%> (-0.42%) ⬇️ |
| superset/db_engine_specs/presto.py | 67.17% <40%> (-2.54%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0d12243...512d601.

return s

if app.config['SIP_15_ENABLED']:
raise TypeError(f"The '{type}' type is unsupported.")
Member

Shouldn't this be {self.type}?
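To illustrate the reviewer's point: inside an f-string, a bare `{type}` resolves to Python's built-in `type` unless a local variable of that name is in scope (the excerpt above doesn't show one), so the message would not name the column's type. A minimal sketch, assuming the enclosing object exposes a `type` attribute as the comment implies; `ColumnSketch` and its methods are hypothetical.

```python
class ColumnSketch:
    """Hypothetical stand-in for a column object with a `type` attribute."""

    def __init__(self, type_name: str) -> None:
        self.type = type_name

    def buggy_message(self) -> str:
        # `type` here is the built-in class, not the column's type.
        return f"The '{type}' type is unsupported."

    def fixed_message(self) -> str:
        # Reference the attribute explicitly.
        return f"The '{self.type}' type is unsupported."


col = ColumnSketch("INTERVAL DAY TO SECOND")
print(col.buggy_message())  # The '<class 'type'>' type is unsupported.
print(col.fixed_message())  # The 'INTERVAL DAY TO SECOND' type is unsupported.
```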

Member

@villebro villebro left a comment

Very exciting 👍 LGTM apart from possible typo. Is the migration plan to first introduce this in Presto, and once stability has been established through real-life testing, move stuff from PrestoEngineSpec to BaseEngineSpec and implement accordingly in all other *Specs?

@@ -38,26 +44,34 @@


class PrestoEngineSpec(BaseEngineSpec):
import pyhive.sqlalchemy_presto
Member

@mistercrunch mistercrunch Jul 23, 2019

I think this is essentially the same as importing at module scope: it'll fail if pyhive isn't installed. We should import in the methods instead.
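A minimal sketch of the suggestion, assuming the type map is only needed inside class methods. An `import` in a class body runs when the module is loaded, whereas an import inside a method runs only when that method is called, so the module stays importable without pyhive. `PrestoEngineSpecSketch` and `get_type_map` are hypothetical names; `pyhive.sqlalchemy_presto._type_map` is the private attribute referenced in the diff context earlier in this thread.

```python
from typing import Any, Dict


class PrestoEngineSpecSketch:
    """Hypothetical stand-in for PrestoEngineSpec (illustration only)."""

    engine = "presto"

    @classmethod
    def get_type_map(cls) -> Dict[str, Any]:
        # Deferred import: an ImportError surfaces only if this method is
        # called without pyhive installed, not when the module is imported.
        from pyhive import sqlalchemy_presto

        return sqlalchemy_presto._type_map
```

Repeated calls stay cheap because Python caches imported modules in `sys.modules`, so only the first call pays the import cost.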

@stale

stale bot commented Sep 21, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Sep 21, 2019
@stale stale bot closed this Sep 28, 2019
@john-bodley john-bodley reopened this Oct 2, 2019
@stale stale bot removed the inactive Inactive for >= 30 days label Oct 2, 2019
@john-bodley john-bodley changed the title [PoC] SIP-15 Presto types [SIP-15A] Presto types (PoC) Oct 16, 2019
@stale

stale bot commented Dec 15, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

@stale stale bot added the inactive Inactive for >= 30 days label Dec 15, 2019
@stale stale bot closed this Dec 22, 2019
Labels
inactive Inactive for >= 30 days size/L
4 participants