Table Functions support for Trino #11937

Closed
54 tasks
kasiafi opened this issue Apr 13, 2022 · 1 comment

kasiafi commented Apr 13, 2022

This issue serves the following purposes:

  1. Decompose, prioritise and schedule the work.
  2. Provide visibility into the state of the task.
  3. Provide a place for discussion and questions.

related issue: #1839
related PR: #11336

Definitions

A Table Function is a function which returns a relation, as opposed to a scalar function which returns a single value.
A Polymorphic Table Function (PTF) is a Table Function which fulfils at least one of the following conditions:

  • the row type of the returned table is not known at the time when the function is created
  • the function takes a table as an argument, whose row type is not known when the function is created

Specifically, the output type of a Polymorphic Table Function may depend on an arbitrary table passed as an argument.
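
To make the distinction concrete, here is a minimal sketch in plain Java. It is not Trino code: the `Table` record and the functions `sequence` and `withRowNumber` are hypothetical names invented for illustration. `sequence` has a fixed output row type, while `withRowNumber` accepts a table of any row type and returns a row type that is only known per invocation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: none of these types belong to Trino.
final class PolymorphicTableFunctionSketch {
    // A relation: an ordered list of column names plus rows of values.
    record Table(List<String> columns, List<List<Object>> rows) {}

    // A plain table function: the output row type is fixed when the function is written.
    static Table sequence(int start, int stop) {
        List<List<Object>> rows = new ArrayList<>();
        for (int i = start; i <= stop; i++) {
            rows.add(List.<Object>of(i));
        }
        return new Table(List.of("value"), rows);
    }

    // A polymorphic table function: it accepts a table of any row type, and its
    // output row type (the input columns plus "row_number") depends on the argument.
    static Table withRowNumber(Table input) {
        List<String> columns = new ArrayList<>(input.columns());
        columns.add("row_number");
        List<List<Object>> rows = new ArrayList<>();
        for (int i = 0; i < input.rows().size(); i++) {
            List<Object> row = new ArrayList<>(input.rows().get(i));
            row.add(i + 1);
            rows.add(row);
        }
        return new Table(columns, rows);
    }

    public static void main(String[] args) {
        Table people = new Table(List.of("name", "age"), List.of(List.<Object>of("ada", 36)));
        System.out.println(withRowNumber(sequence(1, 3)).columns());
        System.out.println(withRowNumber(people).columns());
    }
}
```

The same `withRowNumber` call yields different column lists for the two inputs, which is the property that makes a table function polymorphic.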

Scope of the task

The scope of the task is to provide full support for Table Functions, including Polymorphic Table Functions.

Subtasks

  • Add language support for PTF invocation

    • grammar (in review)
    • AST representation (in review)
    • tests (in review)
  • Add SPI support for declaring PTFs by plugins / connectors

    • The main interface: ConnectorTableFunction (in review)
      • analyze(): a method for required and custom analysis (in review)
      • Analysis: a class to provide the required and custom analysis results to Trino Analyzer (in review)
      • InvocationHandle: an interface for passing the custom analysis results (in review)
      • fulfil(): a method to provide the function logic to Trino
    • Classes representing argument declarations, and returned type declaration (in review)
    • Classes representing the actual passed arguments (in review)
  • Add mechanism for registering PTFs

    • Add TableFunctionRegistry with table function resolution (in review)
      • Prepare for path resolution when we have it (in review)
    • Respect differences between connector-provided and plugin-provided PTFs through separate interfaces (in review)
    • Add register - unregister mechanisms (in review)
  • Analyze PTF invocation

    • scalar arguments (in review)
    • DESCRIPTOR arguments (not yet supported)
    • TABLE arguments (not yet supported)
    • tests
  • Plan PTF invocation

    • Add a dedicated PlanNode: TableFunctionNode (in review)
    • Implement relevant PlanVisitors (in review)
    • Explain
  • Execute PTF through pushdown to connector

    • Add apply() methods (in review)
    • Add RewriteTableFunctionToTableScan Optimizer rule (in review)
    • Unit test for the rule
  • Add example PTF implementations: query pass-through

    • remote_query function for JDBC connectors:
      • Druid (in review)
      • MemSql (in review)
      • MySql (in review)
      • Oracle (in review)
      • PostgreSql (in review)
      • Redshift (in review)
      • SqlServer (in review)
    • remote_query function for Elasticsearch connector (in review)
    • tests
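
The declare, register, and analyze steps in the list above can be sketched roughly as follows. This is not the Trino SPI: every type here (`SketchTableFunction`, `SketchAnalysis`, `SketchInvocationHandle`, `SketchRegistry`, `QueryHandle`) is a hypothetical stand-in that only mirrors the roles named in the subtasks, and the method signatures, the `query` argument name, and the placeholder column are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: these types only mirror the roles named in the subtask
// list. They are NOT the Trino SPI, and their signatures are assumptions.
final class TableFunctionSpiSketch {
    // Opaque carrier for connector-specific analysis results.
    interface SketchInvocationHandle {}

    // Result of analysis: the returned column names plus the custom handle.
    record SketchAnalysis(List<String> returnedColumns, SketchInvocationHandle handle) {}

    // A table function declared by a plugin or connector.
    interface SketchTableFunction {
        String name();

        SketchAnalysis analyze(Map<String, Object> scalarArguments);
    }

    // Registry with register / unregister and resolution by catalog and name.
    static final class SketchRegistry {
        private final Map<String, SketchTableFunction> functions = new HashMap<>();

        void register(String catalog, SketchTableFunction function) {
            functions.put(catalog + "." + function.name(), function);
        }

        void unregister(String catalog, String name) {
            functions.remove(catalog + "." + name);
        }

        Optional<SketchTableFunction> resolve(String catalog, String name) {
            return Optional.ofNullable(functions.get(catalog + "." + name));
        }
    }

    // Handle of a pass-through function: it only remembers the query text.
    record QueryHandle(String query) implements SketchInvocationHandle {}

    // Example function: a query pass-through taking one scalar argument.
    static final class PassThroughFunction implements SketchTableFunction {
        @Override
        public String name() {
            return "remote_query";
        }

        @Override
        public SketchAnalysis analyze(Map<String, Object> scalarArguments) {
            Object query = scalarArguments.get("query");
            if (!(query instanceof String)) {
                throw new IllegalArgumentException("argument 'query' must be a string");
            }
            // A real connector would derive the columns from the remote system;
            // a fixed placeholder column stands in for that here.
            return new SketchAnalysis(List.of("placeholder_column"), new QueryHandle((String) query));
        }
    }

    public static void main(String[] args) {
        SketchRegistry registry = new SketchRegistry();
        registry.register("postgresql", new PassThroughFunction());
        SketchAnalysis analysis = registry.resolve("postgresql", "remote_query")
                .orElseThrow()
                .analyze(Map.of("query", "SELECT 1"));
        System.out.println(analysis.handle());
    }
}
```

The point of the sketch is the division of labour: the registry resolves a name to a function, and the function's analysis step validates scalar arguments and hands back both the returned row type and an opaque handle for later stages.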

Achieved functionality

Currently, any PTF that can be realised entirely by a connector is supported. The connector can "capture" the PTF invocation and replace it with a ConnectorTableHandle, which represents the PTF result.
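
As a rough illustration of this "capture" step, the sketch below rewrites a table function node into a table scan when the connector accepts the invocation. All types and the `applyTableFunction` method are hypothetical simplifications written for this example (table handles are plain strings here); they are not the actual Trino planner or SPI classes.

```java
import java.util.Optional;

// Hypothetical sketch of pushdown-by-rewrite; not the Trino planner API.
final class PushdownRewriteSketch {
    interface PlanNode {}

    // Stand-in for a plan node representing a table function invocation.
    record TableFunctionCall(String functionName, String invocationHandle) implements PlanNode {}

    // Stand-in for a scan over a connector-provided table handle.
    record TableScan(String tableHandle) implements PlanNode {}

    // The connector either captures the invocation and returns a handle
    // representing its result, or declines by returning empty.
    interface ConnectorSketch {
        Optional<String> applyTableFunction(TableFunctionCall call);
    }

    // Replace the function node with a scan if the connector captured it;
    // otherwise leave the plan unchanged.
    static PlanNode rewrite(PlanNode node, ConnectorSketch connector) {
        if (node instanceof TableFunctionCall) {
            TableFunctionCall call = (TableFunctionCall) node;
            return connector.applyTableFunction(call)
                    .<PlanNode>map(TableScan::new)
                    .orElse(node);
        }
        return node;
    }

    public static void main(String[] args) {
        ConnectorSketch connector = call -> call.functionName().equals("remote_query")
                ? Optional.of("result-of:" + call.invocationHandle())
                : Optional.<String>empty();
        System.out.println(rewrite(new TableFunctionCall("remote_query", "SELECT 1"), connector));
        System.out.println(rewrite(new TableFunctionCall("other_function", "x"), connector));
    }
}
```

In this simplified model, a function the connector declines stays in the plan as a function node, which is the case the follow-up work on a dedicated operator would have to handle.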

Following work

To provide full support for PTFs, as defined in the SQL standard, we need to:

  • Support TABLE arguments (starting from the Analyzer)
  • Support DESCRIPTOR arguments (starting from the Analyzer)
  • EXPLAIN: decide how to render TABLE arguments, which are both function arguments and PlanNode sources
  • Support optimizations of plans involving TableFunctionNode, such as column pruning
  • Implement COPARTITION as JOIN
  • Choose the distribution of sources based on their size, the number of sources, and row/set semantics (see: DetermineJoinDistributionType)
  • Handle TableFunctionNode in AddExchanges / AddLocalExchanges: realise the partitioning and ordering of input tables respecting their actual properties. Also, consider the output properties of TableFunctionNode.
  • Figure out the interfaces between the PTF logic (the fulfil() method) and the Operator: in what form data will be provided to the PTF logic, and in what form the results will be returned to the Operator
  • Add the Operator:
    • arbitrary number of sources: 0 to n
    • final partitioning and sorting of the sources (see: WindowOperator)
    • invoking the PTF logic on the Cartesian product of the partitions from all the sources (see: Nested Loops for implementation choices)
    • appending the pass-through columns and the partitioning columns from the source tables
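
The "Cartesian product of the partitions" step for an arbitrary number of sources can be sketched as below. This is a simplified in-memory illustration, not a proposed Operator design: the class and method names are hypothetical, and the behaviour for an empty source (no invocations at all) is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: enumerates the partition combinations on which the
// PTF logic would be invoked, for 0 to n sources.
final class PartitionProductSketch {
    // Each source is given as its list of partitions. The result holds one
    // entry per invocation, with one partition picked from every source.
    static <P> List<List<P>> cartesianProduct(List<List<P>> sources) {
        List<List<P>> result = new ArrayList<>();
        // With zero sources there is exactly one combination: the empty one.
        result.add(List.of());
        for (List<P> partitions : sources) {
            List<List<P>> next = new ArrayList<>();
            for (List<P> prefix : result) {
                for (P partition : partitions) {
                    List<P> combination = new ArrayList<>(prefix);
                    combination.add(partition);
                    next.add(combination);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sources = List.of(
                List.of("orders[p1]", "orders[p2]"),
                List.of("customers[p1]"));
        for (List<String> combination : cartesianProduct(sources)) {
            System.out.println("invoke PTF on " + combination);
        }
    }
}
```

A real operator would likely stream partitions instead of materialising every combination, which is where the nested-loops implementation choices mentioned above come in.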
kasiafi self-assigned this Apr 13, 2022

kasiafi commented May 15, 2022

Closing in favor of #1839
