
Polymorphic table functions #1839

Closed · 54 of 59 tasks

findepi opened this issue Oct 23, 2019 · 8 comments

Labels: roadmap (Top level issues for major efforts in the project)

findepi (Member) commented Oct 23, 2019

Add support for polymorphic table functions (https://www.iso.org/standard/69776.html)

Related PRs:
#11336
#12350
#12338

This document was moved from: #11937
It serves the following purposes:

  1. Decompose, prioritise and schedule the work.
  2. Provide visibility into the state of the task.
  3. Provide a place for discussion and questions.

Definitions

A Table Function is a function which returns a relation, as opposed to a scalar function, which returns a single value.
A Polymorphic Table Function (PTF) is a Table Function which fulfils at least one of the following conditions:

  • the row type of the returned table is not known at the time when the function is created
  • the function takes a table argument whose row type is not known when the function is created

In particular, the output type of a Polymorphic Table Function may depend on the row type of whatever table is passed as an argument.
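
For illustration, here is a minimal sketch of what a PTF invocation can look like from a client, using the Trino JDBC driver. The catalog, function, argument, and table names are hypothetical; the `TABLE(...)` invocation syntax comes from the SQL standard.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PtfInvocationExample
{
    public static void main(String[] args)
            throws Exception
    {
        // Hypothetical catalog/function names. The row type of the result is
        // only known once this particular call is analyzed -- that is what
        // makes the function polymorphic.
        String sql = "SELECT * FROM TABLE(example.system.my_function(input => TABLE(orders)))";
        try (Connection connection = DriverManager.getConnection("jdbc:trino://localhost:8080", "admin", null);
                Statement statement = connection.createStatement();
                ResultSet resultSet = statement.executeQuery(sql)) {
            while (resultSet.next()) {
                // Column count and types depend on the function and its arguments.
                System.out.println(resultSet.getObject(1));
            }
        }
    }
}
```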

Scope of the task

The scope of the task is to provide full support for Table Functions, including Polymorphic Table Functions.

Subtasks

  • Add language support for PTF invocation

    • grammar
    • AST representation
    • tests
  • Add SPI support for declaring PTFs by connectors (see the sketch after this list)

    • The main interface: ConnectorTableFunction
      • analyze(): a method for required and custom analysis
      • Analysis: a class to provide the required and custom analysis results to Trino Analyzer
      • ConnectorTableFunctionHandle: an interface for passing the custom analysis results
      • fulfil(): a method to provide the function logic to Trino
    • Classes representing argument declarations, and returned type declaration
    • Classes representing the actual passed arguments
  • Add mechanism for registering PTFs

    • Add TableFunctionRegistry with table function resolution
      • Use path resolution
    • Add register - unregister mechanisms
  • Analyze PTF invocation

    • scalar arguments
    • DESCRIPTOR arguments
    • TABLE arguments
    • tests
  • Plan PTF invocation

    • Add a dedicated PlanNode: TableFunctionNode
    • Implement relevant PlanVisitors
    • Explain
  • Execute PTF through pushdown to connector

    • Add apply() methods
    • Add RewriteTableFunctionToTableScan Optimizer rule
  • Add example PTF implementations: query pass-through (also illustrated in the sketch after this list)

    • query function for JDBC connectors:
      • Druid
      • MemSql
      • MySql
      • Oracle
      • PostgreSql
      • Redshift
      • SqlServer
      • MariaDB
      • SingleStore
    • raw_query function for Elasticsearch connector
    • tests
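
To make the SPI subtasks above concrete, here is a rough, self-contained sketch of the shape they describe. These are simplified stand-in types whose names mirror the subtask list, not the real Trino SPI (whose signatures evolved across releases); the query pass-through function is used as the example because it is listed above.

```java
import java.util.Map;

// Simplified stand-ins for the SPI pieces named in the subtask list.
interface ConnectorTableFunctionHandle {}

// Row type of the returned table: column name -> type name.
record Descriptor(Map<String, String> columns) {}

// The analysis results handed back to the Trino Analyzer.
record Analysis(Descriptor returnedType, ConnectorTableFunctionHandle handle) {}

interface ConnectorTableFunction
{
    String getName();

    // Required and custom analysis of one concrete invocation: validate the
    // arguments, compute the returned row type, and produce a handle carrying
    // whatever the connector needs to execute the call later.
    Analysis analyze(Map<String, Object> arguments);
}

// Example: a 'query' pass-through function, as added for the JDBC connectors.
class QueryFunction
        implements ConnectorTableFunction
{
    record QueryHandle(String query)
            implements ConnectorTableFunctionHandle {}

    @Override
    public String getName()
    {
        return "query";
    }

    @Override
    public Analysis analyze(Map<String, Object> arguments)
    {
        String query = (String) arguments.get("QUERY");
        // A real connector would ask the remote database for the result set
        // metadata here; a single hardcoded column keeps the sketch short.
        Descriptor returnedType = new Descriptor(Map.of("result", "bigint"));
        return new Analysis(returnedType, new QueryHandle(query));
    }
}
```

With the real SPI, such a function is invoked as, for example, `SELECT * FROM TABLE(mysql.system.query(query => 'SELECT 1'))`.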

Achieved functionality

Currently, any PTF that can be entirely realised by a connector is supported: the connector can "capture" the PTF invocation and replace it with a ConnectorTableHandle which represents the PTF result.
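
A sketch of this capture mechanism, continuing the stand-in types from the previous sketch. The subtask list above mentions apply() methods and a RewriteTableFunctionToTableScan rule; the sketch below compresses both and does not reproduce the real SPI signatures.

```java
import java.util.Optional;

interface ConnectorTableHandle {}

interface ConnectorMetadataSketch
{
    // Returns a table handle representing the PTF result, or empty if the
    // connector cannot capture this invocation.
    Optional<ConnectorTableHandle> applyTableFunction(ConnectorTableFunctionHandle handle);
}

interface PlanNode {}

record TableFunctionNode(ConnectorTableFunctionHandle handle)
        implements PlanNode {}

record TableScanNode(ConnectorTableHandle table)
        implements PlanNode {}

// If the connector captures the call, the table function node becomes a plain
// table scan over the returned handle; otherwise the plan is left unchanged.
class RewriteTableFunctionToTableScanSketch
{
    PlanNode rewrite(TableFunctionNode node, ConnectorMetadataSketch metadata)
    {
        return metadata.applyTableFunction(node.handle())
                .<PlanNode>map(TableScanNode::new)
                .orElse(node);
    }
}
```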

Following work

To provide full support for PTFs, as defined in the SQL standard, we need to:

  • Support TABLE arguments (starting from the Analyzer; see the syntax sketch after this list)
  • Support DESCRIPTOR arguments (starting from the Analyzer)
  • EXPLAIN: decide how to render TABLE arguments, which are both function arguments and PlanNode sources
  • Support optimizations of plans involving TableFunctionNode: column pruning, etc.
  • Implement COPARTITION as JOIN
  • Choose the distribution of sources based on size, number of sources, and row/set semantics (see: DetermineJoinDistributionType)
  • Handle TableFunctionNode in AddExchanges / AddLocalExchanges: realise the partitioning and ordering of input tables respecting their actual properties. Also, consider the output properties of TableFunctionNode.
  • Figure out the interfaces between the PTF logic (the fulfil() method) and the Operator: in what form data will be provided to the PTF logic, and in what form the results will be returned to the Operator
  • Add the Operator:
    • arbitrary number of sources: 0 to n
    • final partitioning and sorting of the sources (see: WindowOperator)
    • invoking the PTF logic on the Cartesian product of the partitions from all the sources (see: Nested Loops for implementation choices)
    • appending the pass-through columns and the partitioning columns from the source tables
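
Since several items above revolve around TABLE and DESCRIPTOR arguments, here is what the invocation syntax they enable looks like. All names are hypothetical; the syntax follows the SQL standard. The PARTITION BY / ORDER BY clauses on the TABLE argument are exactly the input properties that AddExchanges / AddLocalExchanges must realise.

```java
class PtfSyntaxExample
{
    // Hypothetical sessionization function taking a partitioned and ordered
    // TABLE argument, a DESCRIPTOR argument naming columns, and a scalar.
    static final String SQL = """
            SELECT *
            FROM TABLE(example.system.sessionize(
                input => TABLE(clicks) PARTITION BY user_id ORDER BY click_time,
                key_columns => DESCRIPTOR(user_id, click_time),
                threshold_seconds => 600))
            """;
}
```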

Follow-ups

Possible improvements to be implemented if desired:

  • Optimize hash generation for TableFunctionProcessorNode with HashGenerationOptimizer. Notes: requires outputting the hash symbol; the hash symbol has no marker symbol.
  • Prune the table function's unreferenced proper outputs so that the function does not produce them. Requires a new API to negotiate with the function.
  • Prune the table function's pass-through columns. If all pass-through columns from a certain source are unreferenced, the function could skip producing the pass-through index column for that source. Requires a new API to negotiate with the function.
  • Decompose a table function with set semantics (like aggregations are decomposed into partial and final). With set semantics, each partition is processed on a single node. When no partitioning is specified, the whole input constitutes a single partition which is processed on a single node. Decomposition would allow splitting partitions into streams and performing partial computations in parallel. Requires new interfaces to determine if and how the function can be decomposed (see the sketch after this list).
  • Enhance user and developer documentation with examples
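
To make the decomposition idea from the last item concrete, here is a minimal, engine-agnostic sketch; none of these types are a proposed SPI. A set-semantics function such as a per-partition row count can be computed as partial states over independent streams that are later merged, exactly as partial and final aggregations are.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical decomposition of a set-semantics table function (a per-partition
// count). Each stream computes a partial state in parallel; a final step merges
// the states into the partition's result.
final class DecomposedCount
{
    record PartialState(long count) {}

    // Partial step: runs independently on each stream of the partition.
    static PartialState partial(Stream<Object> rows)
    {
        return new PartialState(rows.count());
    }

    // Final step: merges the partial states into the function result.
    static long merge(List<PartialState> states)
    {
        return states.stream().mapToLong(PartialState::count).sum();
    }
}
```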
emerson-zhao commented

How's it going now?

buremba (Member) commented Sep 13, 2021

The issue looks old; is there any interest in implementing table functions at the moment?

It would be great if we could add support for the syntax as a start.

findepi (Member, Author) commented May 10, 2022

#11336 has just been merged

zhicwu (Contributor) commented Jun 11, 2022

Quick question: will mutation be supported? Sometimes we issue a multi-statement query (or just a stored procedure) on the target database, where the last statement performs an update.

I noticed "Query not supported: ResultSetMetaData not available for query" in a unit test, but I'm not sure if this will be addressed in the future or it's by design. Perhaps you could return fixed metadata (e.g. affected_rows Long) for non-query statements?

findepi (Member, Author) commented Jun 13, 2022

> Quick question: will mutation be supported? Sometimes we issue a multi-statement query (or just a stored procedure) on the target database, where the last statement performs an update.

@zhicwu you may want to see #12322

> I noticed "Query not supported: ResultSetMetaData not available for query" in a unit test, but I'm not sure if this will be addressed in the future or it's by design.

#12325 is by design for queries reading data.

zhicwu (Contributor) commented Jun 13, 2022

Thanks @findepi. Does it make sense to you to support PTFs through both the embedded JDBC connector and an external bridge server? I think clickhouse-jdbc-bridge is something very similar to the latter. It's just an immature thought, but maybe I could rewrite the JDBC bridge to benefit both sides.

findepi (Member, Author) commented Jun 17, 2024

The feature is generally available. The various possible follow-ups, and bugs like #20398, are tracked as separate issues.

awesome work @kasiafi

findepi closed this as completed Jun 17, 2024