
Bulk fetch all columns from all tables in JDBC connectors #22241

Merged
1 commit merged into trinodb:master on Jun 4, 2024

Conversation

@hashhar (Member) commented Jun 3, 2024

Description

Before this change, when listing table columns, JDBC connectors would first list tables and then list the columns of each table. Thus, when serving Trino's `information_schema.columns` or `system.jdbc.columns`, we would make O(#tables) calls to the remote database.

With this change, we use the remote database's bulk column listing facilities to satisfy Trino's bulk column listing requests. This can be viewed as "`information_schema.columns` pass-through", although it works for both Trino's `information_schema.columns` and Trino's `system.jdbc.columns` (`io.trino.jdbc.TrinoDatabaseMetaData.getColumns`), and it does not read the remote database's `information_schema.columns` directly. Instead, the change leverages the fact that `DatabaseMetaData.getColumns`, typically used to get the columns of a single table, can be called without a table filter, in which case it returns all columns from all tables.

Bulk retrieval is supported for selected JDBC connectors; it is not supported by default, since it requires `JdbcClient` changes.
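As a rough illustration of the underlying JDBC mechanism (a minimal sketch against plain `java.sql` APIs, not the actual Trino `JdbcClient` implementation; the class and method names below are hypothetical), `DatabaseMetaData.getColumns` accepts `null` for the table and column name patterns, in which case the driver returns column metadata for every table in the schema in a single result set:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating bulk column listing via java.sql.DatabaseMetaData.
public class BulkColumnListing
{
    // Fetch the columns of every table in a schema with a single metadata call,
    // instead of one getColumns call per table (O(#tables) round trips).
    public static Map<String, List<String>> listAllColumns(Connection connection, String schemaPattern)
            throws SQLException
    {
        DatabaseMetaData metadata = connection.getMetaData();
        Map<String, List<String>> columnsByTable = new LinkedHashMap<>();
        // Passing null for tableNamePattern and columnNamePattern means "no filter",
        // so the driver returns columns for all tables matching the schema pattern.
        try (ResultSet columns = metadata.getColumns(null, schemaPattern, null, null)) {
            while (columns.next()) {
                String table = columns.getString("TABLE_NAME");
                String column = columns.getString("COLUMN_NAME");
                columnsByTable.computeIfAbsent(table, ignored -> new ArrayList<>()).add(column);
            }
        }
        return columnsByTable;
    }
}
```

The point of the sketch is only that one metadata call replaces the per-table calls described above; the connectors that support this wire the same idea through their `JdbcClient` implementations.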

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# MariaDB, MySQL, SingleStore, Redshift
* Improve performance of listing table columns. ({issue}`issuenumber`)


Co-authored-by: Ashhar Hasan <[email protected]>
@hashhar hashhar requested review from findepi and ebyhr June 3, 2024 09:33
@cla-bot cla-bot bot added the cla-signed label Jun 3, 2024
@hashhar hashhar merged commit 1ac1ee1 into trinodb:master Jun 4, 2024
62 checks passed
@hashhar hashhar deleted the hashhar/bulk-fetch-all-columns branch June 4, 2024 08:59
@github-actions github-actions bot added this to the 450 milestone Jun 4, 2024