
Bulk fetch all columns from all tables in JDBC connectors #22241

Merged
1 commit merged into trinodb:master on Jun 4, 2024

Conversation

@hashhar (Member) commented Jun 3, 2024

Description

Before this change, when listing table columns, JDBC connectors would first list tables and then list the columns of each table. Thus, when serving Trino's `information_schema.columns` or `system.jdbc.columns`, we would make O(#tables) calls to the remote database.

With this change, we use the remote database's bulk column listing facilities to satisfy Trino's bulk column listing requests. This can be viewed as "`information_schema.columns` pass-through", although it works for both Trino's `information_schema.columns` and Trino's `system.jdbc.columns` (`io.trino.jdbc.TrinoDatabaseMetaData.getColumns`), and it does not read the remote database's `information_schema.columns` directly. Instead, the change leverages the fact that `DatabaseMetaData.getColumns`, typically used to get the columns of a single table, can be called without a table filter, in which case it returns all columns from all tables.

Bulk retrieval is supported for selected JDBC connectors; it is not supported by default, since it requires `JdbcClient` changes.
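As a rough illustration of the underlying JDBC mechanism (a minimal sketch against plain `java.sql` APIs, not the actual Trino `JdbcClient` implementation; the class and method names below are hypothetical), `DatabaseMetaData.getColumns` accepts `null` for the table and column name patterns, in which case the driver returns column metadata for every table in the schema in a single result set:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating bulk column listing via java.sql.DatabaseMetaData.
public class BulkColumnListing
{
    // Fetch the columns of every table in a schema with a single metadata call,
    // instead of one getColumns call per table (O(#tables) round trips).
    public static Map<String, List<String>> listAllColumns(Connection connection, String schemaPattern)
            throws SQLException
    {
        DatabaseMetaData metadata = connection.getMetaData();
        Map<String, List<String>> columnsByTable = new LinkedHashMap<>();
        // Passing null for tableNamePattern and columnNamePattern means "no filter",
        // so the driver returns columns for all tables matching the schema pattern.
        try (ResultSet columns = metadata.getColumns(null, schemaPattern, null, null)) {
            while (columns.next()) {
                String table = columns.getString("TABLE_NAME");
                String column = columns.getString("COLUMN_NAME");
                columnsByTable.computeIfAbsent(table, ignored -> new ArrayList<>()).add(column);
            }
        }
        return columnsByTable;
    }
}
```

The point of the sketch is only that one metadata call replaces the per-table calls described above; the connectors that support this wire the same idea through their `JdbcClient` implementations.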

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# MariaDB, MySQL, SingleStore, Redshift
* Improve performance of listing table columns. ({issue}`issuenumber`)


Co-authored-by: Ashhar Hasan <[email protected]>
@hashhar hashhar requested review from findepi and ebyhr June 3, 2024 09:33
@cla-bot cla-bot bot added the cla-signed label Jun 3, 2024
@hashhar hashhar merged commit 1ac1ee1 into trinodb:master Jun 4, 2024
62 checks passed
@hashhar hashhar deleted the hashhar/bulk-fetch-all-columns branch June 4, 2024 08:59
@github-actions github-actions bot added this to the 450 milestone Jun 4, 2024