Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comment columns to information_schema.tables #13705

Closed
hovaesco opened this issue Aug 17, 2022 · 8 comments
Closed

Add comment columns to information_schema.tables #13705

hovaesco opened this issue Aug 17, 2022 · 8 comments

Comments

@hovaesco
Copy link
Member

hovaesco commented Aug 17, 2022

Currently, there is no way to get comments from tables and columns tables from information_schema. For instance, Snowflake exposes these comment columns in respective tables in information_schema. It would be very useful for integrations like dbt which requires it to fully support persisting docs functionality.

@hashhar
Copy link
Member

hashhar commented Aug 17, 2022

cc: @martint

We do expose this under system.jdbc.tables it seems. Not sure why there is a difference.

@findepi
Copy link
Member

findepi commented Aug 23, 2022

Column comments are available

.hiddenColumn("comment", createUnboundedVarcharType()) // non-standard

Table comments (remarks) are not available in information_schema.tables.
Updating issue description to reflect this.

@findepi findepi changed the title Add comment columns to information_schema tables Add comment columns to information_schema.tables Aug 23, 2022
@findepi
Copy link
Member

findepi commented Aug 23, 2022

The reason why information_schema.tables has table_comment column is some tools compatibility

.hiddenColumn("table_comment", createUnboundedVarcharType()) // MySQL compatible

It would be very expensive to populate this fields, hence it's not populated (all NULLs).

That's why we have them in a separate table system.metadata.table_comments.

It would be very useful for integrations like dbt which requires it to fully support persisting docs functionality.

What's the use-case which requires dbt to list tables along with the comments?
Can it join to system.metadata.table_comments to get them?
Note that in general this is going to be prohibitively expensive, so this should be done for one table (or at most for a handful of tables) at a time.

@martint
Copy link
Member

martint commented Aug 23, 2022

We do expose this under system.jdbc.tables it seems. Not sure why there is a difference.

The schema of information_schema tables and views is defined by the SQL specification. For extensions, we have dedicated tables under the system catalog. system.jdbc.tables, in particular, is designed to expose all the information needed by the JDBC driver.

@hovaesco
Copy link
Member Author

Thanks for pointing out on hidden comment columns from information_schema.columns.

Use case is a functionality called persist_docs which saves table-level and column-level comments which are then used in dbt docs.

Technically speaking, it's feasible to join system.metadata.table_comments with information_schema.tables. However, I'm afraid about backwards compatibility since system catalog could require additional privileges. A query to get all details to generate docs currently uses only information_schema.

From the description above it looks like that it's going to be a costly operation but the best solution from dbt-trino perspective would be to populate a table_comment column from information_schema.tables.

@findepi
Copy link
Member

findepi commented Aug 24, 2022

populate a table_comment column from information_schema.tables.

This is no-go from perf perspective.

For example, listing tables in Hive is O(1) operation (counting the HMS calls, likely the dominant factor).
Listing tables with comments is a O(n) operation (where n is number of tables).
In the past we've seen that O(n) for information_schema.tables is prohibitively expensive.

@hovaesco
Copy link
Member Author

Thanks for detailed explanation. I'm good to close the issue.

@hovaesco hovaesco closed this as not planned Won't fix, can't repro, duplicate, stale Aug 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants