Add hidden $path column in Iceberg #11661

Merged on Apr 18, 2022 (1 commit)

@osscm (Contributor) commented Mar 25, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

improvement

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Iceberg connector

How would you describe this change to a non-technical end user or system administrator?

With this change, it is possible to select the path of the data file that each row was read from. `$path` is a hidden column.
Other connectors, such as Hive and Delta Lake, already support it.
This is useful when debugging data-related issues, because it lets users map each row back to its data file.

select col1, "$path" from foo
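To make "hidden column" concrete: a hidden column is left out when `*` is expanded, but it can still be selected by name, as in the query above. The Java sketch below models that rule with a hypothetical `Column` record and helper methods (`expandStar`, `canSelect`); it is a simplified illustration under those assumptions, not Trino's connector SPI or the code in this PR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of hidden-column semantics.
// This is NOT Trino's SPI; names and structure are illustrative only.
public class HiddenColumns {
    // A column is either regular or hidden (like "$path").
    record Column(String name, boolean hidden) {}

    // Columns of the example table "foo"; col2 is made up for illustration.
    static final List<Column> TABLE_FOO = List.of(
            new Column("col1", false),
            new Column("col2", false),
            new Column("$path", true)); // hidden: returned only when named explicitly

    // Expands SELECT *: hidden columns are skipped.
    static List<String> expandStar(List<Column> columns) {
        return columns.stream()
                .filter(c -> !c.hidden())
                .map(Column::name)
                .collect(Collectors.toList());
    }

    // An explicitly named column resolves whether or not it is hidden.
    static boolean canSelect(List<Column> columns, String name) {
        return columns.stream().anyMatch(c -> c.name().equals(name));
    }

    public static void main(String[] args) {
        System.out.println(expandStar(TABLE_FOO));         // [col1, col2]
        System.out.println(canSelect(TABLE_FOO, "$path")); // true
    }
}
```

Keeping the column visible to explicit references while excluding it from `*` is what lets existing `SELECT *` queries keep their output shape after `$path` is added.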

Related issues, pull requests, and links

Documentation

( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

( ) No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Iceberg
* Add support for the hidden `$path` column. ({issue}`11661`)

@cla-bot commented Mar 25, 2022

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Manish Malhotra.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: `git config --list | grep email`
  2. If not, set it up using `git config --global user.email <your email address>`
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@osscm force-pushed the iceberg-add-path-column branch from 45fbbdb to ad8d778 on March 25, 2022 06:19
@osscm force-pushed the iceberg-add-path-column branch from ad8d778 to cf6812d on March 25, 2022 06:22
@osscm force-pushed the iceberg-add-path-column branch from cf6812d to 8f59b0a on March 25, 2022 06:52
cla-bot added the cla-signed label on Mar 25, 2022
@findepi changed the title from "add hidden path column to the iceberg queries" to "Add hidden $path column in Iceberg" on Mar 25, 2022
@findepi added the enhancement (New feature or request) label on Mar 25, 2022
@findinpath (Contributor) commented:

@osscm please do squash the commits.

@osscm force-pushed the iceberg-add-path-column branch from b2ce830 to 86df754 on March 25, 2022 09:58
@findinpath (Contributor) commented:

I think that number comes from here actually: https://iceberg.apache.org/spec/#reserved-field-ids

Thanks @alexjo2144 for the heads up.

Here is the source code reference for reserved metadata columns:

https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MetadataColumns.java

@alexjo2144 there are already a few reserved column IDs. What should we eventually do about further metadata columns (e.g. `$file_size`, `$file_modified_time`)? Can we somehow avoid dealing with field IDs for Iceberg metadata columns in Trino?

@osscm (Contributor Author) commented Mar 29, 2022

I think that number comes from here actually: https://iceberg.apache.org/spec/#reserved-field-ids

Yes, @alexjo2144 @findinpath, I took the reference from https://github.com/apache/iceberg/blob/827b6c86108eec7b6de25f61022c7f8b5dda481c/core/src/main/java/org/apache/iceberg/MetadataColumns.java#L34-L42 (thanks to @RussellSpitzer for the discussion).

So, for the other hidden columns, we could continue downward from `Integer.MAX_VALUE - 1`,
e.g. `$file_size = Integer.MAX_VALUE - 2` and `$file_modified_time = Integer.MAX_VALUE - 3`.

@findinpath (Contributor) commented:

If we need extra columns, I think it would be a good idea to add them to the Iceberg project first, to avoid clashes with reserved columns that may appear in future Iceberg releases.
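The clash risk is already concrete for the IDs proposed above. A minimal Java sketch, assuming the first five entries of the Iceberg spec's reserved field ID table (`_file` at `Integer.MAX_VALUE - 1` down to `_partition` at `Integer.MAX_VALUE - 5`; verify against the current spec before relying on them), looks up each proposed ID. The class name `ReservedIds` and method `clashWith` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: checks proposed metadata-column field IDs against reserved IDs
// from the Iceberg spec (first five entries only; the spec lists more).
public class ReservedIds {
    static final Map<Integer, String> ICEBERG_RESERVED = new LinkedHashMap<>();
    static {
        ICEBERG_RESERVED.put(Integer.MAX_VALUE - 1, "_file");      // 2147483646
        ICEBERG_RESERVED.put(Integer.MAX_VALUE - 2, "_pos");       // 2147483645
        ICEBERG_RESERVED.put(Integer.MAX_VALUE - 3, "_deleted");   // 2147483644
        ICEBERG_RESERVED.put(Integer.MAX_VALUE - 4, "_spec_id");   // 2147483643
        ICEBERG_RESERVED.put(Integer.MAX_VALUE - 5, "_partition"); // 2147483642
    }

    // Returns the Iceberg column that already owns this ID, or null if it is free.
    static String clashWith(int fieldId) {
        return ICEBERG_RESERVED.get(fieldId);
    }

    public static void main(String[] args) {
        // IDs proposed earlier in this thread for additional hidden columns.
        Map<String, Integer> proposed = new LinkedHashMap<>();
        proposed.put("$file_size", Integer.MAX_VALUE - 2);
        proposed.put("$file_modified_time", Integer.MAX_VALUE - 3);
        for (Map.Entry<String, Integer> e : proposed.entrySet()) {
            String owner = clashWith(e.getValue());
            System.out.println(e.getKey() + " -> " + e.getValue()
                    + (owner == null ? " (free)" : " clashes with " + owner));
        }
    }
}
```

Running `main` reports that `$file_size` (2147483645) would collide with `_pos` and `$file_modified_time` (2147483644) with `_deleted`, which supports reserving any new IDs upstream in Iceberg first.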

@osscm force-pushed the iceberg-add-path-column branch from 86df754 to 4bee72a on March 30, 2022 09:53
cla-bot removed the cla-signed label on Mar 30, 2022
@osscm force-pushed the iceberg-add-path-column branch from 4bee72a to b703273 on March 30, 2022 09:56
cla-bot added the cla-signed label on Mar 30, 2022
cla-bot removed the cla-signed label on Mar 30, 2022
@osscm force-pushed the iceberg-add-path-column branch from b57d7bb to 71cd5d0 on March 30, 2022 10:22
cla-bot added the cla-signed label on Mar 30, 2022
@osscm force-pushed the iceberg-add-path-column branch 4 times, most recently from e7c802f to cd45fe2 on April 14, 2022 06:40
@mosabua (Member) left a comment:

Reviewed docs content only. That part looks good to me ;-)

@osscm (Contributor Author) commented Apr 14, 2022

Thanks @mosabua for the quick review!

@osscm force-pushed the iceberg-add-path-column branch 4 times, most recently from 903815a to 6f773b3 on April 15, 2022 08:57
@osscm (Contributor Author) commented Apr 15, 2022

Thanks a lot @findinpath, @alexjo2144, @ebyhr, and @mosabua for all the help and reviews!

@osscm (Contributor Author) commented Apr 15, 2022

Thanks @findepi for all the help, for initiating the reviews, and for the time you spent!

Is it possible to merge this PR now? Thanks.

@osscm force-pushed the iceberg-add-path-column branch 4 times, most recently from 27f4b84 to a75f17b on April 16, 2022 07:35
@osscm force-pushed the iceberg-add-path-column branch from a75f17b to 16efb54 on April 16, 2022 07:35
@ebyhr merged commit 146e82a into trinodb:master on Apr 18, 2022
@ebyhr (Member) commented Apr 18, 2022

Merged, thanks!

github-actions added this to the 378 milestone on Apr 18, 2022
@ebyhr mentioned this pull request on Apr 18, 2022
@osscm (Contributor Author) commented Apr 18, 2022

Thanks a lot!

Labels: cla-signed, docs, enhancement (New feature or request)
Development: successfully merging this pull request may close the issue "Add $path hidden column in Iceberg connector".
6 participants