Iceberg Connector #1324

Closed
78 of 93 tasks
lxynov opened this issue Aug 19, 2019 · 17 comments
Labels
enhancement (New feature or request), roadmap (Top level issues for major efforts in the project)

Comments

@lxynov
Member

lxynov commented Aug 19, 2019

TODOs for the Iceberg Connector

@findepi added the roadmap label Aug 19, 2019
@manishmalhotrawork

manishmalhotrawork commented Aug 19, 2019

@linxingyuan1102 should there also be a TODO for this?

"Iceberg tables should also allow specifying the table location"
It's possible that, from the same Presto cluster, I want to create tables pointing to different S3 accounts/clusters.

@manishmalhotrawork

manishmalhotrawork commented Jan 28, 2020

@lxynov I added issue #2660 for the TODO
"Needs correctness tests for partition pruning. (also validate the pushdown is happening by checking the query plans?)"
Can you please link the issue to the TODO?

@lxynov
Member Author

lxynov commented Jan 29, 2020

@manishmalhotrawork sure, done

Just a note: partition pruning in Iceberg is tricky because of partition spec evolution. We need more thought and discussion on this.

@AbdullaevAPo

AbdullaevAPo commented Oct 13, 2020

@lxynov Is there a plan to support HDFS-only Iceberg tables (as in Spark, see https://iceberg.apache.org/spark/ and spark.sql.catalog.hadoop_prod.type = hadoop)?

@pPanda-beta

@lxynov
Any update on what @AbdullaevAPo asked?
#1324 (comment)

This feature is a blocker for perfect read-write isolation. Having a Hive metastore as a common point of contact between Spark and Presto is not a scalable solution.

@pan3793
Member

pan3793 commented Mar 15, 2021

Is there any plan for supporting HadoopCatalog?

@caneGuy
Contributor

caneGuy commented Jun 25, 2021

@pan3793 I think https://github.com/trinodb/trino/pull/6977/files is related work.

@KarlManong
Contributor

Will you support table configuration properties?

@RomantsovArtur

Hey.

I don't see support for an UPDATE or CHANGE clause of ALTER TABLE in the list. It would be very handy, since data evolves a lot.

I can see that the functionality exists in the Iceberg API: https://iceberg.apache.org/javadoc/master/org/apache/iceberg/UpdateSchema.html

I might be missing something.

@bitsondatadev
Member

> https://iceberg.apache.org/javadoc/master/org/apache/iceberg/UpdateSchema.html

@RomantsovArtur, for schema evolution in Trino you can use ALTER TABLE <table-name> ADD|DROP COLUMN ...

See my section on Schema Evolution in this blog: https://blog.starburst.io/trino-on-ice-ii-in-place-table-evolution-and-cloud-compatibility-with-iceberg

If you're looking for updates on partition evolution, we are already tracking that in #7580.

Feel free to reach out to me on Trino Slack if you're looking for something specific.

@RomantsovArtur

@bitsondatadev Thank you for your reply!

We are looking for some logic like:
ALTER TABLE table_name CHANGE [COLUMN] col_name column_new_type

As you can see from the link I provided above, the Iceberg API for this is available, but unfortunately Trino does not support this logic.

I read the doc you attached. Thank you for the beautiful blog post. Our use case is a table that is constantly written to and read by different clients, and we want an atomic type update rather than the following sequence:

  • Add a new column to the table
  • Update the new column with the converted value from the old column (I think Trino doesn't support UPDATE yet)
  • Rename the old column or drop the old column
  • Rename the new column to the old column name

Please note that I'm speaking about the case where we need to evolve many tables on a regular basis. Some are very large, with 100B+ records.
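
For readers following along, here is a minimal sketch of the four-step workaround listed above. It is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in engine (not Trino or Iceberg), and the helper name `change_type_via_copy`, the `events` table, and the `__new`/`__old` suffixes are all hypothetical. It assumes an SQLite build that supports `RENAME COLUMN` (3.25 or later).

```python
import sqlite3


def change_type_via_copy(conn, table, column, new_type):
    """Emulate a column type change with add / backfill / rename steps.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly into SQL, so they must come from trusted input.
    """
    tmp = f"{column}__new"
    old = f"{column}__old"
    cur = conn.cursor()
    # 1. Add a new column with the target type.
    cur.execute(f"ALTER TABLE {table} ADD COLUMN {tmp} {new_type}")
    # 2. Backfill it with the converted value from the old column.
    cur.execute(f"UPDATE {table} SET {tmp} = CAST({column} AS {new_type})")
    # 3. Rename the old column out of the way (dropping it is the alternative).
    cur.execute(f"ALTER TABLE {table} RENAME COLUMN {column} TO {old}")
    # 4. Give the new column the original name.
    cur.execute(f"ALTER TABLE {table} RENAME COLUMN {tmp} TO {column}")
    conn.commit()


# Stand-in data: an in-memory table with an integer column to convert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, 10), (2, 25)])

change_type_via_copy(conn, "events", "amount", "REAL")

rows = conn.execute("SELECT id, amount FROM events ORDER BY id").fetchall()
print(rows)
```

The backfill in step 2 rewrites every row, which is the cost the comment above is concerned about for very large tables; a single type-change statement would avoid exposing the intermediate columns to other clients.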

@bitsondatadev
Member

> @bitsondatadev Thank you for your reply!
>
> We are looking for some logic like: ALTER TABLE table_name CHANGE [COLUMN] col_name column_new_type
> ...

Made a new issue for this. First step is to add the syntax. Then this should be easy to hook up to Iceberg.

@RomantsovArtur

Thank you for the quick reply! Looks great 🚀

@rimolive

Posting here since this seems to be the central place for tracking full Iceberg support in the Trino connector: is there already support for the rewrite_data_files procedure?

@nicor88

nicor88 commented Mar 22, 2022

Row-level deletes were added to Iceberg, which means that DELETE/UPSERT/MERGE INTO are now unlocked.
I'm wondering when this feature will be included in the Trino connector (will it be covered by #10758)?

@findepi
Member

findepi commented Jan 10, 2024

We don't use this issue for tracking Iceberg work anymore, so let me close it.
There will always be some work items within such a broad area as Iceberg.
Existing tickets can be found with

@findepi findepi closed this as completed Jan 10, 2024
@bitsondatadev
Member

> We don't use this issue for tracking Iceberg work anymore, so let me close it.
>
> There will always be some work items within such a broad area as Iceberg.
>
> Existing tickets can be found with

That being said, I really appreciate all the effort that was put into maintaining this initial roadmap.

We should still align on how we view larger efforts!

Thanks all!
