-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to redirect table reads from Hive to Iceberg #10173
Add support to redirect table reads from Hive to Iceberg #10173
Conversation
f4a3f99
to
708cdf5
Compare
708cdf5
to
d12502a
Compare
per @ssheikin's #8340 (comment), added @MiguelWeezardo as co-author. |
ceae60d
to
7b9d7ba
Compare
.map(propertyMetadata -> immutableEntry( | ||
propertyMetadata.getName(), | ||
session.getProperty(propertyMetadata.getName(), propertyMetadata.getJavaType()))) | ||
// The session properties collected here are used for events only. Filter out nulls to avoid problems with downstream consumers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we do not care about properties which are set to null
in events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not aware of any consumer of these events. This must be some extension point for someone.
The only usage i found looks like a dummy consumer, at least this is what the commit message says.
trino/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveEventClient.java
Lines 31 to 38 in cea5842
log.debug("File created: query: %s, schema: %s, table: %s, partition: '%s', format: %s, size: %s, path: %s", | |
writeCompletedEvent.getQueryId(), | |
writeCompletedEvent.getSchemaName(), | |
writeCompletedEvent.getTableName(), | |
writeCompletedEvent.getPartitionName(), | |
writeCompletedEvent.getStorageFormat(), | |
writeCompletedEvent.getBytes(), | |
writeCompletedEvent.getPath()); |
Including null
values could be a breaking change to downstream even consumers (e.g. if the [Immutable]Map.copyOf(event.getSessionProperties())
. Converting values to Optional
would be a breaking change to. In the absence of more information, i deemed that omitting null values is the least breaking of all options.
import static org.testcontainers.utility.MountableFile.forHostPath; | ||
|
||
@TestsEnvironment | ||
public class EnvSinglenodeHiveRedirectionToIceberg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably does not matter here but I heard many times in the past that we should refrain from using singlenode envs as they provide worse test coverage than multinode ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding me. We have 27 or something singlenode environments. And yes, here it doesn't matter, as tested functionality is really coordinator-only. Let me keep it as is.
...no-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveRedirectionToIceberg.java
Outdated
Show resolved
Hide resolved
...no-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveRedirectionToIceberg.java
Outdated
Show resolved
Hide resolved
...no-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveRedirectionToIceberg.java
Outdated
Show resolved
Hide resolved
...no-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveRedirectionToIceberg.java
Outdated
Show resolved
Hide resolved
...no-product-tests/src/main/java/io/trino/tests/product/hive/TestHiveRedirectionToIceberg.java
Show resolved
Hide resolved
|
||
onTrino().executeQuery("ALTER TABLE " + hiveTableName + " ADD COLUMN some_new_column double"); | ||
|
||
// TODO: ALTER TABLE succeeded, but new column was not added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you create a github issue which lists missing functionalities?
And tag relevant TODOs with an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's necessary in this case. #8340 is going to fix this problems (once #8340 (comment) is resolved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to test SHOW STATS redirection too?
What about a non-default hive.timestamp-precision? Redirects will return different results.
Same for the use_column_name configs in tables which have different schema across partitions.
same |
Hive Connector redirects Iceberg table access to the configured Iceberg catalog. This change adds implementation of `HiveTableRedirectionsProvider`, plugging in Hive->Iceberg redirects and leveraging existing framework. This is based on work from several authors, mentioned at the end of commit message. As tests show, not all the statements support redirects properly and this needs to be followed up upon. Co-authored-by: Xingyuan Lin <[email protected]> Co-authored-by: Pratham Desai <[email protected]> Co-authored-by: Sasha Sheikin <[email protected]> Co-authored-by: Michał Ślizak <[email protected]> Co-authored-by: Łukasz Osipiuk <[email protected]>
7b9d7ba
to
8feb622
Compare
added Also, split the test class into two (#10173 (comment)). |
Do I have to use like |
Hive Connector redirects Iceberg table access to the configured
Iceberg catalog.
This change adds implementation of
HiveTableRedirectionsProvider
,plugging in Hive->Iceberg redirects and leveraging existing framework.
This is based on work from several authors. In particular, this is based
on #8340 with conflicts resolved, and retrofit to current APIs in Hive connector.
As tests show, not all the statements support redirects properly and
this needs to be followed up upon.