Add support to redirect table reads from Hive to Iceberg #8340

Closed
phd3 wants to merge 3 commits from the iceberg-redirect-plugin-change branch

Conversation

phd3
Member

@phd3 phd3 commented Jun 21, 2021

Hive plugin changes on top of #7606

  • Redirect Iceberg table reads from the Hive catalog to an Iceberg catalog when configured.
  • Add engine-side restrictions that block modification operations on redirected tables.
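
As a rough illustration of the first bullet, the connector-side decision could look something like the sketch below. It is not the code from this PR: the class name, the `icebergCatalogName` parameter, and the assumption that an Iceberg table is recognized by a `table_type` metastore parameter are all hypothetical stand-ins chosen for the example.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the connector-side redirection decision; not Trino's actual code.
final class HiveToIcebergRedirectSketch
{
    // Assumption: an Iceberg table is marked by this metastore table parameter.
    static final String TABLE_TYPE_KEY = "table_type";
    static final String ICEBERG_TABLE_TYPE = "iceberg";

    private HiveToIcebergRedirectSketch() {}

    /**
     * Returns the fully qualified redirection target, or empty when the table
     * should be served by the Hive connector itself.
     */
    static Optional<String> redirectTable(
            Optional<String> icebergCatalogName, // empty means redirection is not configured
            String schemaName,
            String tableName,
            Map<String, String> tableParameters)
    {
        if (icebergCatalogName.isEmpty()) {
            return Optional.empty();
        }
        // equalsIgnoreCase(null) is false, so tables without the parameter are not redirected
        String tableType = tableParameters.get(TABLE_TYPE_KEY);
        if (!ICEBERG_TABLE_TYPE.equalsIgnoreCase(tableType)) {
            return Optional.empty();
        }
        return Optional.of(icebergCatalogName.get() + "." + schemaName + "." + tableName);
    }
}
```

With no target catalog configured, or for a plain Hive table, the method returns empty and the read stays in the Hive connector.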

Fixes #4442

@phd3 phd3 marked this pull request as draft June 21, 2021 23:15
@phd3 phd3 force-pushed the iceberg-redirect-plugin-change branch from 4d6f0f9 to a303baa Compare June 21, 2021 23:48
@cla-bot cla-bot bot added the cla-signed label Jun 21, 2021
@findepi findepi requested review from raunaqmorarka and sopel39 June 22, 2021 09:30
Comment on lines 3123 to 3124
throw new TrinoException(NOT_SUPPORTED, "Hive Connector doesn't support modification operations (write, DDL, comment, statistics collection, set authorization) " +
"on Iceberg tables when redirection is enabled");
Member

Do you mean that redirects on modification operations are a bad idea?
then why do we support them at all?

if they are a good idea in general, why would we want to opt out here?

Member Author

@phd3 phd3 Jul 11, 2021

@findepi

Do you mean that redirects on modification operations are a bad idea?

#7606 (comment)

then why do we support them at all?

Does it look better with the "Fail modifications for redirected tables in engine" commit? If so, I can put it as the first commit. Note that the Hive connector check is still required because of procedures.

Member

Does it look better with the "Fail modifications for redirected tables in engine" commit?

I am not very fluent with this code, but if this implements @electrum's thinking in #7606 (comment), it should go in a separate PR and be debated separately. Hopefully there is nothing special about redirects in the Iceberg or Hive connector to warrant DDL-specific checks in the connector code (except, maybe, procedures -- which ones?)

Member Author

Makes sense to do it in a separate PR. I was trying to gather feedback on whether the exceptions thrown in the Hive redirection tests feel more natural with this change. And yes, that commit doesn't have anything specific to the Hive connector.

hive connector checks become sort of "illegal state checks" in non-procedure calls, but wouldn't hurt to still keep proper error message there.

Member

but wouldn't hurt to still keep proper error message there.

except that it suggests -- to the future reader -- that we're coding some connector-specific behavior.

(and I do care about future readers quite a lot:)

Member Author

@phd3 phd3 Jul 12, 2021

Good point. How about changing all connector checks to checkState, and adding special handling for procedures to throw TrinoException?

Member

or TrinoException here, but with a comment. up to you
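
For illustration, the two kinds of check weighed in this thread could be sketched roughly as below. This is a self-contained approximation, not the PR's code: `NotSupportedException` is a hypothetical stand-in for a user-facing `TrinoException(NOT_SUPPORTED, ...)`, the local `checkState` mimics Guava's `Preconditions.checkState`, and the method names are invented for the example.

```java
// Hypothetical sketch of the two check styles; names are illustrative only.
final class RedirectedTableChecksSketch
{
    // Stand-in for a user-facing "not supported" error such as TrinoException(NOT_SUPPORTED, ...)
    static final class NotSupportedException extends RuntimeException
    {
        NotSupportedException(String message)
        {
            super(message);
        }
    }

    private RedirectedTableChecksSketch() {}

    // Minimal local equivalent of Guava's Preconditions.checkState
    static void checkState(boolean expression, String message)
    {
        if (!expression) {
            throw new IllegalStateException(message);
        }
    }

    // Regular metadata call: the engine is assumed to have handled redirection already,
    // so seeing an Iceberg table here indicates a bug rather than a user error.
    static void verifyNotIcebergInMetadataCall(boolean isIcebergTable, String tableName)
    {
        checkState(!isIcebergTable, "Unexpected Iceberg table in Hive connector: " + tableName);
    }

    // Procedure call: procedures do not go through the engine's redirection handling,
    // so a user can reach this path and should get a proper error message.
    static void verifyNotIcebergInProcedure(boolean isIcebergTable, String tableName)
    {
        if (isIcebergTable) {
            throw new NotSupportedException("Procedure is not supported on Iceberg table: " + tableName);
        }
    }
}
```

The split keeps the user-facing message only where a user can actually hit it, which addresses the concern that a `TrinoException` in every metadata method reads like connector-specific behavior.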

@findepi findepi requested a review from losipiuk June 23, 2021 12:38
@phd3 phd3 force-pushed the iceberg-redirect-plugin-change branch from a303baa to 4cd0b46 Compare July 8, 2021 21:52
phd3 added 2 commits July 11, 2021 11:53
Queries on information_schema.tables failed when the filters pointed
to a specific table, and the table was redirected.
Hive Connector redirects Iceberg table reads to the configured
Iceberg catalog.

Co-authored by: Xingyuan Lin <[email protected]>
@phd3 phd3 force-pushed the iceberg-redirect-plugin-change branch from 4cd0b46 to bb368d1 Compare July 11, 2021 15:55
@phd3
Member Author

phd3 commented Jul 12, 2021

Unrelated failures: #8477 and #8345

@phd3 phd3 marked this pull request as ready for review July 12, 2021 13:51
@phd3
Member Author

phd3 commented Jul 12, 2021

@findepi @losipiuk @raunaqmorarka @electrum this is ready for review.

@findepi
Member

findepi commented Nov 8, 2021

@phd3 can you please rebase?

@@ -72,7 +72,7 @@ public String getName()
{
Session session = stateMachine.getSession();
QualifiedObjectName tableName = createQualifiedObjectName(session, statement, statement.getName());
-Optional<TableHandle> tableHandle = metadata.getTableHandle(session, tableName);
+Optional<TableHandle> tableHandle = metadata.getOriginalTableHandle(session, tableName, Optional.of(getName()));
Member

Fail modifications for redirected tables in engine

I am not convinced we should do that

Member Author

"Modifications" is bad wording here; I meant DDLs. We already added support for DMLs in #8683.

The last discussion that we had on this was #7606 (comment)

DDL operations are problematic since users need to be aware of which connector they are using, since they have different data types, partitioning, bucketing, etc. Our thought was that these are relatively rare operations by more advanced users and that trying to have hidden redirections would end up causing more confusion.

Member Author

do you also feel otherwise for DDLs?

Member

@findepi findepi Nov 13, 2021

Users sooner or later will ask why ALTER ... ADD COLUMN (something varchar) does not work, and I won't be able to explain to them why not. Yes, from an engineering perspective, we may have a harder time supporting things like column properties (maybe, or maybe not), but I would assume these are not always used, so users who do not intend to use column properties will not accept this as a rational explanation for the limitation.
TL;DR: yes, DDLs like ALTER ... ADD/DROP COLUMN should be routed as well.

cc @losipiuk @alexjo2144 @claudiusli
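
To make the two positions in this thread concrete, here is a minimal sketch of an engine-side policy in which reads and DML always follow the redirection, while DDL either fails or is routed depending on a flag. Everything in it is hypothetical: the `Operation` enum, the `routeDdl` flag, and `resolveTarget` are invented for illustration and do not correspond to `getOriginalTableHandle` or any other engine API.

```java
import java.util.Optional;

// Hypothetical sketch of the policy under discussion; not the engine's actual API.
final class RedirectionPolicySketch
{
    enum Operation { READ, DML, DDL }

    private RedirectionPolicySketch() {}

    /**
     * Resolves the table an operation should run against.
     *
     * @param original the table name the user wrote
     * @param redirected the redirection target, if the connector redirects this table
     * @param routeDdl true to route DDL to the target, false to fail DDL on redirected tables
     */
    static String resolveTarget(String original, Optional<String> redirected, Operation operation, boolean routeDdl)
    {
        if (redirected.isEmpty()) {
            return original;
        }
        if (operation == Operation.DDL && !routeDdl) {
            throw new UnsupportedOperationException(
                    "DDL is not supported on redirected table " + original + " (redirects to " + redirected.get() + ")");
        }
        return redirected.get();
    }
}
```

With `routeDdl` set to false, a statement such as ALTER ... ADD COLUMN on a redirected table fails up front; with it set to true, the statement runs against the target catalog like a read would.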

@@ -2996,6 +3021,37 @@ public boolean delegateMaterializedViewRefreshToConnector(ConnectorSession sessi
return hiveMaterializedViewMetadata.refreshMaterializedView(session, name);
}

@Override
public Optional<CatalogSchemaTableName> redirectTable(ConnectorSession session, SchemaTableName tableName)
Member

Can you please update this to #9870?

Member

Can you please update this to #9870?

I've taken the liberty of updating the code myself.
I also copied and adapted an awesome test template by @ssheikin.
Posted #10173 with the changes.

Contributor

@ssheikin ssheikin Dec 3, 2021

These credits belong to @MiguelWeezardo

Member

These credits belong to @MiguelWeezardo

thanks @ssheikin for the comment. added @MiguelWeezardo as co-author in the other PR

@raunaqmorarka
Member

Seems to be superseded by #10173

Development

Successfully merging this pull request may close these issues.

Support catalogs with Hive and Iceberg tables
4 participants