Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add redirection awareness to the table tasks #10577

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Jan 12, 2022

Related PR from @phd3 : #8340

@cla-bot cla-bot bot added the cla-signed label Jan 12, 2022
@findinpath findinpath requested review from findepi, phd3 and dain January 12, 2022 15:37
@findinpath findinpath requested review from electrum and alexjo2144 and removed request for dain January 12, 2022 15:58
@phd3
Copy link
Member

phd3 commented Jan 12, 2022

IMO for every operation, we'd want to either (1) use redirected table handle OR (2) fail if a table is getting redirected. this would be safer than silently trying to get source table handle, especially for future usages. @findepi @losipiuk @electrum

If we've consensus on that, we can start with callsites where we know we want to definitely redirect. Then, separately, add a change that disallows redirection for "vanilla" Metadata#getTableHandle (or introduce a Metadata#getOriginalTableHandle).

operation use redirected table handle / disallow if redirected why? ACL checks
comment table use redirected table handle coherent with future describe / show create check on target
comment column use redirected table handle coherent with future describe / show create check on target
Add column use redirected table handle coherent with future select check on target
create table use redirected table handle (checking existence) shouldn't create a table if it resolves to a "redirected" table check on source (only option)
drop column use redirected table handle coherent with future select check on target
drop table use redirected table handle (checking existence) drop table x failure --> describe table x success is confusing check on target
grant privilege disallow if redirected limited connector support, listPrivileges API needs to change check on target
rename table disallow if redirected should source / target / everything in between be renamed? simpler to disallow check on target
revoke disallow if redirected limited connector support, listPrivileges API needs to change check on target
set table authorization use redirected table handle
rename column use redirected table handle coherent with future select check on target
applyTableScanRedirect disallow if redirected avoid confusing use of similar APIs
extract spatial joins either name comes from SP, seems fine either way. may require some refactoring seems missing currently
show queries rewrite showcreate -> redirect, show grants -> disallow coherent with future select

let's discuss on above operations and see if disallowing/redirecting decisions are reasonable. We can track it in a separate issue and take them in smaller PRs.

@findinpath
Copy link
Contributor Author

@phd3

rename table why do you think that the renaming of redirected tables should be not allowed? If the user is in the context:

USE hive.default;
ALTER TABLE iceberg_table RENAME TO iceberg_table_renamed;

I think we should allow such a behaviour. It adds more simplicity for the user experience and (at the moment) I see no added complexity in allowing such an operation.

@phd3
Copy link
Member

phd3 commented Jan 13, 2022

I feel this is a bit convoluted because of the api design. For a generic operation like rename c1.x.y -> c1.x.z, if we think from the user perspective, they should be able to just use c1.x.z in place of c1.x.y after this operation. say the redirection is c1.x.y -> c2.p.q. This means that c1.x.z must get redirected to the same logical table c2.p.q now. but engine doesn't have control over that logic. In case of iceberg, we're assuming that hive.x.z will just get redirected to iceberg.x.z and their name is "unified" in the metastore, but that's connector knowledge. If we try renaming the source table only (c1.x.y -> c1.x.z), the redirection may completely break. If we also rename the target table c2.p.q, what do we rename it to? May be this needs another api in itself? We may need to consider that for every step at redirection. So I'm leaning towards rejecting renames in case of redirection.

@findepi
Copy link
Member

findepi commented Jan 14, 2022

I agree with @findinpath that table rename base should be allowed
@phd3 redirects are not some abstract, name-based mappings defined on a side.
Primarily, they are a way to provide a user-convenience when querying single metadata storage (like HMS or Glue), which consists tables that no single connector can handle, e.g. mixture of Hive, Iceberg or Delta Lake tables.
It's not explainable to the user that rename should fail in case of redirect, because the outcome (redirect+rename) is natural and an OK choice.

If we try renaming the source table only [..]
If we also rename the target table [...]

The source and target tables are same thing, viewed from different angles, so there is one name only.

cc @losipiuk

@findepi
Copy link
Member

findepi commented Jan 14, 2022

Thanks @phd3 for putting together this table. Awesome job. It really helps to see all the operations listed cohesively in single place.

create table use redirected table handle (checking existence)

CREATE TABLE doesn't need to be redirect-aware.

There are two options

  • (currently existing) the redirects are used for same catalog; the CREATE would fail anyway; redirects don't change anything
  • (potential future use) redirects used to implement fallback like: hot data, cold data. In such case allowing CREATE would be desirable, and would shadow the redirection.
rename table disallow if redirected

discussed above

set table authorization use redirected table handle

Other authorization-related operations are disallowed in case of redirections.
I am not sure about this one. cc @kokosing

applyTableScanRedirect disallow if redirected avoid confusing use of similar APIs

It must be allowed. The APIs can indeed be confusing, but are orthogonal, happen on different layers.
(Actually, disallowing it would be hard to do anyway)

extract spatial joins either

Must follow redirects, as any other read.

@phd3
Copy link
Member

phd3 commented Jan 19, 2022

@phd3 redirects are not some abstract, name-based mappings defined on a side. Primarily, they are a way to provide a user-convenience when querying single metadata storage (like HMS or Glue), which consists tables that no single connector can handle, e.g. mixture of Hive, Iceberg or Delta Lake tables.

I see your point, w.r.t. the current usecase of hive/iceberg/delta.

However, as we know this API lets a connector return any table, not just catalog. (possibly a different table in the same catalog.)

    default Optional<CatalogSchemaTableName> redirectTable(ConnectorSession session, SchemaTableName tableName)
    {
        return Optional.empty();
    }

imo the point around not some abstract, name-based mappings defined on a side makes assumptions that are (1) reasonable w.r.t. current usage but (2) stronger compared to the API signature. so I'm not convinced if we should allow rename redirects unless we tighten the API.

@phd3
Copy link
Member

phd3 commented Jan 19, 2022

agreed with your suggestions on other operations.

I'm curious about this usecase. can you please elaborate or give an example?

redirects used to implement fallback like: hot data, cold data.

@findepi
Copy link
Member

findepi commented Jan 19, 2022

I'm curious about this usecase. can you please elaborate or give an example?

redirects used to implement fallback like: hot data, cold data.

my imaginary use-case

imo the point around not some abstract, name-based mappings defined on a side makes assumptions that are (1) reasonable w.r.t. current usage but (2) stronger compared to the API signature. so I'm not convinced if we should allow rename redirects unless we tighten the API.

not disallowing renames is not incorrect in broader case, in potential future use-cases, and are diserable in use-cases we have today

i agree that if a redirect source_table so some_oddly_named table a rename can be confusing,
but such a redirect would be confusing anyway -- redirects are visible to the user, the user may get errors referencing "some_oddly_named", etc.
if we want to have user-invisible, transparent redirects, "table scan redirection" (io.trino.spi.connector.ConnectorMetadata#applyTableScanRedirect) should be used in such use-cases instead.

@electrum
Copy link
Member

redirects are not some abstract, name-based mappings defined on a side. Primarily, they are a way to provide a user-convenience when querying single metadata storage (like HMS or Glue), which consists tables that no single connector can handle

If that's the case, then we should change the APIs to only support that use case (same schema/table name but different catalog/connector implementation). In hindsight, that's probably what we should have done in the first place.

@findinpath
Copy link
Contributor Author

I am reviving this thread which seems to be dormant for a while.

So far I see that we don't have an agreement on RENAME TABLE functionality.
Let's revisit this topic again and get to a decision.

I will create a smaller PR containing the changes from this PR for:

  • add column
  • comment
  • drop table

to get this effort going forward.

@findinpath
Copy link
Contributor Author

Closing the PR because its content has been already applied in #11072 and #11277

Discussions around the functionality are continued in #11202

@findinpath findinpath closed this Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants