[#6133] Improvement(core): Supports get Fileset schema location in the AuthorizationUtils #6211

Open
wants to merge 2 commits into
base: main

Conversation

Abyss-lord
Contributor

What changes were proposed in this pull request?

Support getting the Fileset schema location in AuthorizationUtils.

A Fileset is uniquely identified by metalake.catalog.schema.fileset. The schema location is retrieved as follows.

Check catalogObj.type():

  • If it is RELATIONAL, check whether the provider is Hive. If it is Hive, retrieve its LOCATION property.
  • If it is FILESET, check whether catalogObj implements HasPropertyMetadata:
    1. If it does, use the schemaPropertiesMetadata() method to retrieve the path.
    2. If it does not, check whether it contains any Fileset objects:
      1. If it contains none, convert the catalog object to FilesetCatalog and retrieve its LOCATION property.
      2. If it does contain Filesets, retrieve all Fileset instances and add each Fileset's path.
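
The decision flow above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: SchemaLocationSketch, CatalogInfo, SchemaInfo, the hasPropertyMetadata flag, and the lowercase "location" key are all hypothetical stand-ins for the real catalog objects, the HasPropertyMetadata check, and the LOCATION property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: every type here is a hypothetical stand-in, not Gravitino's API. */
final class SchemaLocationSketch {

  enum CatalogType { RELATIONAL, FILESET, OTHER }

  /** Assumed property key; the real constant may differ. */
  static final String LOCATION = "location";

  /** Simplified catalog: type, provider name, and catalog-level properties. */
  static final class CatalogInfo {
    final CatalogType type;
    final String provider;
    final boolean hasPropertyMetadata; // stands in for `instanceof HasPropertyMetadata`
    final Map<String, String> properties;

    CatalogInfo(CatalogType type, String provider, boolean hasPropertyMetadata,
        Map<String, String> properties) {
      this.type = type;
      this.provider = provider;
      this.hasPropertyMetadata = hasPropertyMetadata;
      this.properties = properties;
    }
  }

  /** Simplified schema: its own properties and the storage paths of its filesets. */
  static final class SchemaInfo {
    final Map<String, String> properties;
    final List<String> filesetPaths;

    SchemaInfo(Map<String, String> properties, List<String> filesetPaths) {
      this.properties = properties;
      this.filesetPaths = filesetPaths;
    }
  }

  /** Returns the locations that would be associated with the schema. */
  static List<String> schemaLocations(CatalogInfo catalog, SchemaInfo schema) {
    List<String> locations = new ArrayList<>();
    switch (catalog.type) {
      case RELATIONAL:
        // Only Hive is handled for relational catalogs.
        if ("hive".equalsIgnoreCase(catalog.provider)) {
          addIfPresent(locations, schema.properties.get(LOCATION));
        }
        break;
      case FILESET:
        if (catalog.hasPropertyMetadata) {
          // Schema-level path resolved through property metadata.
          addIfPresent(locations, schema.properties.get(LOCATION));
        } else if (schema.filesetPaths.isEmpty()) {
          // No filesets: fall back to the catalog-level location.
          addIfPresent(locations, catalog.properties.get(LOCATION));
        } else {
          // Otherwise collect the path of every fileset.
          locations.addAll(schema.filesetPaths);
        }
        break;
      default:
        break;
    }
    return locations;
  }

  /** Adds the value only when it is neither null nor blank. */
  private static void addIfPresent(List<String> out, String value) {
    if (value != null && !value.trim().isEmpty()) {
      out.add(value);
    }
  }
}
```

Returning a list rather than a single string lets the last branch report several fileset paths, while the other branches yield at most one entry.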

Why are the changes needed?

Fix: #6133

Does this PR introduce any user-facing change?

No

How was this patch tested?

local test.

… in the AuthorizationUtils

Supports get Fileset schema location in the AuthorizationUtils.
@Abyss-lord
Contributor Author

Hi @xunliu, could you please review this PR when you have time? I’d really appreciate your feedback.

… in the AuthorizationUtils

Use StringUtils instead of == null checks.
Member

@xunliu xunliu left a comment

@Abyss-lord
I think we need to consider adding an integration test. It should include the case where the schema is a Fileset schema.

break;

case FILESET:
if (catalogObj instanceof HasPropertyMetadata) {
Contributor

Why do we need to check for HasPropertyMetadata?

Contributor Author

@jerqi Is the fileset catalog guaranteed to implement the HasPropertyMetadata interface?

Contributor

@jerqi jerqi Jan 14, 2025

Some of the fileset catalogs have the location property although they implement the HasPropertyMetadata interface.

Contributor Author

The current processing logic is:

  1. If the provider is Hadoop, use HasPropertyMetadata to get the location of the Schema.
  2. In other cases, consider adding the paths of all filesets under the Schema.

WDYT, @jerqi?

Contributor

@jerqi jerqi Jan 14, 2025

Option 2 seems odd to me. Is this consistent with the Hive schema implementation?

Development

Successfully merging this pull request may close these issues.

[Improvement] Supports get Fileset schema location in the AuthorizationUtils
4 participants