[FEATURE] Add S3 support for Fileset Hadoop catalog #3379

jerryshao · 2024-05-14T03:24:20Z

Describe the feature

Fileset is a new concept introduced in 0.5.0 to manage non-tabular data. The current implementation uses HCFS (the Hadoop Compatible File System interface) to manage the physical data. With HCFS, the Hadoop catalog should be able to support different underlying storage systems, but so far we have only verified the local file system and HDFS.

In this issue, we should also support S3, so that the fileset Hadoop catalog works with the S3 object store.
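To illustrate why HCFS should make this possible: Hadoop selects a file system implementation from the scheme of the storage location URI. The snippet below is only a minimal sketch of that dispatch idea, not Gravitino or Hadoop code. The class names are the real Hadoop implementations, but the lookup table, the `resolve_filesystem` helper, and the example locations are hypothetical.

```python
from urllib.parse import urlparse

# Real Hadoop FileSystem class names; the table itself is an illustrative
# assumption and does not reflect how Hadoop stores its registrations.
SCHEME_TO_FILESYSTEM = {
    "file": "org.apache.hadoop.fs.LocalFileSystem",
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
}


def resolve_filesystem(location: str) -> str:
    """Return the file system class name for a storage location (hypothetical helper)."""
    # A location without a scheme is treated as a local path in this sketch.
    scheme = urlparse(location).scheme or "file"
    try:
        return SCHEME_TO_FILESYSTEM[scheme]
    except KeyError:
        raise ValueError(f"No file system registered for scheme '{scheme}'")


if __name__ == "__main__":
    # Hypothetical fileset storage locations.
    for location in ("/tmp/filesets/demo",
                     "hdfs://namenode:8020/filesets/demo",
                     "s3a://my-bucket/filesets/demo"):
        print(location, "->", resolve_filesystem(location))
```

Under this model, supporting a new store is mostly a matter of having the matching implementation on the classpath (for S3, the `hadoop-aws` module provides `S3AFileSystem`) and verifying that the catalog behaves correctly against it.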

Motivation

S3 is widely used on public clouds, so we should add this support.

Describe the solution

No response

Additional context

No response

zhoukangcn · 2024-05-14T04:27:41Z

I think we can broaden this feature to "Support object stores provided by cloud services", so that we can add subtasks to support Azure Blob and Aliyun OSS.

jerryshao · 2024-08-01T11:14:49Z

@xiaozcy can you please leave a message here, so I can assign the issue to you.

xiaozcy · 2024-08-01T13:43:43Z

> @xiaozcy can you please leave a message here, so I can assign the issue to you.

Sure.

… catalog (apache#4232)

### What changes were proposed in this pull request?

Add S3 support for the Fileset Hadoop catalog. In practice we only add the hadoop-aws dependency; most of the work is in the tests.

### Why are the changes needed?

Fix: apache#3379

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

IT.

---------

Co-authored-by: zhanghan18 <[email protected]>
Co-authored-by: yuqi <[email protected]>

jerryshao added the feature New feature or request label May 14, 2024

xiaozcy mentioned this issue Jul 22, 2024

[#3379] feat(catalog-hadoop): Add S3 support for Fileset Hadoop catalog #4232

Merged

jerryshao added the 0.6.0 label Aug 1, 2024

jerryshao assigned xiaozcy Aug 2, 2024

jerryshao removed the 0.6.0 label Aug 7, 2024

jerryshao added this to the Gravitino 0.7.0 milestone Aug 7, 2024

xiaozcy mentioned this issue Sep 3, 2024

[EPIC] Support different storages for fileset #4843

Open

8 tasks

jerryshao closed this as completed in #4232 Oct 21, 2024

jerryshao closed this as completed in f69bdaf Oct 21, 2024

jerryshao added the 0.7.0 Release v0.7.0 label Oct 21, 2024
