[FEATURE] Add S3 support for Fileset Hadoop catalog #3379

Closed
Tracked by #4843
jerryshao opened this issue May 14, 2024 · 3 comments · Fixed by #4232
Assignees
Labels
0.7.0 Release v0.7.0 feature New feature or request

Comments

@jerryshao
Contributor

Describe the feature

Fileset is a concept introduced in 0.5.0 to manage non-tabular data. The current implementation uses HCFS (the Hadoop Compatible File System interface) to manage the physical data. Through HCFS, the Hadoop catalog should be able to support different underlying storage systems, but so far we have only verified the local file system and HDFS.

In this issue, we should add and verify S3 support so that the Fileset Hadoop catalog works with the S3 object store.
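To make the idea concrete, here is a minimal sketch of how a fileset's storage location implies its backend through the URI scheme, which is the same principle HCFS relies on. The mapping table and the function name `storage_for_location` are hypothetical and only illustrate the concept; they are not Gravitino's actual code. The `s3a` scheme is the one used by Hadoop's S3A connector.

```python
from urllib.parse import urlparse

# Hypothetical mapping for illustration only; not Gravitino's dispatch table.
SCHEME_TO_STORAGE = {
    "file": "local file system",
    "hdfs": "HDFS",
    "s3a": "S3 (via the Hadoop S3A connector)",
}


def storage_for_location(location: str) -> str:
    """Return the storage backend implied by a fileset location's URI scheme."""
    scheme = urlparse(location).scheme
    if scheme not in SCHEME_TO_STORAGE:
        # Unknown or missing scheme: this sketch refuses rather than guessing.
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return SCHEME_TO_STORAGE[scheme]


if __name__ == "__main__":
    # The bucket and host names below are made up for the example.
    for loc in ("file:///tmp/filesets/raw",
                "hdfs://namenode:8020/filesets/raw",
                "s3a://my-bucket/filesets/raw"):
        print(loc, "->", storage_for_location(loc))
```

With this view, supporting S3 mostly means making sure an `s3a://` location resolves to a working file system implementation, alongside the already verified `file://` and `hdfs://` cases.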

Motivation

S3 is widely used on the public cloud, so the Hadoop catalog should support it.

Describe the solution

No response

Additional context

No response

@jerryshao jerryshao added the feature New feature or request label May 14, 2024
@zhoukangcn
Contributor

I think we could broaden this feature to "Support object stores provided by cloud services", and then add subtasks for Azure Blob and Aliyun OSS.

@jerryshao
Contributor Author

@xiaozcy, could you please leave a comment here so I can assign the issue to you?

@jerryshao jerryshao added the 0.6.0 label Aug 1, 2024
@xiaozcy
Contributor

xiaozcy commented Aug 1, 2024

> @xiaozcy can you please leave a message here, so I can assign the issue to you.

Sure.

@jerryshao jerryshao removed the 0.6.0 label Aug 7, 2024
@jerryshao jerryshao added this to the Gravitino 0.7.0 milestone Aug 7, 2024
@jerryshao jerryshao added the 0.7.0 Release v0.7.0 label Oct 21, 2024
mplmoknijb pushed a commit to mplmoknijb/gravitino that referenced this issue Nov 6, 2024
… catalog (apache#4232)

### What changes were proposed in this pull request?

Add S3 support for the Fileset Hadoop catalog. The only code change is adding the
hadoop-aws dependency; most of the work is in testing.
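For context, the sketch below collects the Hadoop configuration properties that the S3A connector in `hadoop-aws` typically reads. The property keys are standard Hadoop S3A settings. The helper name `s3a_properties` and the placeholder credentials are hypothetical, and how these values are passed to the Gravitino catalog is not described in this thread, so treat that part as an assumption.

```python
from typing import Dict, Optional


def s3a_properties(access_key: str,
                   secret_key: str,
                   endpoint: Optional[str] = None) -> Dict[str, str]:
    """Build Hadoop S3A settings as a plain dict (hypothetical helper)."""
    props = {
        # File system class shipped in the hadoop-aws module.
        "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }
    if endpoint:
        # Only needed for non-default endpoints, e.g. S3-compatible stores.
        props["fs.s3a.endpoint"] = endpoint
    return props


if __name__ == "__main__":
    # Placeholder values; never hard-code real credentials.
    for key, value in s3a_properties("ACCESS", "SECRET",
                                     "http://localhost:9000").items():
        print(f"{key}={value}")
```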

### Why are the changes needed?

Fix: apache#3379 

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Integration tests (IT).

---------

Co-authored-by: zhanghan18 <[email protected]>
Co-authored-by: yuqi <[email protected]>
3 participants