[#5472] improvement(docs): Add example to use cloud storage fileset and polish hadoop-catalog document. #6059

Merged
merged 44 commits on Jan 14, 2025
Commits
c4fb29a
fix
yuqi1129 Jan 2, 2025
4791a64
Merge branch 'main' of github.com:datastrato/graviton into 5472
yuqi1129 Jan 2, 2025
baf42e1
fix
yuqi1129 Jan 3, 2025
d86610b
Merge branch 'main' of github.com:datastrato/graviton into 5472
yuqi1129 Jan 3, 2025
b7eb621
fix
yuqi1129 Jan 3, 2025
1ecc378
update the docs
yuqi1129 Jan 4, 2025
d232e92
polish document again.
yuqi1129 Jan 6, 2025
fbd57ba
Again
yuqi1129 Jan 6, 2025
4fb6e79
fix
yuqi1129 Jan 6, 2025
e481c8d
fix
yuqi1129 Jan 6, 2025
0a97fc7
fix
yuqi1129 Jan 6, 2025
a6fbe7b
fix
yuqi1129 Jan 7, 2025
7b47a9b
fix
yuqi1129 Jan 7, 2025
ab07455
Polish the doc
yuqi1129 Jan 7, 2025
6c1aac3
Optimize the docs
yuqi1129 Jan 7, 2025
44014d9
format code.
yuqi1129 Jan 7, 2025
f4968bd
Merge branch 'main' of github.com:datastrato/graviton into 5472
yuqi1129 Jan 7, 2025
8c61d18
Merge branch 'main' of github.com:datastrato/graviton into 5472
yuqi1129 Jan 8, 2025
8563c91
polish document
yuqi1129 Jan 8, 2025
0b066a5
polish docs
yuqi1129 Jan 8, 2025
4c6f4c8
typo
yuqi1129 Jan 8, 2025
76f651e
Polish document again.
yuqi1129 Jan 9, 2025
51446ce
fix
yuqi1129 Jan 9, 2025
2b9c35f
Fix error.
yuqi1129 Jan 9, 2025
d65b995
Fix error.
yuqi1129 Jan 9, 2025
58e3a90
Fix error.
yuqi1129 Jan 9, 2025
746a3ce
fix
yuqi1129 Jan 9, 2025
4d644f1
Optimize document `how-to-use-gvfs.md`
yuqi1129 Jan 10, 2025
cfb054c
Optimize structure.
yuqi1129 Jan 10, 2025
de96e74
resolve comments
yuqi1129 Jan 13, 2025
7806b2f
resolve comments
yuqi1129 Jan 13, 2025
71586f3
Polish documents
yuqi1129 Jan 13, 2025
7b8ad31
fix
yuqi1129 Jan 13, 2025
c9eca73
fix
yuqi1129 Jan 13, 2025
aacd58f
fix
yuqi1129 Jan 13, 2025
d3a8986
fix
yuqi1129 Jan 14, 2025
54536d9
fix
yuqi1129 Jan 14, 2025
b2e357f
Merge branch 'main' of github.com:datastrato/graviton into 5472
yuqi1129 Jan 14, 2025
1971ba1
Resolve python code indent and fix table format problem.
yuqi1129 Jan 14, 2025
30f4271
Fix incompleted description about endpoint for S3
yuqi1129 Jan 14, 2025
65c171c
Optimize ADLS descriptions
yuqi1129 Jan 14, 2025
1e155d4
Fix the problem in #5737 that does not change azure account-name and …
yuqi1129 Jan 14, 2025
d2d2de3
fix
yuqi1129 Jan 14, 2025
e01b201
fix again
yuqi1129 Jan 14, 2025
17 changes: 10 additions & 7 deletions docs/hadoop-catalog-index.md
@@ -6,18 +6,21 @@ keyword: Hadoop catalog index S3 GCS ADLS OSS
license: "This software is licensed under the Apache License version 2."
---

### Hadoop catalog overall

Gravitino Hadoop catalog index includes the following chapters:

- [Hadoop catalog overview and features](./hadoop-catalog.md)
- [Manage Hadoop catalog with Gravitino API](./manage-fileset-metadata-using-gravitino.md)
- [Using Hadoop catalog with Gravitino virtual System](how-to-use-gvfs.md)
- [Hadoop catalog overview and features](./hadoop-catalog.md): This chapter provides an overview of the Hadoop catalog, its features, capabilities and related configurations.
- [Manage Hadoop catalog with Gravitino API](./manage-fileset-metadata-using-gravitino.md): This chapter explains how to manage fileset metadata using Gravitino API and provides detailed examples.
- [Using Hadoop catalog with Gravitino virtual System](how-to-use-gvfs.md): This chapter explains how to use Hadoop catalog with Gravitino virtual System and provides detailed examples.
Review comment (Contributor): Gravitino virtual System -> Gravitino virtual file system


### Hadoop catalog with cloud storage

Apart from the above, you can also refer to the following topics to manage and access cloud storage like S3, GCS, ADLS, and OSS:

- [Using Hadoop catalog to manage S3](./hadoop-catalog-with-s3.md)
- [Using Hadoop catalog to manage GCS](./hadoop-catalog-with-gcs.md)
- [Using Hadoop catalog to manage ADLS](./hadoop-catalog-with-adls.md)
- [Using Hadoop catalog to manage OSS](./hadoop-catalog-with-oss.md)
- [Using Hadoop catalog to manage S3](./hadoop-catalog-with-s3.md).
- [Using Hadoop catalog to manage GCS](./hadoop-catalog-with-gcs.md).
- [Using Hadoop catalog to manage ADLS](./hadoop-catalog-with-adls.md).
- [Using Hadoop catalog to manage OSS](./hadoop-catalog-with-oss.md).

More storage options will be added soon. Stay tuned!
14 changes: 8 additions & 6 deletions docs/hadoop-catalog-with-adls.md
@@ -28,12 +28,14 @@ Once the server is up and running, you can proceed to configure the Hadoop catal

Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with ADLS:

| Configuration item | Description | Default value | Required | Since version |
|-----------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
| `filesystem-providers` | The file system providers to add. Set it to `abs` if it's a Azure Blob Storage fileset, or a comma separated string that contains `abs` like `oss,abs,s3` to support multiple kinds of fileset including `abs`. | (none) | Yes | 0.8.0-incubating |
| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for Azure Blob Storage, if we set this value, we can omit the prefix 'abfss://' in the location. | `builtin-local` | No | 0.8.0-incubating |
| `azure-storage-account-name ` | The account name of Azure Blob Storage. | (none) | Yes | 0.8.0-incubating |
| `azure-storage-account-key` | The account key of Azure Blob Storage. | (none) | Yes | 0.8.0-incubating |
| Configuration item            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                | Default value   | Required | Since version    |
|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
| `filesystem-providers`        | The file system providers to add. Set it to `abs` if it's an Azure Blob Storage fileset, or to a comma-separated string that contains `abs` (such as `oss,abs,s3`) to support multiple kinds of filesets including `abs`.                                                                                                                                                                                                                                    | (none)          | Yes      | 0.8.0-incubating |
| `default-filesystem-provider` | The default filesystem provider of this Hadoop catalog when users do not specify a scheme in the URI. The default value is `builtin-local`; for Azure Blob Storage, setting this value allows the `abfss://` prefix to be omitted in the location.                                                                                                                                                                                                           | `builtin-local` | No       | 0.8.0-incubating |
| `azure-storage-account-name`  | The account name of Azure Blob Storage.                                                                                                                                                                                                                                                                                                                                                                                                                      | (none)          | Yes      | 0.8.0-incubating |
| `azure-storage-account-key`   | The account key of Azure Blob Storage.                                                                                                                                                                                                                                                                                                                                                                                                                       | (none)          | Yes      | 0.8.0-incubating |
| `credential-providers`        | The credential provider types, separated by commas; possible values are `adls-token` and `azure-account-key`. The default authentication type uses the account name and account key above; setting this property enables credential vending provided by the Gravitino server, so the client no longer needs to supply authentication information such as the account name/key to access ADLS through GVFS. Once it is set, additional configuration items are required; see [adls-credential-vending](security/credential-vending.md). | (none)          | No       | 0.8.0-incubating |
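
For reference, a minimal sketch of creating a Hadoop catalog with these ADLS properties using the Gravitino Python client might look like the following. The server URI, metalake name, catalog name, location, and account values are placeholder assumptions, and the client call follows the pattern used in the related Gravitino docs rather than being a definitive implementation:

```python
from gravitino import Catalog, GravitinoClient

# Assumes a Gravitino server at localhost:8090 and an existing metalake named "metalake_demo".
gravitino_client = GravitinoClient(uri="http://localhost:8090", metalake_name="metalake_demo")

# The properties mirror the table above; the location, account name, and account key
# are placeholders and must be replaced with real Azure Blob Storage values.
catalog = gravitino_client.create_catalog(
    name="adls_catalog",
    catalog_type=Catalog.Type.FILESET,
    provider="hadoop",
    comment="Fileset catalog backed by Azure Blob Storage",
    properties={
        "location": "abfss://container@account.dfs.core.windows.net/path",
        "filesystem-providers": "abs",
        "azure-storage-account-name": "account",
        "azure-storage-account-key": "account_key",
    },
)
```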


### Configurations for a schema

12 changes: 7 additions & 5 deletions docs/hadoop-catalog-with-gcs.md
@@ -27,11 +27,13 @@ Once the server is up and running, you can proceed to configure the Hadoop catal

Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with GCS:

| Configuration item | Description | Default value | Required | Since version |
|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
| `filesystem-providers` | The file system providers to add. Set it to `gcs` if it's a GCS fileset, a comma separated string that contains `gcs` like `gcs,s3` to support multiple kinds of fileset including `gcs`. | (none) | Yes | 0.7.0-incubating |
| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for GCS, if we set this value, we can omit the prefix 'gs://' in the location. | `builtin-local` | No | 0.7.0-incubating |
| `gcs-service-account-file` | The path of GCS service account JSON file. | (none) | Yes | 0.7.0-incubating |
| Configuration item            | Description                                                                                                                                                                                                                                                                                                                                                                                                                       | Default value   | Required | Since version    |
|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
| `filesystem-providers`        | The file system providers to add. Set it to `gcs` if it's a GCS fileset, or to a comma-separated string that contains `gcs` (such as `gcs,s3`) to support multiple kinds of filesets including `gcs`.                                                                                                                                                                                                                                | (none)          | Yes      | 0.7.0-incubating |
| `default-filesystem-provider` | The default filesystem provider of this Hadoop catalog when users do not specify a scheme in the URI. The default value is `builtin-local`; for GCS, setting this value allows the `gs://` prefix to be omitted in the location.                                                                                                                                                                                                     | `builtin-local` | No       | 0.7.0-incubating |
| `gcs-service-account-file`    | The path of the GCS service account JSON file.                                                                                                                                                                                                                                                                                                                                                                                       | (none)          | Yes      | 0.7.0-incubating |
| `credential-providers`        | The credential provider types, separated by commas; the possible value is `gcs-token`. The default authentication type uses the service account above; setting this property enables credential vending provided by the Gravitino server, so the client no longer needs to supply authentication information such as the service account to access GCS through GVFS. Once it is set, additional configuration items are required; see [gcs-credential-vending](security/credential-vending.md). | (none)          | No       | 0.8.0-incubating |
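
As a companion sketch under the same assumptions (server URI, metalake, bucket, and service-account path are placeholders), the GCS variant might also set `credential-providers` to `gcs-token` to enable the credential vending described in the last row of the table:

```python
from gravitino import Catalog, GravitinoClient

# Assumes the same Gravitino server and metalake as the ADLS sketch above.
gravitino_client = GravitinoClient(uri="http://localhost:8090", metalake_name="metalake_demo")

# Setting `credential-providers` to `gcs-token` asks the Gravitino server to vend
# temporary GCS credentials, so GVFS clients do not need the service account file.
catalog = gravitino_client.create_catalog(
    name="gcs_catalog",
    catalog_type=Catalog.Type.FILESET,
    provider="hadoop",
    comment="Fileset catalog backed by GCS with credential vending",
    properties={
        "location": "gs://example-bucket/path",
        "filesystem-providers": "gcs",
        "gcs-service-account-file": "/path/to/service-account.json",
        "credential-providers": "gcs-token",
    },
)
```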


### Configurations for a schema
