
Commit

Optimize the docs
yuqi1129 committed Jan 7, 2025
1 parent ab07455 commit 6c1aac3
Showing 4 changed files with 94 additions and 47 deletions.
33 changes: 25 additions & 8 deletions docs/hadoop-catalog-with-adls.md
@@ -10,8 +10,17 @@ This document describes how to configure a Hadoop catalog with ADLS (Azure Blob

## Prerequisites

In order to create a Hadoop catalog with ADLS, you need to place [`gravitino-azure-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-azure-bundle) in Gravitino Hadoop catalog classpath located
at `${GRAVITINO_HOME}/catalogs/hadoop/libs//`. After that, start Gravitino server with the following command:
To set up a Hadoop catalog with ADLS, follow these steps:

1. Download the [`gravitino-azure-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-azure-bundle) file.
2. Place the downloaded file into the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
3. Start the Gravitino server by running the following command:

```bash
$ bin/gravitino-server.sh start
```
Once the server is up and running, you can proceed to configure the Hadoop catalog with ADLS.


@@ -21,7 +30,7 @@ $ bin/gravitino-server.sh start

The rest of this document shows how to use the Hadoop catalog with ADLS in Gravitino with a full example.

### Create a ADLS Hadoop catalog
### Configuration for an ADLS Hadoop catalog

Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with ADLS:

@@ -32,18 +41,20 @@ Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./
| `azure-storage-account-name` | The account name of Azure Blob Storage. | (none) | Yes if it's an Azure Blob Storage fileset. | 0.8.0-incubating |
| `azure-storage-account-key` | The account key of Azure Blob Storage. | (none) | Yes if it's an Azure Blob Storage fileset. | 0.8.0-incubating |
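
As a quick illustration, the sketch below shows how these properties might be passed when creating the catalog with the Python client used later in this document. Treat the import path, the client constructor, the `catalog_type` keyword, the `filesystem-providers` value, and every endpoint, account, and metalake name as assumptions or placeholders to adapt to your environment.

```python
from gravitino import GravitinoClient, Catalog

# Placeholder server URI and metalake -- replace with your own.
gravitino_client = GravitinoClient(uri="http://localhost:8090", metalake_name="example_metalake")

adls_properties = {
    "location": "abfss://container@account.dfs.core.windows.net/path",  # placeholder ADLS location
    "azure-storage-account-name": "${ABS_ACCOUNT_NAME}",
    "azure-storage-account-key": "${ABS_ACCOUNT_KEY}",
    "filesystem-providers": "abs",  # assumption: the Azure Blob Storage provider name
}

adls_catalog = gravitino_client.create_catalog(
    name="example_catalog",
    catalog_type=Catalog.Type.FILESET,  # keyword name may differ by client version (assumption)
    provider="hadoop",
    comment="Hadoop catalog backed by ADLS",
    properties=adls_properties,
)
```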

### Create a schema
### Configuration for a schema

Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.

### Create a fileset
### Configuration for a fileset

Refer to [Fileset operation](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.


## Using Hadoop catalog with ADLS

### Create a Hadoop catalog/schema/fileset with ADLS
This section demonstrates how to use the Hadoop catalog with ADLS in Gravitino, with a complete example.

### Step 1: Create a Hadoop catalog with ADLS

First, you need to create a Hadoop catalog with ADLS. The following example shows how to create a Hadoop catalog with ADLS:

@@ -113,9 +124,9 @@ adls_properties = gravitino_client.create_catalog(name="example_catalog",
</TabItem>
</Tabs>

Then create a schema and fileset in the catalog created above.
### Step 2: Create a schema

Using the following code to create a schema and a fileset:
Once the catalog is created, you can create a schema. The following example shows how to create a schema:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -163,6 +174,10 @@ catalog.as_schemas().create_schema(name="test_schema",
</TabItem>
</Tabs>

### Step 3: Create a fileset

After creating the schema, you can create a fileset. The following example shows how to create a fileset:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">

@@ -221,6 +236,8 @@ catalog.as_fileset_catalog().create_fileset(ident=NameIdentifier.of("test_schema",
</TabItem>
</Tabs>

## Accessing a fileset with ADLS

### Using Spark to access the fileset

The following code snippet shows how to use **PySpark 3.1.3 with Hadoop environment (Hadoop 3.2.0)** to access the fileset:
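
A minimal PySpark sketch of this pattern follows. The three `fs.*` settings mirror the GVFS configuration shown for OSS later on this page; the metalake key, the `azure-storage-account-*` Spark options, and the catalog/schema/fileset names are assumptions or placeholders, and the Gravitino GVFS runtime jar plus the Azure bundle jar must be on Spark's classpath.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("adls_fileset_example")
    # GVFS wiring for Gravitino-managed filesets
    .config("spark.hadoop.fs.AbstractFileSystem.gvfs.impl", "org.apache.gravitino.filesystem.hadoop.Gvfs")
    .config("spark.hadoop.fs.gvfs.impl", "org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem")
    .config("spark.hadoop.fs.gravitino.server.uri", "http://localhost:8090")
    .config("spark.hadoop.fs.gravitino.client.metalake", "example_metalake")  # assumption
    # ADLS credentials, assumed to reuse the property names from the table above
    .config("spark.hadoop.azure-storage-account-name", "${ABS_ACCOUNT_NAME}")
    .config("spark.hadoop.azure-storage-account-key", "${ABS_ACCOUNT_KEY}")
    .getOrCreate()
)

# gvfs:// paths address a fileset as gvfs://fileset/{catalog}/{schema}/{fileset}/...
path = "gvfs://fileset/example_catalog/test_schema/example_fileset/people"
spark.range(10).write.mode("overwrite").parquet(path)
spark.read.parquet(path).show()
```
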
22 changes: 15 additions & 7 deletions docs/hadoop-catalog-with-gcs.md
@@ -21,8 +21,7 @@ $ bin/gravitino-server.sh start

The rest of this document shows how to use the Hadoop catalog with GCS in Gravitino with a full example.


### Create a GCS Hadoop catalog
### Configuration for a GCS Hadoop catalog

Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with GCS:

@@ -32,17 +31,19 @@ Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./
| `default-filesystem-provider` | The name of the default filesystem provider for this Hadoop catalog, used when the URI does not specify a scheme. The default value is `builtin-local`; for GCS, setting this to the GCS provider lets you omit the `gs://` prefix in the location. | `builtin-local` | No | 0.7.0-incubating |
| `gcs-service-account-file` | The path of the GCS service account JSON file. | (none) | Yes if it's a GCS fileset. | 0.7.0-incubating |
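
As a brief sketch, these properties might feed into catalog creation with the Python client as shown below; the import path, the `catalog_type` keyword, the `filesystem-providers` value, and all paths and names are assumptions or placeholders.

```python
from gravitino import GravitinoClient, Catalog

gravitino_client = GravitinoClient(uri="http://localhost:8090", metalake_name="example_metalake")

gcs_properties = {
    "location": "gs://example-bucket/path",                       # placeholder GCS location
    "gcs-service-account-file": "/path/to/service-account.json",
    "filesystem-providers": "gcs",                                # assumption: the GCS provider name
}

gcs_catalog = gravitino_client.create_catalog(
    name="test_catalog",
    catalog_type=Catalog.Type.FILESET,  # keyword name may differ by client version (assumption)
    provider="hadoop",
    comment="Hadoop catalog backed by GCS",
    properties=gcs_properties,
)
```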

### Create a schema
### Configuration for a schema

Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.

### Create a fileset
### Configuration for a fileset

Refer to [Fileset operation](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.

## Using Hadoop catalog with GCS

### Create a Hadoop catalog/schema/fileset with GCS
This section will show you how to use the Hadoop catalog with GCS in Gravitino, including detailed examples.

### Step 1: Create a Hadoop catalog with GCS

First, you need to create a Hadoop catalog with GCS. The following example shows how to create a Hadoop catalog with GCS:

@@ -109,9 +110,9 @@ gcs_properties = gravitino_client.create_catalog(name="test_catalog",
</TabItem>
</Tabs>

Then create a schema and a fileset in the catalog created above.
### Step 2: Create a schema

Using the following code to create a schema and a fileset:
Once you have created a Hadoop catalog with GCS, you can create a schema. The following example shows how to create a schema:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -159,6 +160,11 @@ catalog.as_schemas().create_schema(name="test_schema",
</TabItem>
</Tabs>


### Step 3: Create a fileset

After creating a schema, you can create a fileset. The following example shows how to create a fileset:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">

@@ -217,6 +223,8 @@ catalog.as_fileset_catalog().create_fileset(ident=NameIdentifier.of("test_schema",
</TabItem>
</Tabs>

## Accessing a fileset with GCS

### Using Spark to access the fileset

The following code snippet shows how to use **PySpark 3.1.3 with Hadoop environment (Hadoop 3.2.0)** to access the fileset:
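
A condensed PySpark sketch, parallel to the ADLS one earlier on this page; the metalake key and the `gcs-service-account-file` Spark option are assumptions, and all names and paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gcs_fileset_example")
    .config("spark.hadoop.fs.AbstractFileSystem.gvfs.impl", "org.apache.gravitino.filesystem.hadoop.Gvfs")
    .config("spark.hadoop.fs.gvfs.impl", "org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem")
    .config("spark.hadoop.fs.gravitino.server.uri", "http://localhost:8090")
    .config("spark.hadoop.fs.gravitino.client.metalake", "example_metalake")           # assumption
    .config("spark.hadoop.gcs-service-account-file", "/path/to/service-account.json")  # assumption
    .getOrCreate()
)

# Read data back through the virtual fileset path (names are placeholders).
spark.read.parquet("gvfs://fileset/test_catalog/test_schema/example_fileset/data").show()
```
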
43 changes: 27 additions & 16 deletions docs/hadoop-catalog-with-oss.md
@@ -6,22 +6,26 @@ keyword: Hadoop catalog OSS
license: "This software is licensed under the Apache License version 2."
---

This document describes how to configure a Hadoop catalog with Aliyun OSS.
This document explains how to configure a Hadoop catalog with Aliyun OSS (Object Storage Service) in Gravitino.

## Prerequisites

In order to create a Hadoop catalog with OSS, you need to place [`gravitino-aliyun-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aliyun-bundle) in Gravitino Hadoop catalog classpath located
at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`. After that, start Gravitino server with the following command:
To set up a Hadoop catalog with OSS, follow these steps:

1. Download the [`gravitino-aliyun-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aliyun-bundle) file.
2. Place the downloaded file into the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
3. Start the Gravitino server by running the following command:

```bash
$ bin/gravitino-server.sh start
```
Once the server is up and running, you can proceed to configure the Hadoop catalog with OSS.

## Create a Hadoop Catalog with OSS

### Create an OSS Hadoop catalog
### Configuration for an OSS Hadoop catalog

Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with OSS:
In addition to the basic configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with OSS:

| Configuration item | Description | Default value | Required | Since version |
|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------------------------|------------------|
@@ -31,22 +35,21 @@ Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./
| `oss-access-key-id` | The access key of Aliyun OSS. | (none) | Yes if it's an OSS fileset. | 0.7.0-incubating |
| `oss-secret-access-key` | The secret key of Aliyun OSS. | (none) | Yes if it's an OSS fileset. | 0.7.0-incubating |
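
A short sketch of how these properties might be used when creating the catalog with the Python client; the endpoint key, the provider value, the keyword names, and all names and paths are assumptions or placeholders.

```python
from gravitino import GravitinoClient, Catalog

gravitino_client = GravitinoClient(uri="http://localhost:8090", metalake_name="example_metalake")

oss_properties = {
    "location": "oss://example-bucket/root",                # placeholder OSS location
    "oss-endpoint": "http://oss-cn-hangzhou.aliyuncs.com",  # assumption: the endpoint of your OSS region
    "oss-access-key-id": "${OSS_ACCESS_KEY_ID}",
    "oss-secret-access-key": "${OSS_SECRET_ACCESS_KEY}",
    "filesystem-providers": "oss",                          # assumption: the OSS provider name
}

oss_catalog = gravitino_client.create_catalog(
    name="test_catalog",
    catalog_type=Catalog.Type.FILESET,  # keyword name may differ by client version (assumption)
    provider="hadoop",
    comment="Hadoop catalog backed by Aliyun OSS",
    properties=oss_properties,
)
```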

### Create a schema

Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.
### Configuration for a schema

### Create a fileset
To create a schema, refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations).

Refer to [Fileset operation](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.
### Configuration for a fileset

For instructions on how to create a fileset, refer to [Fileset operation](./manage-fileset-metadata-using-gravitino.md#fileset-operations).

## Using Hadoop catalog with OSS

The rest of this document shows how to use the Hadoop catalog with OSS in Gravitino with a full example.
This section will show you how to use the Hadoop catalog with OSS in Gravitino, including detailed examples.

### Create a Hadoop catalog/schema/fileset with OSS
### Step 1: Create a Hadoop catalog with OSS

First, you need to create a Hadoop catalog with OSS. The following example shows how to create a Hadoop catalog with OSS:
First, you need to create a Hadoop catalog for OSS. The following examples demonstrate how to create a Hadoop catalog with OSS:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -117,9 +120,9 @@ oss_catalog = gravitino_client.create_catalog(name="test_catalog",
</TabItem>
</Tabs>

Then create a schema and a fileset in the catalog created above.
### Step 2: Create a schema

Using the following code to create a schema and a fileset:
Once the Hadoop catalog with OSS is created, you can create a schema inside that catalog. Below are examples of how to do this:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -167,6 +170,12 @@ catalog.as_schemas().create_schema(name="test_schema",
</TabItem>
</Tabs>


### Step 3: Create a fileset

Now that the schema is created, you can create a fileset inside it. Here’s how:


<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">

@@ -225,6 +234,8 @@ catalog.as_fileset_catalog().create_fileset(ident=NameIdentifier.of("test_schema",
</TabItem>
</Tabs>

## Accessing a fileset with OSS

### Using Spark to access the fileset

The following code snippet shows how to use **PySpark 3.1.3 with Hadoop environment (Hadoop 3.2.0)** to access the fileset:
@@ -432,7 +443,7 @@ Spark:

```python
spark = SparkSession.builder
.appName("oss_fielset_test")
.appName("oss_fileset_test")
.config("spark.hadoop.fs.AbstractFileSystem.gvfs.impl", "org.apache.gravitino.filesystem.hadoop.Gvfs")
.config("spark.hadoop.fs.gvfs.impl", "org.apache.gravitino.filesystem.hadoop.GravitinoVirtualFileSystem")
.config("spark.hadoop.fs.gravitino.server.uri", "${GRAVITINO_SERVER_IP:PORT}")
43 changes: 27 additions & 16 deletions docs/hadoop-catalog-with-s3.md
@@ -6,22 +6,28 @@ keyword: Hadoop catalog S3
license: "This software is licensed under the Apache License version 2."
---

This document describes how to configure a Hadoop catalog with S3.
This document explains how to configure a Hadoop catalog with S3 in Gravitino.

## Prerequisites

In order to create a Hadoop catalog with S3, you need to place [`gravitino-aws-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aws-bundle) in Gravitino Hadoop catalog classpath located
at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`. After that, start Gravitino server with the following command:
To create a Hadoop catalog with S3, follow these steps:

1. Download the [`gravitino-aws-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aws-bundle) file.
2. Place this file in the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
3. Start the Gravitino server using the following command:

```bash
$ bin/gravitino-server.sh start
```

Once the server is running, you can proceed to create the Hadoop catalog with S3.


## Create a Hadoop Catalog with S3

### Create a S3 Hadoop catalog
### Configuration for an S3 Hadoop catalog

Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with S3:
In addition to the basic configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are necessary to configure a Hadoop catalog with S3:

| Configuration item | Description | Default value | Required | Since version |
|-------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|---------------------------|------------------|
@@ -31,20 +37,20 @@ Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./
| `s3-access-key-id` | The access key of AWS S3. | (none) | Yes if it's an S3 fileset. | 0.7.0-incubating |
| `s3-secret-access-key` | The secret key of AWS S3. | (none) | Yes if it's an S3 fileset. | 0.7.0-incubating |
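
For orientation, a sketch of how these properties might be passed when creating the catalog with the Python client; the endpoint key, the provider value, the keyword names, and all names are assumptions or placeholders. Note the `s3a://` location scheme, which is explained further below.

```python
from gravitino import GravitinoClient, Catalog

gravitino_client = GravitinoClient(uri="http://localhost:8090", metalake_name="example_metalake")

s3_properties = {
    "location": "s3a://example-bucket/root",                  # use the s3a scheme, see the note below
    "s3-endpoint": "http://s3.ap-northeast-1.amazonaws.com",  # assumption: your S3/region endpoint
    "s3-access-key-id": "${S3_ACCESS_KEY_ID}",
    "s3-secret-access-key": "${S3_SECRET_ACCESS_KEY}",
    "filesystem-providers": "s3",                             # assumption: the S3 provider name
}

s3_catalog = gravitino_client.create_catalog(
    name="test_catalog",
    catalog_type=Catalog.Type.FILESET,  # keyword name may differ by client version (assumption)
    provider="hadoop",
    comment="Hadoop catalog backed by AWS S3",
    properties=s3_properties,
)
```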

### Create a schema
### Configuration for a schema

Refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.
To learn how to create a schema, refer to [Schema operation](./manage-fileset-metadata-using-gravitino.md#schema-operations).

### Create a fileset
### Configuration for a fileset

Refer to [Fileset operation](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.
For more details on creating a fileset, refer to [Fileset operation](./manage-fileset-metadata-using-gravitino.md#fileset-operations).


## Using Hadoop catalog with S3
## Using the Hadoop catalog with S3

The rest of this document shows how to use the Hadoop catalog with S3 in Gravitino with a full example.
This section demonstrates how to use the Hadoop catalog with S3 in Gravitino, with a complete example.

### Create a Hadoop catalog/schema/fileset with S3
### Step 1: Create a Hadoop catalog with S3

First, you need to create a Hadoop catalog with S3. The following example shows how to create one:

@@ -118,12 +124,12 @@ s3_catalog = gravitino_client.create_catalog(name="test_catalog",
</Tabs>

:::note
The value of location should always start with `s3a` NOT `s3` for AWS S3, for instance, `s3a://bucket/root`. Value like `s3://bucket/root` is not supported due to the limitation of the hadoop-aws library.
When using S3 with Hadoop, ensure that the location value starts with `s3a://` (not `s3://`) for AWS S3. For example, use `s3a://bucket/root`, as the `s3://` format is not supported by the `hadoop-aws` library.
:::
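
To make the note concrete, here is a hypothetical pair of location values (the bucket name is a placeholder):

```python
supported_location = "s3a://example-bucket/root"    # s3a scheme: works with hadoop-aws
unsupported_location = "s3://example-bucket/root"   # s3 scheme: not supported by hadoop-aws
```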

Then create a schema and a fileset in the catalog created above.
### Step 2: Create a schema

Using the following code to create a schema and a fileset:
Once your Hadoop catalog with S3 is created, you can create a schema under the catalog. Here are examples of how to do that:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -172,6 +178,10 @@ catalog.as_schemas().create_schema(name="test_schema",
</TabItem>
</Tabs>

### Step 3: Create a fileset

After creating the schema, you can create a fileset. Here are examples for creating a fileset:

<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">

@@ -230,10 +240,11 @@ catalog.as_fileset_catalog().create_fileset(ident=NameIdentifier.of("schema", "e
</TabItem>
</Tabs>

## Accessing a fileset with S3

### Using Spark to access the fileset

The following code snippet shows how to use **PySpark 3.1.3 with Hadoop environment(Hadoop 3.2.0)** to access the fileset:
The following Python code demonstrates how to use **PySpark 3.1.3 with Hadoop environment (Hadoop 3.2.0)** to access the fileset:

```python
import logging
