From d7bdee62a6aed81d2593e6049c4bfb25e4c0116e Mon Sep 17 00:00:00 2001 From: Justin Mclean Date: Thu, 4 Jan 2024 13:22:43 +1100 Subject: [PATCH] Fix minor grammar errors and English mistakes in documentation. (#1302) ### What changes were proposed in this pull request? Fix minor grammar errors and English mistakes in documentation. ### Why are the changes needed? For clarity. Fix: # N/A ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Built locally. --- README.md | 9 ++-- docs/apache-hive-catalog.md | 16 ++++---- docs/docker-image-details.md | 10 ++--- docs/getting-started.md | 13 +++--- docs/gravitino-server-config.md | 37 +++++++++-------- docs/how-to-build.md | 11 ++--- docs/how-to-install.md | 14 +++---- docs/how-to-sign-releases.md | 2 +- docs/how-to-test.md | 14 +++---- docs/how-to-use-the-playground.md | 4 +- docs/iceberg-rest-service.md | 6 +-- docs/index.md | 14 +++---- docs/jdbc-mysql-catalog.md | 8 ++-- docs/jdbc-postgresql-catalog.md | 6 +-- docs/lakehouse-iceberg-catalog.md | 8 ++-- docs/manage-metadata-using-gravitino.md | 30 +++++++------- docs/metrics.md | 4 +- docs/overview.md | 12 +++--- docs/publish-docker-images.md | 2 +- docs/security.md | 41 ++++++++++++++++--- ...table-partitioning-bucketing-sort-order.md | 16 ++++---- docs/webui.md | 16 ++++---- 22 files changed, 159 insertions(+), 134 deletions(-) diff --git a/README.md b/README.md index ef6030e2722..da3593ee65b 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ Gravitino aims to provide several key features: ## Contributing to Gravitino -Gravitino is open source software available under the Apache 2.0 license. For information of how to contribute to Gravitino please see the [Contribution guidelines](CONTRIBUTING.md). +Gravitino is open source software available under the Apache 2.0 license. For information on how to contribute to Gravitino please see the [Contribution guidelines](CONTRIBUTING.md). ## Online documentation @@ -53,7 +53,7 @@ Or: to build a compressed distribution package. -The generated binary distribution package locates in `distribution` directory. +The directory `distribution` contains the generated binary distribution package. For the details of building and testing Gravitino, please see [How to build Gravitino](docs/how-to-build.md). @@ -61,11 +61,10 @@ For the details of building and testing Gravitino, please see [How to build Grav ### Configure and start the Gravitino server -If you already have a binary distribution package, please decompress the package (if required) -and go to the directory where the package locates. +If you already have a binary distribution package, go to the directory of the decompressed package. Before starting the Gravitino server, please configure the Gravitino server configuration file. The -configuration file, `gravitino.conf`, located in the `conf` directory and follows the standard property file format. You can modify the configuration within this file. +configuration file, `gravitino.conf`, is in the `conf` directory and follows the standard property file format. You can modify the configuration within this file. To start the Gravitino server, please run: diff --git a/docs/apache-hive-catalog.md b/docs/apache-hive-catalog.md index 399f5df4ebd..f7371e357d0 100644 --- a/docs/apache-hive-catalog.md +++ b/docs/apache-hive-catalog.md @@ -24,7 +24,7 @@ The Hive catalog is available for Apache Hive **2.x** only. 
Support for Apache H

### Catalog capabilities

-The Hive catalog supports to create, update, and delete databases and tables in the HMS.
+The Hive catalog supports creating, updating, and deleting databases and tables in the HMS.

### Catalog properties

@@ -61,12 +61,12 @@ see [Manage Metadata Using Gravitino](./manage-metadata-using-gravitino.md#schem

### Table capabilities

-The Hive catalog supports to create, update, and delete tables in the HMS.
+The Hive catalog supports creating, updating, and deleting tables in the HMS.

#### Table partitions

-The Hive catalog supports [partitioned tables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables). Users can create partitioned tables in the Hive catalog with specific partitioning attribute.
-Although Gravitino supports several partitioning strategies, the Apache Hive inherently only supports a single partitioning strategy (partitioned by column), therefore the Hive catalog only support `Identity` partitioning.
+The Hive catalog supports [partitioned tables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PartitionedTables). Users can create partitioned tables in the Hive catalog with a specific partitioning attribute.
+Although Gravitino supports several partitioning strategies, Apache Hive inherently supports only a single partitioning strategy (partitioned by column); therefore, the Hive catalog only supports `Identity` partitioning.

:::caution
The `fieldName` specified in the partitioning attribute must be a column defined in the table.
@@ -75,7 +75,7 @@ The `fieldName` specified in the partitioning attribute must be a column defined
#### Table sort orders and distributions

The Hive catalog supports [bucketed sorted tables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-BucketedSortedTables). Users can create bucketed sorted tables in the Hive catalog with specific `distribution` and `sortOrders` attributes.
-Although Gravitino supports several distribution strategies, the Apache Hive inherently only supports a single distribution strategy (clustered by column), therefore the Hive catalog only support `Hash` distribution.
+Although Gravitino supports several distribution strategies, Apache Hive inherently supports only a single distribution strategy (clustered by column); therefore, the Hive catalog only supports `Hash` distribution.

:::caution
The `fieldName` specified in the `distribution` and `sortOrders` attribute must be a column defined in the table.
@@ -131,7 +131,7 @@ Hive automatically adds and manages some reserved properties. Users aren't allow
| `comment` | Used to store the table comment. | 0.2.0 |
| `numFiles` | Used to store the number of files in the table. | 0.2.0 |
| `totalSize` | Used to store the total size of the table. | 0.2.0 |
-| `EXTERNAL` | Indicates whether the table is an external table. | 0.2.0 |
+| `EXTERNAL` | Indicates whether the table is external. | 0.2.0 |
| `transient_lastDdlTime` | Used to store the last DDL time of the table. 
| 0.2.0 |

### Table operations

Please refer to [Manage Metadata Using Gravitino](./manage-metadata-using-gravit

#### Alter operations

Gravitino has already defined a unified set of [metadata operation interfaces](./manage-metadata-using-gravitino.md#alter-a-table), and almost all [Hive Alter operations](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column) have corresponding table update request which enable you to change the struct of an existing table.
-The following table lists the mapping relationship between Hive Alter operations and Gravitino table update request.
+The following table lists the mapping relationship between Hive Alter operations and the Gravitino table update requests.

##### Alter table

@@ -157,7 +157,7 @@ The following table lists the mapping relationship between Hive Alter operations
| `Alter Table Constraints` | Unsupported | - |

:::note
-As Gravitino has a separate interface for updating the comment of a table, the Hive catalog sets `comment` as a reserved property for the table, preventing users from setting the comment property, Although Apache Hive change the comment of a table by modifying the comment property of the table.
+As Gravitino has a separate interface for updating the comment of a table, the Hive catalog sets `comment` as a reserved property for the table, preventing users from setting the comment property, although Apache Hive changes the comment of a table by modifying the comment property of the table.
:::

##### Alter column
diff --git a/docs/docker-image-details.md b/docs/docker-image-details.md
index 7b49212bc55..0661e06140e 100644
--- a/docs/docker-image-details.md
+++ b/docs/docker-image-details.md
@@ -8,11 +8,11 @@ This software is licensed under the Apache License version 2."

# User Docker images

-There are 2 kinds of docker images for user Docker images: the Gravitino Docker image and playground Docker images.
+There are two kinds of Docker images for users: the Gravitino Docker image and the playground Docker images.

## Gravitino Docker image

-You can deploy the service with Gravitino Docker image.
+You can deploy the service with the Gravitino Docker image.

Container startup commands

@@ -36,7 +36,7 @@ You can use the [playground](https://github.com/datastrato/gravitino-playground

The playground consists of multiple Docker images.

-The Docker images of playground have suitable configurations for users to experience.
+The Docker images of the playground have suitable configurations for users to experience.

### Hive image

@@ -59,12 +59,12 @@

Changelog

# Developer Docker images

-You can use these kinds of the Docker images to facilitate Gravitino integration testing.
+You can use these kinds of Docker images to facilitate Gravitino integration testing.
You can use it to test all catalog and connector modules within Gravitino.

## Gravitino CI Apache Hive image

-You can use this kind of images to test the catalog of Apache Hive.
+You can use this kind of image to test the catalog of Apache Hive. 
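As a sketch of how such a CI image is typically pulled and started — the tag `0.1.0` and the Hive Metastore port mapping `9083` are assumptions for illustration, not values taken from this page:

```shell
# Pull and start the CI Hive image; replace the tag with one that exists on Docker Hub.
docker pull datastrato/gravitino-ci-hive:0.1.0
docker run --rm -d -p 9083:9083 --name gravitino-ci-hive datastrato/gravitino-ci-hive:0.1.0
```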
Changelog diff --git a/docs/getting-started.md b/docs/getting-started.md index 37c0d55d93e..a4c7807a1b2 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -21,7 +21,6 @@ or locally see [Installing Gravitino playground locally](#installing-gravitino-p If you are using AWS and want to access the instance remotely, be sure to read [Accessing Gravitino on AWS externally](#accessing-gravitino-on-aws-externally) - ## Getting started on Amazon Web Services To begin using Gravitino on AWS, follow these steps: @@ -156,7 +155,7 @@ You can install Apache Hive and Hadoop on AWS or Google Cloud Platform manually, the steps of how to install [Apache Hive](https://cwiki.apache.org/confluence/display/Hive/) and [Hadoop](https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html) instructions on their websites. -Installing and configuring Hive can be a little complex. If you don't already have Hive setup and running you can use the Docker container Datastrato provide to get Gravitino up and running. +Installing and configuring Hive can be a little complex. If you don't already have Hive setup and running you can use the Docker container Datastrato provides to get Gravitino up and running. You can follow the instructions for setting up [Docker on Ubuntu](https://docs.docker.com/engine/install/ubuntu/). @@ -172,7 +171,7 @@ sudo docker start gravitino-container ## Installing Apache Hive locally -The same steps apply for installing Hive locally as on AWS or Google Cloud Platform. You can +The same steps apply to installing Hive locally as on AWS or Google Cloud Platform. You can follow the instructions for [Installing Apache Hive on AWS or Google Cloud Platform](#installing-apache-hive-on-aws-or-google-cloud-platform). ## Installing Gravitino playground on AWS or Google Cloud Platform @@ -181,7 +180,7 @@ Gravitino provides a bundle of Docker images to launch a Gravitino playground, w includes Apache Hive, Apache Hadoop, Trino, MySQL, PostgreSQL, and Gravitino. You can use Docker compose to start them all. -Installing Docker and Docker Compose is a requirement to using the playground. +Installing Docker and Docker Compose is a requirement for using the playground. ```shell sudo apt install docker docker-compose @@ -195,12 +194,12 @@ how to run the playground, please see [how-to-use-the-playground](./how-to-use-t ## Installing Gravitino playground locally -The same steps apply for installing the playground locally as on AWS or Google Cloud Platform. You +The same steps apply to installing the playground locally as on AWS or Google Cloud Platform. You can follow the instructions for [Installing Gravitino playground on AWS or Google Cloud Platform](#installing-gravitino-playground-on-aws-or-google-cloud-platform). ## Using REST to interact with Gravitino -After starting the Gravitino distribution, issue REST commands to create and modify metadata. While you are using localhost in these examples, run these commands remotely via a host name or IP address once you establish correct access. +After starting the Gravitino distribution, issue REST commands to create and modify metadata. While you are using localhost in these examples, run these commands remotely via a hostname or IP address once you establish correct access. 1. 
Create a Metalake @@ -257,7 +256,7 @@ After starting the Gravitino distribution, issue REST commands to create and mod http://localhost:8090/api/metalakes/metalake/catalogs ``` - Note that the metastore.uris used for the Hive catalog and would need updating if you change your configuration. + Note that the metastore.uris property used for the Hive catalog and would need updating if you change your configuration. ## Accessing Gravitino on AWS externally diff --git a/docs/gravitino-server-config.md b/docs/gravitino-server-config.md index 8ee97a166cf..463c8b7e8c1 100644 --- a/docs/gravitino-server-config.md +++ b/docs/gravitino-server-config.md @@ -10,6 +10,7 @@ This software is licensed under the Apache License version 2." ## Introduction Gravitino supports several configurations: + 1. **Gravitino server configuration**: Used to start up Gravitino server. 2. **Gravitino catalog properties configuration**: Used to make default values for different catalogs. 3. **Some other configurations**: Includes configurations such as HDFS configuration. @@ -23,18 +24,22 @@ The `gravitino.conf` file lists the configuration items in the following table. ### Gravitino HTTP Server configuration -| Configuration item | Description | Default value | Required | Since version | -|------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|----------|---------------| -| `gravitino.server.webserver.host` | The host of Gravitino server. | `0.0.0.0` | No | 0.1.0 | -| `gravitino.server.webserver.httpPort` | The port on which the Gravitino server listens for incoming connections. | `8090` | No | 0.1.0 | -| `gravitino.server.webserver.minThreads` | The minimum number of threads in the thread pool used by Jetty webserver. `minThreads` is 8 if the value is less than 8. | `Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 8)` | No | 0.2.0 | -| `gravitino.server.webserver.maxThreads` | The maximum number of threads in the thread pool used by Jetty webserver. `maxThreads` is 8 if the value is less than 8, and `maxThreads` must be great or equal to `minThreads`. | `Math.max(Runtime.getRuntime().availableProcessors() * 4, 400)` | No | 0.1.0 | -| `gravitino.server.webserver.threadPoolWorkQueueSize` | The size of the queue in the thread pool used by Jetty webserver. | `100` | No | 0.1.0 | -| `gravitino.server.webserver.stopTimeout` | Time in milliseconds to gracefully shutdown the Jetty webserver, for more, please see `org.eclipse.jetty.server.Server#setStopTimeout`. | `30000` | No | 0.2.0 | -| `gravitino.server.webserver.idleTimeout` | The timeout in milliseconds of idle connections. | `30000` | No | 0.2.0 | -| `gravitino.server.webserver.requestHeaderSize` | Maximum size of HTTP requests. | `131072` | No | 0.1.0 | -| `gravitino.server.webserver.responseHeaderSize` | Maximum size of HTTP responses. | `131072` | No | 0.1.0 | -| `gravitino.server.shutdown.timeout` | Time in milliseconds to gracefully shutdown of the Gravitino webserver. 
| `3000` | No | 0.2.0 | +| Configuration item | Description | Default value | Required | Since version | +|-------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------|----------|---------------| +| `gravitino.server.webserver.host` | The host of Gravitino server. | `0.0.0.0` | No | 0.1.0 | +| `gravitino.server.webserver.httpPort` | The port on which the Gravitino server listens for incoming connections. | `8090` | No | 0.1.0 | +| `gravitino.server.webserver.minThreads` | The minimum number of threads in the thread pool used by Jetty webserver. `minThreads` is 8 if the value is less than 8. | `Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 8)` | No | 0.2.0 | +| `gravitino.server.webserver.maxThreads` | The maximum number of threads in the thread pool used by Jetty webserver. `maxThreads` is 8 if the value is less than 8, and `maxThreads` must be great or equal to `minThreads`. | `Math.max(Runtime.getRuntime().availableProcessors() * 4, 400)` | No | 0.1.0 | +| `gravitino.server.webserver.threadPoolWorkQueueSize` | The size of the queue in the thread pool used by Jetty webserver. | `100` | No | 0.1.0 | +| `gravitino.server.webserver.stopTimeout` | Time in milliseconds to gracefully shutdown the Jetty webserver, for more, please see `org.eclipse.jetty.server.Server#setStopTimeout`. | `30000` | No | 0.2.0 | +| `gravitino.server.webserver.idleTimeout` | The timeout in milliseconds of idle connections. | `30000` | No | 0.2.0 | +| `gravitino.server.webserver.requestHeaderSize` | Maximum size of HTTP requests. | `131072` | No | 0.1.0 | +| `gravitino.server.webserver.responseHeaderSize` | Maximum size of HTTP responses. | `131072` | No | 0.1.0 | +| `gravitino.server.shutdown.timeout` | Time in milliseconds to gracefully shutdown of the Gravitino webserver. | `3000` | No | 0.2.0 | +| `gravitino.server.webserver.customFilters` | Comma separated list of filter class names to apply to the API. | (none) | No | 0.4.0 | + +The filter in the customFilters should be a standard javax servlet Filter. +Filter parameters can also be specified in the configuration, by setting configuration entries of the form `gravitino.server.webserver..param.=` ### Storage configuration @@ -48,7 +53,7 @@ The `gravitino.conf` file lists the configuration items in the following table. | `gravitino.entity.store.kv.deleteAfterTimeMs` | The maximum time in milliseconds that the deleted data and old version data is kept. Set to at least 10 minutes and no longer than 30 days. | `604800000`(7 days) | No | 0.3.0 | :::caution -It's highly recommend that you change the default value of `gravitino.entity.store.kv.rocksdbPath`, as it's under the deployment directory and future version upgrades may remove it. +It's highly recommended that you change the default value of `gravitino.entity.store.kv.rocksdbPath`, as it's under the deployment directory and future version upgrades may remove it. ::: ### Catalog configuration @@ -81,11 +86,11 @@ There are three types of catalog properties: Catalog properties are either defined in catalog configuration files as default values or specified explicitly when creating a catalog. :::info -Explicit specifications take precedence over the formal configurations. +Explicit specifications take precedence over formal configurations. 
::: :::caution -These rules only apply on the catalog properties, doesn't affect on the schema or table properties. +These rules only apply to the catalog properties and don't affect the schema or table properties. ::: | catalog provider | catalog properties | catalog properties configuration file path | @@ -96,7 +101,7 @@ These rules only apply on the catalog properties, doesn't affect on the schema o | `jdbc-postgresql` | [PostgreSQL catalog properties](jdbc-postgresql-catalog.md#catalog-properties) | `catalogs/jdbc-postgresql/conf/jdbc-postgresql.conf` | :::info -Gravitino server automatically add catalog properties configuration dir to classpath. +Gravitino server automatically adds catalog properties configuration dir to classpath. ::: ## Some other configurations diff --git a/docs/how-to-build.md b/docs/how-to-build.md index aa742cc9b45..7b3e8cbdbf7 100644 --- a/docs/how-to-build.md +++ b/docs/how-to-build.md @@ -13,23 +13,20 @@ This software is licensed under the Apache License version 2." + Optionally Docker to run integration tests :::info Please read the following notes first + + Gravitino requires at least JDK8 and at most JDK17 to run Gradle, so you need to install JDK8 to 17 version to launch the build environment. - -+ Gravitino itself uses JDK8 to build, Gravitino Trino connector uses JDK17 to build. You don't - have to preinstall JDK8 or JDK17, Gradle detects the JDK version needed and downloads it automatically. - ++ Gravitino itself supports using JDK8, 11, and 17 to build, Gravitino Trino connector uses + JDK17 to build. You don't have to preinstall the specified JDK environment, ++ Gradle detects the JDK version needed and downloads it automatically. + Gravitino uses Gradle Java Toolchain to detect and manage JDK versions, it checks the installed JDK by running `./gradlew javaToolchains` command. For the details of Gradle Java Toolchain, please see [Gradle Java Toolchain](https://docs.gradle.org/current/userguide/toolchains.html#sec:java_toolchain). - + Make sure you have installed Docker in your environment as Gravitino uses it to run integration tests; without it, some Docker-related tests may not run. - + macOS uses "docker-connector" to make the Gravitino Trino connector work with Docker for macOS. For the details of "docker-connector", please see [docker-connector](https://github.com/wenjunxiao/mac-docker-connector) , `$GRAVITINO_HOME/dev/docker/tools/mac-docker-connector.sh`, and `$GRAVITINO_HOME/dev/docker/tools/README.md` for more details. - + Alternatively, you can use OrbStack to replace Docker for macOS, please see [OrbStack](https://orbstack.dev/), with OrbStack you can run Gravitino integration tests without needing to install "docker-connector". diff --git a/docs/how-to-install.md b/docs/how-to-install.md index d15bab989df..1e0041322e3 100644 --- a/docs/how-to-install.md +++ b/docs/how-to-install.md @@ -22,7 +22,7 @@ If you build Gravitino yourself by `./gradlew compileDistribution` command, you Gravitino binary distribution package in `distribution/package` directory. If you build Gravitino yourself by `./gradlew assembleDistribution` command, you can get the -compressed Gravitino binary distribution package with name `gravitino--bin.tar.gz` in +compressed Gravitino binary distribution package with the name `gravitino--bin.tar.gz` in `distribution` directory with sha256 checksum file `gravitino--bin.tar.gz.sha256`. 
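For example, a typical verify-and-unpack sequence might look like the following sketch, where `<version>` stands in for the elided version string in the package name and `sha256sum` is assumed to be available on your platform:

```shell
# Print both digests and compare them by eye, then unpack the distribution.
sha256sum gravitino-<version>-bin.tar.gz
cat gravitino-<version>-bin.tar.gz.sha256
tar -xzf gravitino-<version>-bin.tar.gz
```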
The Gravitino binary distribution package contains the following files:

@@ -49,7 +49,7 @@ The Gravitino binary distribution package contains the following files:

The Gravitino server configuration file is `conf/gravitino.conf`. You can configure the Gravitino
server by modifying this file. Basic configurations are already added to this file, all the
-configurations list in [Gravitino Server Configurations](./gravitino-server-config.md).
+configurations are listed in [Gravitino Server Configurations](./gravitino-server-config.md).

#### Configure Gravitino server log

@@ -69,7 +69,7 @@ modifying the related configuration file in `catalogs//conf` d
configurations you set here apply to all the catalogs of the same type you create.

For example, if you want to configure the Hive catalog, you can modify the file
-`catalogs/hive/conf/hive.conf`. The detailed configurations list in the specific catalog
+`catalogs/hive/conf/hive.conf`. The detailed configurations are listed in the specific catalog
documentation.

:::note
@@ -84,11 +84,11 @@ file.

Gravitino supports pass in catalog specific configurations by adding `gravitino.bypass.`. For example, if you want to pass in HMS specific configuration
-`hive.metastore.client.capability.check` to the underlying Hive client in Hive catalog, you can
+`hive.metastore.client.capability.check` to the underlying Hive client in the Hive catalog, you can
simply add `gravitino.bypass.` prefix to it.

Also, Gravitino supports loading catalog specific configurations from external files. For example,
-you can put your own `hive-site.xml` file in `catalogs/hive/conf` directory, Gravitino loads
+you can put your own `hive-site.xml` file in `catalogs/hive/conf` directory, and Gravitino loads
it automatically.

#### Start Gravitino server

After configuring the Gravitino server, you can start the Gravitino server by ru

```shell
./bin/gravitino.sh start
```

-You can access the Gravitino Web UI by typing `http://localhost:8090` in your browser. or you
+You can access the Gravitino Web UI by typing `http://localhost:8090` in your browser, or you
 can run

```shell

@@ -136,7 +136,7 @@ to make sure Gravitino is running.

## Install Gravitino using Docker compose

-The published Gravitino Docker image only contains Gravitino server with basic configurations. If
+The published Gravitino Docker image only contains the Gravitino server with basic configurations. If
you want to experience the whole Gravitino system with other components, you can use the Docker
compose file.

diff --git a/docs/how-to-sign-releases.md b/docs/how-to-sign-releases.md
index 30ede9464b2..81bc8fbdb2f 100644
--- a/docs/how-to-sign-releases.md
+++ b/docs/how-to-sign-releases.md
@@ -1,5 +1,5 @@
---
-title: How to sign and verify a Gravitino releases
+title: How to sign and verify Gravitino releases
slug: /how-to-sign-releases
license: "Copyright 2023 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
diff --git a/docs/how-to-test.md b/docs/how-to-test.md
index 8d08c554539..7366661621b 100644
--- a/docs/how-to-test.md
+++ b/docs/how-to-test.md
@@ -51,8 +51,7 @@ Gravitino has two modes to run the integration tests, the default `embedded` mod
integration tests.

:::note
-Running the `./gradlew build` command triggers the build and run the integration tests in embedded
-mode.
+Running the `./gradlew build` command triggers the build and runs the integration tests in embedded mode.
::: ### Deploy the Gravitino server and run the integration tests in deploy mode @@ -118,14 +117,11 @@ only parts of the integration tests without `gravitino-docker-it` tag run. ## How to debug Gravitino server and integration tests in embedded mode -By default, the integration tests runs in the embedded mode, `MiniGravitino` starts in the -same process. Debugging `MiniGravitino` is simple and easy. You can modify any code in the -Gravitino project and set breakpoints anywhere. +By default, the integration tests run in the embedded mode, `MiniGravitino` starts in the same process. Debugging `MiniGravitino` is simple and easy. You can modify any code in the Gravitino project and set breakpoints anywhere. ## How to debug Gravitino server and integration tests in deploy mode -This mode is closer to the actual environment but more complex to debug. To debug the Gravitino -server code, follow these steps: +This mode is closer to the actual environment but more complex to debug. To debug the Gravitino server code, follow these steps: * Run the `./gradlew build -x test` command to build the Gravitino project. * Use the `./gradlew compileDistribution` command to republish the packaged project in the `distribution` directory. @@ -145,13 +141,13 @@ server code, follow these steps: * View the test results in the `Actions` tab of the pull request page. * Run the integration tests in several steps: * The Gravitino integration tests pull the CI Docker image from the Docker Hub repository. This step typically takes around 15 seconds. - * If you set the `debug action` label in the pull request, GitHub actions runs an SSH server with `csexton/debugger-action@master`, allowing you to log in to the actions environment for remote debugging. + * If you set the `debug action` label in the pull request, GitHub actions runs an SSH server with `csexton/debugger-action@master`, allowing you to log into the GitHub actions environment for remote debugging. * The Gravitino project compiles and packages in the `distribution` directory using the `./gradlew compileDistribution` command. * Run the `./gradlew test -PtestMode=[embedded|deploy]` command. ## Test failure -If a test fails, you can retrieve valuable information from the logs and test report. Test reports are in the `./build/reports` directory. The integration test logs are in the `./integrate-test/build` directory. In deploy mode, Gravitino server logs are in the `./distribution/package/logs/` directory. In the event of a test failure within the GitHub workflow, the system generates archived logs and test reports. To obtain the archive, follow these steps: +If a test fails, you can retrieve valuable information from the logs and test reports. Test reports are in the `./build/reports` directory. The integration test logs are in the `./integrate-test/build` directory. In deploy mode, Gravitino server logs are in the `./distribution/package/logs/` directory. In the event of a test failure within the GitHub workflow, the system generates archived logs and test reports. To obtain the archive, follow these steps: 1. Click the `detail` link associated with the failed integration test in the pull request. This redirects you to the job page. diff --git a/docs/how-to-use-the-playground.md b/docs/how-to-use-the-playground.md index 6209bd708f3..397967e2f3a 100644 --- a/docs/how-to-use-the-playground.md +++ b/docs/how-to-use-the-playground.md @@ -10,7 +10,7 @@ This software is licensed under the Apache License version 2." 
The playground is a complete Gravitino Docker runtime environment with `Hive`, `HDFS`, `Trino`, `MySQL`, `PostgreSQL`, and a `Gravitino` server.

-Depending on your network and computer, startup time may take 3-5 minutes. Once the playground environment has started, you can open http://localhost:8090 in a browser to access the Gravitino Web UI.
+Depending on your network and computer, startup time may take 3-5 minutes. Once the playground environment has started, you can open `http://localhost:8090` in a browser to access the Gravitino Web UI.

## Prerequisites

@@ -89,7 +89,7 @@ ORDER BY total_sales DESC
LIMIT 1;
```

-If you want to know top customers who bought the most by state, you can run this SQL.
+If you want to know the top customers who bought the most by state, you can run this SQL.

```SQL
SELECT customer_name, location, SUM(total_amount) AS total_spent
diff --git a/docs/iceberg-rest-service.md b/docs/iceberg-rest-service.md
index 2050870dfd0..c8699185b78 100644
--- a/docs/iceberg-rest-service.md
+++ b/docs/iceberg-rest-service.md
@@ -13,14 +13,14 @@ The Gravitino Iceberg REST Server follows the [Apache Iceberg REST API specifica

### Capabilities

-- Supports the Apache Iceberg REST API defined in Iceberg 1.3.1, supports all namespace and table interfaces. `Token`, `ReportMetrics`, and `Config` interfaces aren't supported yet.
+- Supports the Apache Iceberg REST API defined in Iceberg 1.3.1, and supports all namespace and table interfaces. `Token`, `ReportMetrics`, and `Config` interfaces aren't supported yet.
- Works as a catalog proxy, supporting `HiveCatalog` and `JDBCCatalog`.
- When writing to HDFS, the Gravitino Iceberg REST catalog service can only operate as the specified HDFS user and doesn't support proxying to other HDFS users. See [How to access Apache Hadoop](gravitino-server-config.md) for more details.

:::info
Builds with Apache Iceberg `1.3.1`. The Apache Iceberg table format version is `1` by default.
-Builds with Hadoop 2.10.x, there may compatibility issue when accessing Hadoop 3.x clusters.
+Builds with Hadoop 2.10.x, there may be compatibility issues when accessing Hadoop 3.x clusters.
:::

## How to start the Gravitino Iceberg REST catalog service
@@ -50,7 +50,7 @@ You must set `gravitino.auxService.iceberg-rest.httpPort` explicitly, like `9001

### Iceberg catalog configuration

:::info
-The Gravitino Iceberg REST catalog service using memory catalog for default. You can specify Hive or JDBC catalog for production environments.
+The Gravitino Iceberg REST catalog service uses the memory catalog by default. You can specify Hive or JDBC catalog for production environments.
:::

#### Hive catalog configuration
diff --git a/docs/index.md b/docs/index.md
index 3f6b9eb0277..82dfbbb1ff4 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -25,8 +25,8 @@ your system `PATH`, or the `JAVA_HOME` environment variable pointing to a Java i

See [How to install Gravitino](./how-to-install.md) to learn how to install Gravitino server.

-Gravitino provides Docker image on [Docker Hub](https://hub.docker.com/u/datastrato).
-Please pull the image and run it. For the details of Gravitino Docker image, please see
+Gravitino provides Docker images on [Docker Hub](https://hub.docker.com/u/datastrato).
+Please pull the image and run it. For the details of the Gravitino Docker image, please see
[Docker image details](./docker-image-details.md).

Gravitino also provides a playground to experience the whole Gravitino system with other components. 
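As a concrete sketch of the pull-and-run step mentioned above — the repository name `datastrato/gravitino`, the tag `0.3.0`, and the port mapping are assumptions; only the Docker Hub organization and the default web server port `8090` come from these docs:

```shell
# Pull a Gravitino server image and expose the default web server port.
docker pull datastrato/gravitino:0.3.0
docker run --rm -d -p 8090:8090 --name gravitino datastrato/gravitino:0.3.0
```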
@@ -92,9 +92,9 @@ Gravitino supports different catalogs to manage the metadata in different source manage Apache Iceberg data. * [Iceberg REST catalog service](./iceberg-rest-service.md): a complete guide to use Gravitino as an Apache Iceberg REST catalog service. -* [Hive catalog](./apache-hive-catalog.md): a complete guide to using Gravitino manage Apache Hive data. -* [MySQL catalog](./jdbc-mysql-catalog.md): a complete guide to using Gravitino manage MySQL data. -* [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino manage PostgreSQL data. +* [Hive catalog](./apache-hive-catalog.md): a complete guide to using Gravitino to manage Apache Hive data. +* [MySQL catalog](./jdbc-mysql-catalog.md): a complete guide to using Gravitino to manage MySQL data. +* [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino to manage PostgreSQL data. ### Trino connector @@ -123,9 +123,9 @@ Gravitino provides several ways to configure and manage the Gravitino server. Pl * [How to build Gravitino](./how-to-build.md): a complete guide to build Gravitino from source. -* [How to test Gravitino](./how-to-test.md): a complete guide to run Gravitino unit tests and +* [How to test Gravitino](./how-to-test.md): a complete guide to running Gravitino unit tests and integration tests. -* [How to sign and verify a Gravitino releases](./how-to-sign-releases.md): a guide to sign and verify +* [How to sign and verify Gravitino releases](./how-to-sign-releases.md): a guide to sign and verify a Gravitino release. * [Publish Docker images](./publish-docker-images.md): a guide to publish Gravitino Docker images, also list the change logs of Gravitino CI Docker images and release images. diff --git a/docs/jdbc-mysql-catalog.md b/docs/jdbc-mysql-catalog.md index 4da5310f5ee..47b4669b662 100644 --- a/docs/jdbc-mysql-catalog.md +++ b/docs/jdbc-mysql-catalog.md @@ -51,13 +51,13 @@ Please refer to [Manage Metadata Using Gravitino](./manage-metadata-using-gravit ### Schema capabilities - Gravitino schema corresponds to the MySQL database. -- Support create schema with comments. -- Support drop schema. +- Supports create schema with comments. +- Supports drop schema. - Doesn't support cascade drop database. ### Schema properties -- Doesn't support are database property settings. +- Doesn't support any database property settings. ### Schema operations @@ -122,5 +122,5 @@ You cannot submit the `RenameTable` operation at the same time as other operatio ::: :::caution -If you update a nullability column to non nullability, there may be compatibility issue. +If you update a nullability column to non nullability, there may be compatibility issues. ::: diff --git a/docs/jdbc-postgresql-catalog.md b/docs/jdbc-postgresql-catalog.md index d7b2951a92d..93c89697464 100644 --- a/docs/jdbc-postgresql-catalog.md +++ b/docs/jdbc-postgresql-catalog.md @@ -63,7 +63,7 @@ Please refer to [Manage Metadata Using Gravitino](./manage-metadata-using-gravit ### Schema properties -- Doesn't are schema property settings. +- Doesn't support any schema property settings. ### Schema operations @@ -73,7 +73,7 @@ Please refer to [Manage Metadata Using Gravitino](./manage-metadata-using-gravit ### Table capabilities -- Gravitino table corresponds to the PostgreSQL table. +- The Gravitino table corresponds to the PostgreSQL table. - Supports DDL operation for PostgreSQL tables. - Doesn't support setting certain column properties, such as default value and check constraints. 
- Doesn't support index definition.
@@ -131,5 +131,5 @@ You can't submit the `RenameTable` operation at the same time as other operation

:::caution
PostgreSQL doesn't support the `UpdateColumnPosition` operation, so you can only use `ColumnPosition.defaultPosition()` when `AddColumn`.
-If you update a nullability column to non nullability, there may be compatibility issue.
+If you update a nullability column to non nullability, there may be compatibility issues.
:::
diff --git a/docs/lakehouse-iceberg-catalog.md b/docs/lakehouse-iceberg-catalog.md
index fd313f30072..a69e7abc623 100644
--- a/docs/lakehouse-iceberg-catalog.md
+++ b/docs/lakehouse-iceberg-catalog.md
@@ -17,7 +17,7 @@ Gravitino provides the ability to manage Apache Iceberg metadata.

:::info
Builds with Apache Iceberg `1.3.1`. The Apache Iceberg table format version is `1` by default.
-Builds with Hadoop 2.10.x, there may compatibility issue when accessing Hadoop 3.x clusters.
+Builds with Hadoop 2.10.x, there may be compatibility issues when accessing Hadoop 3.x clusters.
:::

## Catalog
@@ -89,7 +89,7 @@ Supports transforms:

:::info
Iceberg doesn't support multi fields in `BucketTransform`.
-Iceberg doesn't support `ApplyTransform`, `RangeTransform` and `ListTransform`.
+Iceberg doesn't support `ApplyTransform`, `RangeTransform`, and `ListTransform`.
:::

### Table sort orders
@@ -106,7 +106,7 @@ supports expressions:

- `hour`

:::info
-For `bucket` and `truncate`, the first argument must be integer literal, the second argument must be field reference.
+For `bucket` and `truncate`, the first argument must be integer literal, and the second argument must be field reference.
:::

### Table distributions
@@ -145,7 +145,7 @@ Apache Iceberg doesn't support Gravitino `Varchar` `Fixedchar` `Byte` `Short` `U

### Table properties

-You can pass [Iceberg table properties](https://iceberg.apache.org/docs/1.3.1/configuration/) to Gravitino when creating Iceberg table.
+You can pass [Iceberg table properties](https://iceberg.apache.org/docs/1.3.1/configuration/) to Gravitino when creating an Iceberg table.

The Gravitino server doesn't allow passing the following reserved fields.

diff --git a/docs/manage-metadata-using-gravitino.md b/docs/manage-metadata-using-gravitino.md
index 0dd7e38158d..4e5f41537e6 100644
--- a/docs/manage-metadata-using-gravitino.md
+++ b/docs/manage-metadata-using-gravitino.md
@@ -13,7 +13,7 @@ This page introduces how to manage metadata by Gravitino. Through Gravitino, you
like metalakes, catalogs, schemas, and tables. This page includes the following contents:

In this document, Gravitino uses Apache Hive catalog as an example to show how to manage metadata by Gravitino. Other catalogs are similar to Hive catalog,
-but they may have some differences, especially in catalog property, table property and column type. For more details, please refer to the related doc.
+but they may have some differences, especially in catalog properties, table properties, and column types. For more details, please refer to the related doc.

- [**Apache Hive**](./apache-hive-catalog.md)
- [**MySQL**](./jdbc-mysql-catalog.md)
- [**PostgreSQL**](./jdbc-postgresql-catalog.md)
- [**Apache Iceberg**](./lakehouse-iceberg-catalog.md)


Assuming Gravitino has just started, and the host and port is `http://localhost:8090`. 
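As a quick smoke test against that address, you can hit the version endpoint used elsewhere in these docs (a sketch; the response body isn't shown here):

```shell
curl -X GET -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  http://localhost:8090/api/version
```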
## Metalake operations @@ -158,12 +158,12 @@ boolean success = gravitinoClient.dropMetalake( :::note -Drop a metalake only removes metadata about the metalake and catalogs, schemas, tables under the metalake in Gravitino, It doesn't remove the real schema and table data in Apache Hive. +Dropping a metalake only removes metadata about the metalake and catalogs, schemas, tables under the metalake in Gravitino, It doesn't remove the real schema and table data in Apache Hive. ::: ### List all metalakes -You can list metalakes by sending a `GET` request to the `/api/metalakes` endpoint or just use the Gravitino Java client. The following is an example of listing all metalake name: +You can list metalakes by sending a `GET` request to the `/api/metalakes` endpoint or just use the Gravitino Java client. The following is an example of listing all metalake names: @@ -185,7 +185,7 @@ GravitinoMetaLake[] allMetalakes = gravitinoClient.listMetalakes(); -## Catalogs operations +## Catalog operations ### Create a catalog @@ -235,7 +235,7 @@ Catalog catalog = gravitinoMetaLake.createCatalog( Type.RELATIONAL, "hive", // provider, We support hive, jdbc-mysql, jdbc-postgresql, lakehouse-iceberg, etc. "This is a hive catalog", - hiveProperties); // Please change the properties according to the value of provider. + hiveProperties); // Please change the properties according to the value of the provider. // ... ``` @@ -359,7 +359,7 @@ gravitinoMetaLake.dropCatalog(NameIdentifier.of("metalake", "catalog")); :::note -Drop a catalog only removes metadata about the catalog and schemas, tables under the catalog in Gravitino, It doesn't remove the real data (table and schema) in Apache Hive. +Dropping a catalog only removes metadata about the catalog, schemas, and tables under the catalog in Gravitino, It doesn't remove the real data (table and schema) in Apache Hive. ::: ### List all catalogs in a metalake @@ -392,7 +392,7 @@ NameIdentifier[] catalogsIdents = gravitinoMetaLake.listCatalogs(Namespace.ofCat -## Schemas operations +## Schema operations :::tip Users should create a metalake and a catalog before creating a schema. @@ -593,7 +593,7 @@ NameIdentifier[] schemas = supportsSchemas.listSchemas(Namespace.ofSchema("metal -## Tables operations +## Table operations :::tip Users should create a metalake, a catalog and a schema before creating a table. @@ -711,7 +711,7 @@ The following types that Gravitino supports: | List | `Types.ListType.of(elementType, elementNullable)` | `{"type": "list", "containsNull": JSON Boolean, "elementType": type JSON}` | List type, indicate a list of elements with the same type | | Map | `Types.MapType.of(keyType, valueType)` | `{"type": "map", "keyType": type JSON, "valueType": type JSON, "valueContainsNull": JSON Boolean}` | Map type, indicate a map of key-value pairs | | Struct | `Types.StructType.of([Types.StructType.Field.of(name, type, nullable)])` | `{"type": "struct", "fields": [JSON StructField, {"name": string, "type": type JSON, "nullable": JSON Boolean, "comment": string}]}` | Struct type, indicate a struct of fields | -| Union | `Types.UnionType.of([type1, type2, ...])` | `{"type": "union", "types": [type JSON, ...]}` | Union type, indicate a union of types +| Union | `Types.UnionType.of([type1, type2, ...])` | `{"type": "union", "types": [type JSON, ...]}` | Union type, indicates a union of types The related java doc is [here](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/types/Type.html). 
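To make the JSON notation in the table above concrete, here is a sketch of a nested column type composed from those templates; the nesting is illustrative, and the lowercase primitive spellings (`"string"`, `"integer"`) are assumptions, since the table only shows the composite forms:

```json
{
  "type": "map",
  "keyType": "string",
  "valueType": {"type": "list", "containsNull": true, "elementType": "integer"},
  "valueContainsNull": true
}
```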
@@ -734,13 +734,13 @@ In addition to the basic settings, Gravitino supports the following features: |---------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------| | Table partitioning | Equal to `PARTITION BY` in Apache Hive, It is a partitioning strategy that is used to split a table into parts based on partition keys. Some table engine may not support this feature | [Partition](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/dto/rel/partitions/Partitioning.html) | | Table bucketing | Equal to `CLUSTERED BY` in Apache Hive, Bucketing a.k.a (Clustering) is a technique to split the data into more manageable files/parts, (By specifying the number of buckets to create). The value of the bucketing column will be hashed by a user-defined number into buckets. | [Distribution](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/distributions/Distribution.html) | -| Table sort ordering | Equal to `SORTED BY` in Apache Hive, sort ordering is a method to sort the data by specific ways such as by a column or a function and then store table data. it will highly improve the query performance under certain scenarios. | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html) | +| Table sort ordering | Equal to `SORTED BY` in Apache Hive, sort ordering is a method to sort the data in specific ways such as by a column or a function, and then store table data. it will highly improve the query performance under certain scenarios. | [SortOrder](pathname:///docs/0.3.0/api/java/com/datastrato/gravitino/rel/expressions/sorts/SortOrder.html) | For more information, please see the related document on [partitioning, bucketing, and sorting](table-partitioning-bucketing-sort-order.md). :::note -The code above is an example of creating a Hive table. For other catalogs, the code is similar, but the supported column type, table properties may be different. For more details, please refer to the related doc. +The code above is an example of creating a Hive table. For other catalogs, the code is similar, but the supported column type, and table properties may be different. For more details, please refer to the related doc. ::: ### Load a table @@ -837,7 +837,7 @@ You can remove a table by sending a `DELETE` request to the `/api/metalakes/{met ```shell -## purge can be true or false, if purge is true, Gravitino will remove the data of the table. +## Purge can be true or false, if purge is true, Gravitino will remove the data from the table. curl -X DELETE -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" \ @@ -865,8 +865,8 @@ tableCatalog.purgeTable(NameIdentifier.of("metalake", "catalog", "schema", "tabl -There are two ways to drop a table: `dropTable` and `purgeTable`, the difference between them is that `purgeTable` will remove data of the table, while `dropTable` only removes the metadata of the table. Some engine such as -Apache Hive support both, `dropTable` will only remove the metadata of a table and the data in HDFS can be reused later through the format of external table. 
+There are two ways to drop a table: `dropTable` and `purgeTable`, the difference between them is that `purgeTable` will remove data of the table, while `dropTable` only removes the metadata of the table. Some engines such as +Apache Hive support both, `dropTable` will only remove the metadata of a table and the data in HDFS can be reused later through the format of the external table. ### List all tables under a schema diff --git a/docs/metrics.md b/docs/metrics.md index d242836e038..2f0872193e2 100644 --- a/docs/metrics.md +++ b/docs/metrics.md @@ -15,7 +15,7 @@ Gravitino Metrics builds upon the [Dropwizard Metrics](https://metrics.dropwizar // Use Gravitino Server address or Iceberg REST server address to replace 127.0.0.1:8090 // Get metrics in JSON format curl http://127.0.0.1:8090/metrics -// Get metrics in Promethus format +// Get metrics in Prometheus format curl http://127.0.0.1:8090/prometheus/metrics ``` @@ -48,5 +48,5 @@ Metrics with the `gravitino-server` prefix pertain to the Gravitino server, whil #### JVM metrics -JVM metrics source uses [JVM instrumentation](https://metrics.dropwizard.io/4.2.0/manual/jvm.html) with BufferPoolMetricSet, GarbageCollectorMetricSet and MemoryUsageGaugeSet. +JVM metrics source uses [JVM instrumentation](https://metrics.dropwizard.io/4.2.0/manual/jvm.html) with BufferPoolMetricSet, GarbageCollectorMetricSet, and MemoryUsageGaugeSet. These metrics start with the `jvm` prefix, like `jvm.heap.used` in JSON format, `jvm_head_used` in Prometheus format. diff --git a/docs/overview.md b/docs/overview.md index 5b124a3418d..ee5476688e7 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -23,9 +23,9 @@ Gravitino aims to provide several key features: ![Gravitino Model and Arch](assets/gravitino-model-arch.png) -* **Functionality Layer**: Gravitino provides a set of APIs for users to manage and govern the +* **Functionality Layer**: Gravitino provides an API for users to manage and govern the metadata, including standard metadata creation, update, and delete operations. In the meantime, it also provides the ability to govern the metadata in a unified way, including access control, discovery, and others. -* **Interface Layer**: Gravitino provides standard REST APIs as the interface layer for users. Future support includes Thrift and JDBC interfaces. +* **Interface Layer**: Gravitino provides a standard REST API as the interface layer for users. Future support includes Thrift and JDBC interfaces. * **Core Object Model**: Gravitino defines a generic metadata model to represent the metadata in different sources and types and manages them in a unified way. * **Connection Layer**: In the connection layer, Gravitino provides a set of connectors to connect to different metadata sources, including Apache Hive, MySQL, PostgreSQL, and others. It also allows connecting and managing heterogeneous metadata other than Tabular data. @@ -43,7 +43,7 @@ others. ### Direct metadata management -Unlike the traditional metadata management systems, which need to collect the metadata +Unlike traditional metadata management systems, which need to collect the metadata actively or passively from underlying systems, Gravitino manages these systems directly. It provides a set of connectors to connect to different metadata sources. The changes in Gravitino directly reflect in the underlying systems, and vice versa. @@ -78,10 +78,10 @@ assets like models, features, and others are under development. * **Metalake**: The top-level container for metadata. 
Typically, one group has one metalake to manage all the metadata in it. Each metalake exposes a three-level namespace(catalog.schema. table) to organize the data. -* **Catalog**: catalog is a collection of metadata from a specific metadata source. +* **Catalog**: A catalog is a collection of metadata from a specific metadata source. Each catalog has a related connector to connect to the specific metadata source. -* **Schema**: Schema is equivalent to a database, Schemas only exist in the specific catalogs +* **Schema**: A schema is equivalent to a database, Schemas only exist in the specific catalogs that support relational metadata sources, such as Apache Hive, MySQL, PostgreSQL, and others. * **Table**: The lowest level in the object hierarchy for catalogs that support relational metadata sources. You can create Tables in specific schemas in the catalogs. -* **Model**: Model represents the metadata in the specific catalogs that support model management. +* **Model**: The model represents the metadata in the specific catalogs that support model management. diff --git a/docs/publish-docker-images.md b/docs/publish-docker-images.md index 4fc4f8c4027..913a0f9b5fd 100644 --- a/docs/publish-docker-images.md +++ b/docs/publish-docker-images.md @@ -25,7 +25,7 @@ You can use GitHub actions to publish Docker images to the Docker Hub repository + `datastrato/gravitino-ci-hive`. + `datastrato/gravitino-ci-trino`. + Future plans include support for other data sources. -5. Input the `tag name`, for example: `0.1.0`, Then build and push the Docker image name is `datastrato/{image-name}:0.1.0`. +5. Input the `tag name`, for example: `0.1.0`, Then build and push the Docker image name as `datastrato/{image-name}:0.1.0`. 6. You must enter the correct `publish docker token` before you can execute run `Publish Docker Image` workflow. 7. Wait for the workflow to complete. You can see a new Docker image shown in the [Datastrato Docker Hub](https://hub.docker.com/u/datastrato) repository. diff --git a/docs/security.md b/docs/security.md index 3c8b095ff0b..8289d9f6341 100644 --- a/docs/security.md +++ b/docs/security.md @@ -92,7 +92,7 @@ You can follow the steps to set up an OAuth mode Gravitino server. There is a sample-authorization-server based on [spring-authorization-server](https://github.com/spring-projects/spring-authorization-server/tree/1.0.3). - The image has registered a client information in the external OAuth 2.0 server. + The image has registered client information in the external OAuth 2.0 server. Its clientId is `test`. Its secret is `test`. Its scope is `test`. @@ -120,11 +120,11 @@ gravitino.authenticator.oauth.tokenPath /oauth2/token gravitino.authenticator.oauth.serverUri http://localhost:8177 ``` -7. Open [the URL of Gravitino server](http://localhost:8090) and login in with clientId `test`, clientSecret `test` and scope `test`. +7. Open [the URL of Gravitino server](http://localhost:8090) and login in with clientId `test`, clientSecret `test`, and scope `test`. ![oauth_login_image](assets/oauth.png) -8. You can also use curl command to access Gravitino. +8. You can also use the curl command to access Gravitino. Get access token @@ -146,7 +146,7 @@ HTTPS protects the header of the request from smuggling, making it safer. If users choose to enable HTTPS, Gravitino won't provide the ability of HTTP service. -Both Gravitino server and Iceberg REST service can configure HTTPS. +Both the Gravitino server and Iceberg REST service can configure HTTPS. 

 ### Gravitino server's configuration

@@ -220,8 +220,8 @@ bin/keytool -export -alias localhost -keystore localhost.jks -file localhost.crt
 bin/keytool -import -alias localhost -keystore jre/lib/security/cacerts -file localhost.crt -storepass changeit -noprompt
 ```

-5. You can refer to the [Configurations](gravitino-server-config.md) and append the configurations to the conf/gravitino.conf.
-Configuration doesn't support to resolve environment variable, so you should replace `${JAVA_HOME}` with the actual value.
+5. You can refer to the [Configurations](gravitino-server-config.md) and append the configuration to the conf/gravitino.conf.
+Configuration doesn't support resolving environment variables, so you should replace `${JAVA_HOME}` with the actual value.
-Then, You can start the Gravitino server.
+Then, you can start the Gravitino server.

 ```text
@@ -258,3 +258,32 @@ If you want to use the command `curl`, you can follow the commands:
 openssl x509 -inform der -in $JAVA_HOME/localhost.crt -out certificate.pem
 curl -v -X GET --cacert ./certificate.pem -H "Accept: application/vnd.gravitino.v1+json" -H "Content-Type: application/json" https://localhost:8433/api/version
 ```
+
+## Cross-origin resource filter
+
+### Server configuration
+
+| Configuration item                                 | Description                                                                                                                                                                                                      | Default value                                 | Required | Since version |
+|----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|----------|---------------|
+| `gravitino.server.webserver.enableCorsFilter`      | Enable the cross-origin resource sharing filter.                                                                                                                                                                 | `false`                                       | No       | 0.4.0         |
+| `gravitino.server.webserver.allowedOrigins`        | A comma-separated list of origins allowed to access the resources. The default value is `*`, which means all origins.                                                                                            | `*`                                           | No       | 0.4.0         |
+| `gravitino.server.webserver.allowedTimingOrigins`  | A comma-separated list of origins allowed to time the resource. The default value is the empty string, which means no origins.                                                                                   | ``                                            | No       | 0.4.0         |
+| `gravitino.server.webserver.allowedMethods`        | A comma-separated list of allowed HTTP methods used when accessing the resources. The default values are GET, POST, HEAD, and DELETE.                                                                            | `GET,POST,HEAD,DELETE`                        | No       | 0.4.0         |
+| `gravitino.server.webserver.allowedHeaders`        | A comma-separated list of allowed HTTP headers specified when accessing the resources. The default value is X-Requested-With,Content-Type,Accept,Origin. If the value is a single `*`, all headers are accepted. | `X-Requested-With,Content-Type,Accept,Origin` | No       | 0.4.0         |
+| `gravitino.server.webserver.preflightMaxAgeInSecs` | The number of seconds the client caches preflight requests. The default value is 1800 seconds (30 minutes).                                                                                                      | `1800`                                        | No       | 0.4.0         |
+| `gravitino.server.webserver.allowCredentials`      | A boolean indicating whether the resource allows requests with credentials. The default value is true.                                                                                                           | `true`                                        | No       | 0.4.0         |
+| `gravitino.server.webserver.exposedHeaders`        | A comma-separated list of allowed HTTP headers exposed to the client. The default value is the empty list.                                                                                                       | ``                                            | No       | 0.4.0         |
+| `gravitino.server.webserver.chainPreflight`        | If true, preflight requests are chained to the target resource for normal handling (as an OPTIONS request). Otherwise, the filter responds to the preflight itself. The default is true.                         | `true`                                        | No       | 0.4.0         |
+
+### Iceberg REST service's configuration
+
+| Configuration item                                        | Description                                                                                                                                                                                                      | Default value                                 | Required | Since version |
+|-----------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|----------|---------------|
+| `gravitino.auxService.iceberg-rest.enableCorsFilter`      | Enable the cross-origin resource sharing filter.                                                                                                                                                                 | `false`                                       | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.allowedOrigins`        | A comma-separated list of origins allowed to access the resources. The default value is `*`, which means all origins.                                                                                            | `*`                                           | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.allowedTimingOrigins`  | A comma-separated list of origins allowed to time the resource. The default value is the empty string, which means no origins.                                                                                   | ``                                            | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.allowedMethods`        | A comma-separated list of allowed HTTP methods used when accessing the resources. The default values are GET, POST, HEAD, and DELETE.                                                                            | `GET,POST,HEAD,DELETE`                        | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.allowedHeaders`        | A comma-separated list of allowed HTTP headers specified when accessing the resources. The default value is X-Requested-With,Content-Type,Accept,Origin. If the value is a single `*`, all headers are accepted. | `X-Requested-With,Content-Type,Accept,Origin` | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.preflightMaxAgeInSecs` | The number of seconds the client caches preflight requests. The default value is 1800 seconds (30 minutes).                                                                                                      | `1800`                                        | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.allowCredentials`      | A boolean indicating whether the resource allows requests with credentials. The default value is true.                                                                                                           | `true`                                        | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.exposedHeaders`        | A comma-separated list of allowed HTTP headers exposed to the client. The default value is the empty list.                                                                                                       | ``                                            | No       | 0.4.0         |
+| `gravitino.auxService.iceberg-rest.chainPreflight`        | If true, preflight requests are chained to the target resource for normal handling (as an OPTIONS request). Otherwise, the filter responds to the preflight itself. The default is true.                         | `true`                                        | No       | 0.4.0         |
diff --git a/docs/table-partitioning-bucketing-sort-order.md b/docs/table-partitioning-bucketing-sort-order.md
index 638e32f9a73..ae3496cb812 100644
--- a/docs/table-partitioning-bucketing-sort-order.md
+++ b/docs/table-partitioning-bucketing-sort-order.md
@@ -13,7 +13,7 @@ import TabItem from '@theme/TabItem';

 To create a partitioned table, you should provide the following two components to construct a valid partitioned table.

-- Partitioning strategy. It defines how Gravitino will distribute table data across partitions. currently Gravitino supports the following partitioning strategies.
+- Partitioning strategy. It defines how Gravitino distributes table data across partitions. Currently, Gravitino supports the following partitioning strategies.

 :::note
 The `score`, `createTime`, and `city` appearing in the table below refer to the field names in a table.
@@ -36,7 +36,7 @@ For complex functions, please refer to [FunctionPartitioningDTO](https://github.

 - Field names: It defines which fields Gravitino uses to partition the table.

-- Other messages may also be needed. For example, if the partitioning strategy is `bucket`, you should provide the number of buckets; if the partitioning strategy is `truncate`, you should provide the width of the truncate.
+- In some cases, you may need to provide other information. For example, if the partitioning strategy is `bucket`, you should provide the number of buckets; if the partitioning strategy is `truncate`, you should provide the truncate width.

 The following is an example of creating a partitioned table:

@@ -72,18 +72,18 @@ new Transform[] {

 To create a bucketed table, you should use the following three components to construct a valid bucketed table.

-- Strategy. It defines how Gravitino will distribute table data across partitions.
+- Strategy. It defines how Gravitino distributes table data across buckets.

 | Bucket strategy | Description                                                                                                                    | JSON     | Java             |
 |-----------------|--------------------------------------------------------------------------------------------------------------------------------|----------|------------------|
-| hash            | Bucket table using hash. Gravitino will distribute table data into buckets based on the hash value of the key.                 | `hash`   | `Strategy.HASH`  |
-| range           | Bucket table using range. Gravitino will distribute table data into buckets based on a specified range or interval of values.  | `range`  | `Strategy.RANGE` |
-| even            | Bucket table using even. Gravitino will distribute table data, ensuring an equal distribution of data.                         | `even`   | `Strategy.EVEN`  |
+| hash            | Bucket table using hash. Gravitino distributes table data into buckets based on the hash value of the key.                     | `hash`   | `Strategy.HASH`  |
+| range           | Bucket table using range. Gravitino distributes table data into buckets based on a specified range or interval of values.      | `range`  | `Strategy.RANGE` |
+| even            | Bucket table using even. Gravitino distributes table data, ensuring an equal distribution of data.                             | `even`   | `Strategy.EVEN`  |

 - Number. It defines how many buckets you use to bucket the table.

-- Function arguments. It defines the arguments of the strategy above, Gravitino supports the following three kinds of arguments, for more, you can refer to Java class [FunctionArg](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/expressions/FunctionArg.java) and [DistributionDTO](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/DistributionDTO.java) to use more complex function arguments.
+- Function arguments. It defines the arguments of the strategy. Gravitino supports the following three kinds of arguments; for more complex function arguments, refer to the Java classes [FunctionArg](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/expressions/FunctionArg.java) and [DistributionDTO](https://github.com/datastrato/gravitino/blob/main/common/src/main/java/com/datastrato/gravitino/dto/rel/DistributionDTO.java). A combined sketch follows after the expression-type table below.
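Putting the three components together, a `distribution` specification might look like the following sketch. The metalake, catalog, schema, table, and column names, and the REST endpoint path, are illustrative assumptions not taken from this diff; the JSON keys follow the DistributionDTO shape, and the `funcArgs` entry uses the `field` expression form from the table below.

```shell
# Hypothetical sketch: create a table bucketed by hash over `score` into 4 buckets.
# Endpoint path and object names are assumptions for illustration only.
curl -X POST \
  -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "example_table",
        "comment": "bucketed table sketch",
        "columns": [ { "name": "score", "type": "integer" } ],
        "distribution": {
          "strategy": "hash",
          "number": 4,
          "funcArgs": [ { "type": "field", "fieldName": ["score"] } ]
        }
      }' \
  http://localhost:8090/api/metalakes/metalake/catalogs/catalog/schemas/schema/tables
```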
-| Expression type | JSON example                                                   | Java example                                                                              | Equivalent SQL semantics | Description                        |
+| Expression type | JSON example                                                   | Java example                                                                              | Equivalent SQL semantics | Description                       |
 |-----------------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------|--------------------------|-----------------------------------|
 | field           | `{"type":"field","fieldName":["score"]}`                       | `FieldReferenceDTO.of("score")`                                                           | `score`                  | The field reference value `score` |
 | function        | `{"type":"function","functionName":"hour","fieldName":["dt"]}` | `new FuncExpressionDTO.Builder().withFunctionName("hour").withFunctionArgs("dt").build()` | `hour(dt)`               | The function value `hour(dt)`     |
diff --git a/docs/webui.md b/docs/webui.md
index cf607636c4f..4414e092d1e 100644
--- a/docs/webui.md
+++ b/docs/webui.md
@@ -11,7 +11,7 @@ import Image from '@theme/IdealImage'
 import Tabs from '@theme/Tabs'
 import TabItem from '@theme/TabItem'

-This document primarily outlines how users can manage metadata within Gravitino using the web UI, the graphical interface is accessible through a web browser as an alterative to writing code or using the REST interface.
+This document primarily outlines how users can manage metadata within Gravitino using the web UI. The graphical interface is accessible through a web browser as an alternative to writing code or using the REST interface.

 Currently, you can integrate [OAuth settings](security.md) to view, add, modify, and delete metalakes, create catalogs, and view catalogs, schemas, and tables, among other functions.

@@ -35,11 +35,11 @@ After changing the configuration, make sure to restart the Gravitino server.

 ### Simple mode

-```
+```text
 gravitino.authenticator = simple
 ```

-Set the configuration parameter `gravitino.authenticator` to `simple`, the web UI displays the homepage (Metalakes).
+Set the configuration parameter `gravitino.authenticator` to `simple`, and the web UI displays the homepage (Metalakes).

 ![webui-metalakes-simple](assets/webui/metalakes-simple.png)

 The main content displays the existing metalake list.

 ### Oauth mode

-```
+```text
 gravitino.authenticator = oauth
 ```

-Set the configuration parameter `gravitino.authenticator` to `oauth`, the web UI displays the login page.
+Set the configuration parameter `gravitino.authenticator` to `oauth`, and the web UI displays the login page.

 ![webui-login-with-oauth](assets/webui/login-with-oauth.png)

@@ -81,7 +81,7 @@ Creating a metalake needs these fields:

 1. **Name**(**_required_**): the name of the metalake.
 2. **Comment**(_optional_): the comment of the metalake.
-3. **Properties**(_optional_): clicking on the `ADD PROPERTY` button to add custom properties.
+3. **Properties**(_optional_): Click the `ADD PROPERTY` button to add custom properties.

 ![metalake-list](assets/webui/metalake-list.png)

 There are 3 actions you can perform on a metalake.

 #### Edit metalake

-Displays a dialog for for modifying a metalakes fields.
+Displays a dialog for modifying a metalake's fields.

 ![create-metalake-dialog](assets/webui/create-metalake-dialog.png)

 If this is the first time, it shows no data until after creating a catalog.

 ![metalake-catalogs](assets/webui/metalake-catalogs.png)

-Clicking on the Tab - `DETAILS` views the details of the catalog in the metalake catalogs page.
+Click the `DETAILS` tab to view the details of the catalog on the metalake catalogs page.
![metalake-catalogs-details](assets/webui/metalake-catalogs-details.png)
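The create-metalake dialog described above maps naturally onto Gravitino's REST API. As a rough curl equivalent, assuming a `/api/metalakes` endpoint whose payload mirrors the dialog's Name, Comment, and Properties fields:

```shell
# Sketch: create a metalake named `demo` with a comment and one custom property,
# mirroring the Name, Comment, and Properties fields of the web UI dialog.
# The payload shape is an assumption based on the fields listed above.
curl -X POST \
  -H "Accept: application/vnd.gravitino.v1+json" \
  -H "Content-Type: application/json" \
  -d '{"name": "demo", "comment": "my first metalake", "properties": {"owner": "dev-team"}}' \
  http://localhost:8090/api/metalakes
```

After the call succeeds, the new metalake should appear in the metalake list on the web UI homepage, just as if it had been created through the dialog.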