From 33685addbdbada3329795b86eb20b9a077009559 Mon Sep 17 00:00:00 2001 From: Joel Goode Date: Wed, 11 Sep 2024 12:41:43 -0400 Subject: [PATCH] Docs changes for Hadoop FS default --- docs/src/main/sphinx/connector/delta-lake.md | 23 +++++++------- docs/src/main/sphinx/connector/hive.md | 30 +++++++++++-------- docs/src/main/sphinx/connector/hudi.md | 22 ++++++++------ docs/src/main/sphinx/connector/iceberg.md | 24 ++++++++------- docs/src/main/sphinx/object-storage.md | 23 +++++++------- .../object-storage/file-system-azure.md | 5 ++-- .../sphinx/object-storage/file-system-gcs.md | 6 ++-- .../sphinx/object-storage/file-system-hdfs.md | 8 ++--- .../sphinx/object-storage/file-system-s3.md | 5 ++-- .../sphinx/object-storage/legacy-azure.md | 8 +++++ .../main/sphinx/object-storage/legacy-cos.md | 7 +++++ .../main/sphinx/object-storage/legacy-gcs.md | 7 +++++ .../main/sphinx/object-storage/legacy-s3.md | 7 +++++ 13 files changed, 108 insertions(+), 67 deletions(-) diff --git a/docs/src/main/sphinx/connector/delta-lake.md b/docs/src/main/sphinx/connector/delta-lake.md index 3f8c61886c62..c116f5ad88e0 100644 --- a/docs/src/main/sphinx/connector/delta-lake.md +++ b/docs/src/main/sphinx/connector/delta-lake.md @@ -27,16 +27,21 @@ To connect to Databricks Delta Lake, you need: ## General configuration To configure the Delta Lake connector, create a catalog properties file -`etc/catalog/example.properties` that references the `delta_lake` -connector and defines a metastore. You must configure a metastore for table -metadata. If you are using a {ref}`Hive metastore `, -`hive.metastore.uri` must be configured: +`etc/catalog/example.properties` that references the `delta_lake` connector. + +You must configure a [metastore for metadata](/object-storage/metastores). + +You must select and configure one of the [supported file +systems](delta-lake-file-system-configuration). 
```properties connector.name=delta_lake hive.metastore.uri=thrift://example.net:9083 +fs.x.enabled=true ``` +Replace `fs.x.enabled` with the property that enables the desired file system, such as `fs.native-s3.enabled`. + If you are using {ref}`AWS Glue ` as your metastore, you must instead set `hive.metastore` to `glue`: @@ -55,17 +60,15 @@ visible to the connector. (delta-lake-file-system-configuration)= ## File system access configuration -The connector supports native, high-performance file system access to object -storage systems: +The connector supports accessing the following file systems: -* [](/object-storage) * [](/object-storage/file-system-azure) * [](/object-storage/file-system-gcs) * [](/object-storage/file-system-s3) +* [](/object-storage/file-system-hdfs) -You must enable and configure the specific native file system access. If none is -activated, the [legacy support](file-system-legacy) is used and must be -configured. +You must enable and configure the specific file system access. [Legacy +support](file-system-legacy) is not recommended and will be removed. ### Delta Lake general configuration properties diff --git a/docs/src/main/sphinx/connector/hive.md b/docs/src/main/sphinx/connector/hive.md index c2bfa23fef51..bc2ccc20c6dc 100644 --- a/docs/src/main/sphinx/connector/hive.md +++ b/docs/src/main/sphinx/connector/hive.md @@ -27,8 +27,8 @@ The Hive connector requires a implementation of the Hive metastore, such as {ref}`AWS Glue `. -Many [distributed storage systems](hive-file-system-configuration) can be -queried with the Hive connector. +You must select and configure a [supported +file system](hive-file-system-configuration) in your catalog configuration file. The coordinator and all workers must have network access to the Hive metastore and the storage system.
Hive metastore access with the Thrift protocol defaults @@ -58,16 +58,22 @@ In the case of serializable formats, only specific ## General configuration To configure the Hive connector, create a catalog properties file -`etc/catalog/example.properties` that references the `hive` -connector and defines a metastore. You must configure a metastore for table -metadata. If you are using a {ref}`Hive metastore `, -`hive.metastore.uri` must be configured: +`etc/catalog/example.properties` that references the `hive` connector. + +You must configure a [metastore for metadata](/object-storage/metastores). + +You must select and configure one of the [supported file +systems](hive-file-system-configuration). + ```properties connector.name=hive hive.metastore.uri=thrift://example.net:9083 +fs.x.enabled=true ``` +Replace `fs.x.enabled` with the property that enables the desired file system, such as `fs.native-s3.enabled`. + If you are using {ref}`AWS Glue ` as your metastore, you must instead set `hive.metastore` to `glue`: @@ -77,7 +83,7 @@ hive.metastore=glue ``` Each metastore type has specific configuration properties along with -{ref}`general metastore configuration properties `. +[](general-metastore-properties). ### Multiple Hive clusters @@ -290,17 +296,15 @@ Hive connector documentation. (hive-file-system-configuration)= ### File system access configuration -The connector supports native, high-performance file system access to object -storage systems: +The connector supports accessing the following file systems: -* [](/object-storage) * [](/object-storage/file-system-azure) * [](/object-storage/file-system-gcs) * [](/object-storage/file-system-s3) +* [](/object-storage/file-system-hdfs) -You must enable and configure the specific native file system access. If none is -activated, the [legacy support](file-system-legacy) is used and must be -configured. +You must enable and configure the specific file system access. [Legacy +support](file-system-legacy) is not recommended and will be removed.
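As an illustrative sketch only (the metastore URI and region are placeholder values, not part of this change), a minimal Hive catalog that selects the native S3 file system support might look like:

```properties
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
# Enable exactly one file system support per catalog
fs.native-s3.enabled=true
s3.region=us-east-1
```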
(hive-fte-support)= ### Fault-tolerant execution support diff --git a/docs/src/main/sphinx/connector/hudi.md b/docs/src/main/sphinx/connector/hudi.md index b45345ad6a91..d825511cd14e 100644 --- a/docs/src/main/sphinx/connector/hudi.md +++ b/docs/src/main/sphinx/connector/hudi.md @@ -20,15 +20,21 @@ To use the Hudi connector, you need: ## General configuration To configure the Hudi connector, create a catalog properties file -`etc/catalog/example.properties` that references the `hudi` -connector and defines the HMS to use with the `hive.metastore.uri` -configuration property: +`etc/catalog/example.properties` that references the `hudi` connector. + +You must configure a [metastore for table metadata](/object-storage/metastores). + +You must select and configure one of the [supported file +systems](hudi-file-system-configuration). ```properties connector.name=hudi hive.metastore.uri=thrift://example.net:9083 +fs.x.enabled=true ``` +Replace `fs.x.enabled` with the property that enables the desired file system, such as `fs.native-s3.enabled`. + There are {ref}`HMS configuration properties ` available for use with the Hudi connector. The connector recognizes Hudi tables synced to the metastore by the [Hudi sync tool](https://hudi.apache.org/docs/syncing_metastore). @@ -96,17 +102,15 @@ Additionally, following configuration properties can be set depending on the use (hudi-file-system-configuration)= ## File system access configuration -The connector supports native, high-performance file system access to object -storage systems: +The connector supports accessing the following file systems: -* [](/object-storage) * [](/object-storage/file-system-azure) * [](/object-storage/file-system-gcs) * [](/object-storage/file-system-s3) +* [](/object-storage/file-system-hdfs) -You must enable and configure the specific native file system access. If none is -activated, the [legacy support](file-system-legacy) is used and must be -configured. +You must enable and configure the specific file system access.
[Legacy +support](file-system-legacy) is not recommended and will be removed. ## SQL support diff --git a/docs/src/main/sphinx/connector/iceberg.md b/docs/src/main/sphinx/connector/iceberg.md index 530f490974d1..55f0eaf959a4 100644 --- a/docs/src/main/sphinx/connector/iceberg.md +++ b/docs/src/main/sphinx/connector/iceberg.md @@ -49,16 +49,22 @@ To use Iceberg, you need: ## General configuration To configure the Iceberg connector, create a catalog properties file -`etc/catalog/example.properties` that references the `iceberg` -connector and defines a metastore type. The Hive metastore catalog is the -default implementation. To use a {ref}`Hive metastore `, -`hive.metastore.uri` must be configured: +`etc/catalog/example.properties` that references the `iceberg` connector. + +The [Hive metastore catalog](hive-thrift-metastore) is the default +implementation. + +You must select and configure one of the [supported file +systems](iceberg-file-system-configuration). ```properties connector.name=iceberg hive.metastore.uri=thrift://example.net:9083 +fs.x.enabled=true ``` +Replace `fs.x.enabled` with the property that enables the desired file system, such as `fs.native-s3.enabled`. + Other metadata catalog types as listed in the requirements section of this topic are available. Each metastore type has specific configuration properties along with {ref}`general metastore configuration properties @@ -201,17 +207,15 @@ processing. Read and write operations are both supported with any retry policy. (iceberg-file-system-configuration)= ## File system access configuration -The connector supports native, high-performance file system access to object -storage systems: +The connector supports accessing the following file systems: -* [](/object-storage) * [](/object-storage/file-system-azure) * [](/object-storage/file-system-gcs) * [](/object-storage/file-system-s3) +* [](/object-storage/file-system-hdfs) -You must enable and configure the specific native file system access.
If none is -activated, the [legacy support](file-system-legacy) is used and must be -configured. +You must enable and configure the specific file system access. [Legacy +support](file-system-legacy) is not recommended and will be removed. ## Type mapping diff --git a/docs/src/main/sphinx/object-storage.md b/docs/src/main/sphinx/object-storage.md index 8fc3a5a8d7c4..26c988179ad1 100644 --- a/docs/src/main/sphinx/object-storage.md +++ b/docs/src/main/sphinx/object-storage.md @@ -24,8 +24,10 @@ availability. (file-system-configuration)= ## Configuration -Use the following properties to control the support for different file systems -in a catalog. Each catalog can only use one file system. +By default, no file system support is activated for your catalog. You must +select and configure one of the following properties to determine the support +for different file systems in the catalog. Each catalog can use only one file +system. :::{list-table} File system support properties :widths: 35, 65 * - Property - Description -* - `fs.hadoop.enabled` - - Activate the [legacy libraries and implementation based on the Hadoop](file-system-legacy) - ecosystem. Defaults to `true`. * - `fs.native-azure.enabled` - Activate the [native implementation for Azure Storage - support](/object-storage/file-system-azure), and deactivate all [legacy - support](file-system-legacy). Defaults to `false`. + support](/object-storage/file-system-azure). Defaults to `false`. * - `fs.native-gcs.enabled` - Activate the [native implementation for Google Cloud Storage - support](/object-storage/file-system-gcs), and deactivate all [legacy - support](file-system-legacy). Defaults to `false`. + support](/object-storage/file-system-gcs). Defaults to `false`.
* - `fs.native-s3.enabled` - Activate the [native implementation for S3 storage - support](/object-storage/file-system-s3), and deactivate all [legacy - support](file-system-legacy) . Defaults to `false`. - + support](/object-storage/file-system-s3). Defaults to `false`. +* - `fs.hadoop.enabled` + - Activate [support for HDFS](/object-storage/file-system-hdfs) and [legacy + support for other file systems](file-system-legacy) using the HDFS + libraries. Defaults to `false`. ::: (file-system-native)= diff --git a/docs/src/main/sphinx/object-storage/file-system-azure.md b/docs/src/main/sphinx/object-storage/file-system-azure.md index fa8da1c62ffc..b82da2cafbc0 100644 --- a/docs/src/main/sphinx/object-storage/file-system-azure.md +++ b/docs/src/main/sphinx/object-storage/file-system-azure.md @@ -19,9 +19,8 @@ system support: * - Property - Description * - `fs.native-azure.enabled` - - Activate the native implementation for Azure Storage support, and deactivate - all [legacy support](file-system-legacy). Defaults to `false`. - Must be set to `true` for all other properties be used. + - Activate the native implementation for Azure Storage support. Defaults to + `false`. Set to `true` to use Azure Storage and enable all other properties. * - `azure.auth-type` - Authentication type to use for Azure Storage access. Defaults no authentication used with `NONE`. Use `ACCESS_KEY` for diff --git a/docs/src/main/sphinx/object-storage/file-system-gcs.md b/docs/src/main/sphinx/object-storage/file-system-gcs.md index 469f8e8c4aee..f1907f0082cf 100644 --- a/docs/src/main/sphinx/object-storage/file-system-gcs.md +++ b/docs/src/main/sphinx/object-storage/file-system-gcs.md @@ -19,9 +19,9 @@ Storage file system support: * - Property - Description * - `fs.native-gcs.enabled` - - Activate the native implementation for Google Cloud Storage support, and - deactivate all [legacy support](file-system-legacy). Defaults to `false`. - Must be set to `true` for all other properties be used. 
+ - Activate the native implementation for Google Cloud Storage support. + Defaults to `false`. Set to `true` to use Google Cloud Storage and enable + all other properties. * - `gcs.project-id` - Identifier for the project on Google Cloud Storage. * - `gcs.client.max-retries` diff --git a/docs/src/main/sphinx/object-storage/file-system-hdfs.md b/docs/src/main/sphinx/object-storage/file-system-hdfs.md index 47867a6126dd..90bf5bf55643 100644 --- a/docs/src/main/sphinx/object-storage/file-system-hdfs.md +++ b/docs/src/main/sphinx/object-storage/file-system-hdfs.md @@ -4,8 +4,8 @@ Trino includes support to access the [Hadoop Distributed File System (HDFS)](https://hadoop.apache.org/) with a catalog using the Delta Lake, Hive, Hudi, or Iceberg connectors. -Support is enabled by default, but can be deactivated by setting -`fs.hadoop.enabled` to `false`. +Support for HDFS is not enabled by default, but can be activated by setting the +`fs.hadoop.enabled` property to `true` in your catalog configuration file. Apache Hadoop HDFS 2.x and 3.x are supported. @@ -20,8 +20,8 @@ Use the following properties to configure general aspects of HDFS support: * - Property - Description * - `fs.hadoop.enabled` - - Activate the support for HDFS access. Defaults to `true`. Must be set to - `true` for all other properties be used. + - Activate the support for HDFS access. Defaults to `false`. Set to `true` to + use HDFS and enable all other properties. * - `hive.config.resources` - An optional, comma-separated list of HDFS configuration files. These files must exist on the machines running Trino. 
For basic setups, Trino configures diff --git a/docs/src/main/sphinx/object-storage/file-system-s3.md b/docs/src/main/sphinx/object-storage/file-system-s3.md index 2d7eecc1d78d..b7ae9aad0a1c 100644 --- a/docs/src/main/sphinx/object-storage/file-system-s3.md +++ b/docs/src/main/sphinx/object-storage/file-system-s3.md @@ -22,9 +22,8 @@ support: * - Property - Description * - `fs.native-s3.enabled` - - Activate the native implementation for S3 storage support, and deactivate - all [legacy support](file-system-legacy). Defaults to `false`. Must be set - to `true` for all other properties be used. + - Activate the native implementation for S3 storage support. Defaults to + `false`. Set to `true` to use S3 and enable all other properties. * - `s3.endpoint` - Required endpoint URL for S3. * - `s3.region` diff --git a/docs/src/main/sphinx/object-storage/legacy-azure.md b/docs/src/main/sphinx/object-storage/legacy-azure.md index b25a76a64990..c44fd3257d60 100644 --- a/docs/src/main/sphinx/object-storage/legacy-azure.md +++ b/docs/src/main/sphinx/object-storage/legacy-azure.md @@ -4,6 +4,11 @@ The {doc}`/connector/hive` can be configured to use [Azure Data Lake Storage (Gen2)](https://azure.microsoft.com/products/storage/data-lake-storage/). Trino supports Azure Blob File System (ABFS) to access data in ADLS Gen2. +:::{warning} +Legacy support is not recommended and will be removed. Use +[](file-system-azure). +::: + ## Hive connector configuration for Azure Storage credentials To configure Trino to use the Azure Storage credentials, set the following @@ -17,6 +22,9 @@ For more complex use cases, such as configuring multiple secondary storage accounts using Hadoop's `core-site.xml`, see the {ref}`hive-azure-advanced-config` options. +To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in +your catalog configuration file. 
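As a sketch, a catalog that keeps using the legacy Azure Storage support pairs the Hadoop enablement with the existing `hive.azure.abfs-*` properties (the storage account name and access key shown are placeholders):

```properties
connector.name=hive
# Required for any legacy file system support
fs.hadoop.enabled=true
hive.azure.abfs-storage-account=examplestorageaccount
hive.azure.abfs-access-key=examplekey
```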
+ ### ADLS Gen2 / ABFS storage To connect to ABFS storage, you may either use the storage account's access diff --git a/docs/src/main/sphinx/object-storage/legacy-cos.md b/docs/src/main/sphinx/object-storage/legacy-cos.md index be2cc501d092..51c1f75d6b4b 100644 --- a/docs/src/main/sphinx/object-storage/legacy-cos.md +++ b/docs/src/main/sphinx/object-storage/legacy-cos.md @@ -2,8 +2,15 @@ Configure the {doc}`/connector/hive` to support [IBM Cloud Object Storage COS](https://www.ibm.com/cloud/object-storage) access. +:::{warning} +Legacy support is not recommended and will be removed. Use [](file-system-s3). +::: + ## Configuration +To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in +your catalog configuration file. + To use COS, you need to configure a catalog file to use the Hive connector. For example, create a file `etc/ibmcos.properties` and specify the path to the COS service config file with the diff --git a/docs/src/main/sphinx/object-storage/legacy-gcs.md b/docs/src/main/sphinx/object-storage/legacy-gcs.md index 8e8927680e8a..de29544f9466 100644 --- a/docs/src/main/sphinx/object-storage/legacy-gcs.md +++ b/docs/src/main/sphinx/object-storage/legacy-gcs.md @@ -4,6 +4,10 @@ Object storage connectors can access [Google Cloud Storage](https://cloud.google.com/storage/) data using the `gs://` URI prefix. +:::{warning} +Legacy support is not recommended and will be removed. Use [](file-system-gcs). +::: + ## Requirements To use Google Cloud Storage with non-anonymous access objects, you need: @@ -14,6 +18,9 @@ To use Google Cloud Storage with non-anonymous access objects, you need: (hive-google-cloud-storage-configuration)= ## Configuration +To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in +your catalog configuration file. 
+ The use of Google Cloud Storage as a storage location for an object storage catalog requires setting a configuration property that defines the [authentication method for any non-anonymous access object](https://cloud.google.com/storage/docs/authentication). Access methods cannot diff --git a/docs/src/main/sphinx/object-storage/legacy-s3.md b/docs/src/main/sphinx/object-storage/legacy-s3.md index 06f38b55231a..1624f732214c 100644 --- a/docs/src/main/sphinx/object-storage/legacy-s3.md +++ b/docs/src/main/sphinx/object-storage/legacy-s3.md @@ -8,6 +8,13 @@ uses an S3 prefix, rather than an HDFS prefix. Trino uses its own S3 filesystem for the URI prefixes `s3://`, `s3n://` and `s3a://`. +:::{warning} +Legacy support is not recommended and will be removed. Use [](file-system-s3). +::: + +To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in +your catalog configuration file. + (hive-s3-configuration)= ## S3 configuration properties