Docs changes for Hadoop FS default #23366

Merged 1 commit on Sep 13, 2024
23 changes: 13 additions & 10 deletions docs/src/main/sphinx/connector/delta-lake.md
@@ -27,16 +27,21 @@ To connect to Databricks Delta Lake, you need:
## General configuration

To configure the Delta Lake connector, create a catalog properties file
`etc/catalog/example.properties` that references the `delta_lake` connector.

You must configure a [metastore for metadata](/object-storage/metastores).

You must select and configure one of the [supported file
systems](delta-lake-file-system-configuration).

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
fs.x.enabled=true
```

Replace the `fs.x.enabled` configuration property with the desired file system.
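
For example, a catalog that uses a Hive metastore and the native S3 file system
might look like the following sketch; the metastore URI, endpoint, and region
values are placeholders to replace with your own:

```properties
connector.name=delta_lake
hive.metastore.uri=thrift://example.net:9083
# Enable the native S3 file system support
fs.native-s3.enabled=true
# Placeholder endpoint and region values
s3.endpoint=https://s3.us-east-2.amazonaws.com
s3.region=us-east-2
```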

If you are using {ref}`AWS Glue <hive-glue-metastore>` as your metastore, you
must instead set `hive.metastore` to `glue`:

@@ -55,17 +60,15 @@ visible to the connector.
(delta-lake-file-system-configuration)=
## File system access configuration

The connector supports accessing the following file systems:

* [](/object-storage)
* [](/object-storage/file-system-azure)
* [](/object-storage/file-system-gcs)
* [](/object-storage/file-system-s3)
* [](/object-storage/file-system-hdfs)

You must enable and configure the specific file system access. [Legacy
support](file-system-legacy) is not recommended and will be removed.

### Delta Lake general configuration properties

30 changes: 17 additions & 13 deletions docs/src/main/sphinx/connector/hive.md
@@ -27,8 +27,8 @@ The Hive connector requires a
implementation of the Hive metastore, such as
{ref}`AWS Glue <hive-glue-metastore>`.

You must select and configure a [supported
file system](hive-file-system-configuration) in your catalog configuration file.

The coordinator and all workers must have network access to the Hive metastore
and the storage system. Hive metastore access with the Thrift protocol defaults
@@ -58,16 +58,22 @@ In the case of serializable formats, only specific
## General configuration

To configure the Hive connector, create a catalog properties file
`etc/catalog/example.properties` that references the `hive` connector.

You must configure a [metastore for metadata](/object-storage/metastores).

You must select and configure one of the [supported file
systems](hive-file-system-configuration).


```properties
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
fs.x.enabled=true
```

Replace the `fs.x.enabled` configuration property with the desired file system.
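
As one possible sketch, a Hive catalog that uses the native Azure Storage file
system could combine the metastore and file system settings as follows; the
metastore URI is a placeholder, and further authentication properties are
described in the Azure file system documentation:

```properties
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
# Enable the native Azure Storage file system support
fs.native-azure.enabled=true
# Authenticate with a storage account access key
azure.auth-type=ACCESS_KEY
```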

If you are using {ref}`AWS Glue <hive-glue-metastore>` as your metastore, you
must instead set `hive.metastore` to `glue`:

@@ -77,7 +83,7 @@ hive.metastore=glue
```

Each metastore type has specific configuration properties along with
[](general-metastore-properties).

### Multiple Hive clusters

@@ -290,17 +296,15 @@ Hive connector documentation.
(hive-file-system-configuration)=
### File system access configuration

The connector supports accessing the following file systems:

* [](/object-storage)
* [](/object-storage/file-system-azure)
* [](/object-storage/file-system-gcs)
* [](/object-storage/file-system-s3)
* [](/object-storage/file-system-hdfs)

You must enable and configure the specific file system access. [Legacy
support](file-system-legacy) is not recommended and will be removed.

(hive-fte-support)=
### Fault-tolerant execution support
22 changes: 13 additions & 9 deletions docs/src/main/sphinx/connector/hudi.md
@@ -20,15 +20,21 @@ To use the Hudi connector, you need:
## General configuration

To configure the Hudi connector, create a catalog properties file
`etc/catalog/example.properties` that references the `hudi` connector.

You must configure a [metastore for table metadata](/object-storage/metastores).

You must select and configure one of the [supported file
systems](hudi-file-system-configuration).

```properties
connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
fs.x.enabled=true
```

Replace the `fs.x.enabled` configuration property with the desired file system.
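
For example, a Hudi catalog that uses the native Google Cloud Storage file
system might look like the following sketch, where the metastore URI and
project identifier are placeholders:

```properties
connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
# Enable the native Google Cloud Storage file system support
fs.native-gcs.enabled=true
# Placeholder Google Cloud project identifier
gcs.project-id=example-project
```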

There are {ref}`HMS configuration properties <general-metastore-properties>`
available for use with the Hudi connector. The connector recognizes Hudi tables
synced to the metastore by the [Hudi sync tool](https://hudi.apache.org/docs/syncing_metastore).
@@ -96,17 +102,15 @@ Additionally, the following configuration properties can be set depending on the use
(hudi-file-system-configuration)=
## File system access configuration

The connector supports accessing the following file systems:

* [](/object-storage)
* [](/object-storage/file-system-azure)
* [](/object-storage/file-system-gcs)
* [](/object-storage/file-system-s3)
* [](/object-storage/file-system-hdfs)

You must enable and configure the specific file system access. [Legacy
support](file-system-legacy) is not recommended and will be removed.

## SQL support

24 changes: 14 additions & 10 deletions docs/src/main/sphinx/connector/iceberg.md
@@ -49,16 +49,22 @@ To use Iceberg, you need:
## General configuration

To configure the Iceberg connector, create a catalog properties file
`etc/catalog/example.properties` that references the `iceberg` connector.

The [Hive metastore catalog](hive-thrift-metastore) is the default
implementation.

You must select and configure one of the [supported file
systems](iceberg-file-system-configuration).

```properties
connector.name=iceberg
hive.metastore.uri=thrift://example.net:9083
fs.x.enabled=true
```

Replace the `fs.x.enabled` configuration property with the desired file system.
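
For example, an Iceberg catalog that stores table data on HDFS might instead
enable the Hadoop file system support, as in the following sketch with a
placeholder metastore URI:

```properties
connector.name=iceberg
hive.metastore.uri=thrift://example.net:9083
# Enable HDFS support instead of a native object storage file system
fs.hadoop.enabled=true
```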

Other metadata catalog types as listed in the requirements section of this topic
are available. Each metastore type has specific configuration properties along
with {ref}`general metastore configuration properties
<general-metastore-properties>`.

@@ -201,17 +207,15 @@ processing. Read and write operations are both supported with any retry policy.
(iceberg-file-system-configuration)=
## File system access configuration

The connector supports accessing the following file systems:

* [](/object-storage)
* [](/object-storage/file-system-azure)
* [](/object-storage/file-system-gcs)
* [](/object-storage/file-system-s3)
* [](/object-storage/file-system-hdfs)

You must enable and configure the specific file system access. [Legacy
support](file-system-legacy) is not recommended and will be removed.

## Type mapping

23 changes: 11 additions & 12 deletions docs/src/main/sphinx/object-storage.md
@@ -24,31 +24,30 @@ availability.
(file-system-configuration)=
## Configuration

By default, no file system support is activated for your catalog. You must
select and configure one of the following properties to determine the support
for different file systems in the catalog. Each catalog can use only one file
system.

:::{list-table} File system support properties
:widths: 35, 65
:header-rows: 1

* - Property
- Description
* - `fs.native-azure.enabled`
- Activate the [native implementation for Azure Storage
support](/object-storage/file-system-azure). Defaults to `false`.
* - `fs.native-gcs.enabled`
- Activate the [native implementation for Google Cloud Storage
support](/object-storage/file-system-gcs). Defaults to `false`.
* - `fs.native-s3.enabled`
- Activate the [native implementation for S3 storage
support](/object-storage/file-system-s3). Defaults to `false`.
* - `fs.hadoop.enabled`
- Activate [support for HDFS](/object-storage/file-system-hdfs) and [legacy
support for other file systems](file-system-legacy) using the HDFS
libraries. Defaults to `false`.
:::
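
For example, a catalog that uses the native S3 support sets exactly one of the
preceding properties to `true` and leaves the others at their `false` defaults:

```properties
# Exactly one file system support per catalog
fs.native-s3.enabled=true
```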

(file-system-native)=
5 changes: 2 additions & 3 deletions docs/src/main/sphinx/object-storage/file-system-azure.md
@@ -19,9 +19,8 @@ system support:
* - Property
- Description
* - `fs.native-azure.enabled`
- Activate the native implementation for Azure Storage support. Defaults to
`false`. Set to `true` to use Azure Storage and enable all other properties.
* - `azure.auth-type`
- Authentication type to use for Azure Storage access. Defaults to `NONE`,
which uses no authentication. Use `ACCESS_KEY` for
6 changes: 3 additions & 3 deletions docs/src/main/sphinx/object-storage/file-system-gcs.md
@@ -19,9 +19,9 @@ Storage file system support:
* - Property
- Description
* - `fs.native-gcs.enabled`
- Activate the native implementation for Google Cloud Storage support.
Defaults to `false`. Set to `true` to use Google Cloud Storage and enable
all other properties.
* - `gcs.project-id`
- Identifier for the project on Google Cloud Storage.
* - `gcs.client.max-retries`
8 changes: 4 additions & 4 deletions docs/src/main/sphinx/object-storage/file-system-hdfs.md
@@ -4,8 +4,8 @@ Trino includes support to access the [Hadoop Distributed File System
(HDFS)](https://hadoop.apache.org/) with a catalog using the Delta Lake, Hive,
Hudi, or Iceberg connectors.

Support for HDFS is not enabled by default, but can be activated by setting the
`fs.hadoop.enabled` property to `true` in your catalog configuration file.

Apache Hadoop HDFS 2.x and 3.x are supported.
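
For example, a Hive catalog that accesses HDFS might activate support as in the
following sketch; the metastore URI and file paths are placeholders for your
environment:

```properties
connector.name=hive
hive.metastore.uri=thrift://example.net:9083
# Activate HDFS support
fs.hadoop.enabled=true
# Optional, comma-separated HDFS configuration files (placeholder paths)
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
```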

@@ -20,8 +20,8 @@ Use the following properties to configure general aspects of HDFS support:
* - Property
- Description
* - `fs.hadoop.enabled`
- Activate the support for HDFS access. Defaults to `false`. Set to `true` to
use HDFS and enable all other properties.
* - `hive.config.resources`
- An optional, comma-separated list of HDFS configuration files. These files
must exist on the machines running Trino. For basic setups, Trino configures
5 changes: 2 additions & 3 deletions docs/src/main/sphinx/object-storage/file-system-s3.md
@@ -22,9 +22,8 @@ support:
* - Property
- Description
* - `fs.native-s3.enabled`
- Activate the native implementation for S3 storage support. Defaults to
`false`. Set to `true` to use S3 and enable all other properties.
* - `s3.endpoint`
- Required endpoint URL for S3.
* - `s3.region`
8 changes: 8 additions & 0 deletions docs/src/main/sphinx/object-storage/legacy-azure.md
@@ -4,6 +4,11 @@ The {doc}`/connector/hive` can be configured to use [Azure Data Lake Storage
(Gen2)](https://azure.microsoft.com/products/storage/data-lake-storage/). Trino
supports Azure Blob File System (ABFS) to access data in ADLS Gen2.

:::{warning}
Legacy support is not recommended and will be removed. Use
[](file-system-azure).
:::

## Hive connector configuration for Azure Storage credentials

To configure Trino to use the Azure Storage credentials, set the following
@@ -17,6 +22,9 @@ For more complex use cases, such as configuring multiple secondary storage
accounts using Hadoop's `core-site.xml`, see the
{ref}`hive-azure-advanced-config` options.

To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in
your catalog configuration file.
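
As a minimal sketch, a Hive catalog using the legacy ABFS support might look
like the following; the `hive.azure.*` property names follow the legacy Azure
configuration, and the values are placeholders:

```properties
connector.name=hive
# Required for all legacy file system support
fs.hadoop.enabled=true
# Placeholder storage account and access key
hive.azure.abfs-storage-account=examplestorageaccount
hive.azure.abfs-access-key=examplekey
```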

### ADLS Gen2 / ABFS storage

To connect to ABFS storage, you may either use the storage account's access
7 changes: 7 additions & 0 deletions docs/src/main/sphinx/object-storage/legacy-cos.md
@@ -2,8 +2,15 @@

Configure the {doc}`/connector/hive` to support [IBM Cloud Object Storage COS](https://www.ibm.com/cloud/object-storage) access.

:::{warning}
Legacy support is not recommended and will be removed. Use [](file-system-s3).
:::

## Configuration

To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in
your catalog configuration file.

To use COS, you need to configure a catalog file to use the Hive
connector. For example, create a file `etc/ibmcos.properties` and
specify the path to the COS service config file with the
7 changes: 7 additions & 0 deletions docs/src/main/sphinx/object-storage/legacy-gcs.md
@@ -4,6 +4,10 @@ Object storage connectors can access
[Google Cloud Storage](https://cloud.google.com/storage/) data using the
`gs://` URI prefix.

:::{warning}
Legacy support is not recommended and will be removed. Use [](file-system-gcs).
:::

## Requirements

To use Google Cloud Storage with non-anonymous access objects, you need:
@@ -14,6 +18,9 @@ To use Google Cloud Storage with non-anonymous access objects, you need:
(hive-google-cloud-storage-configuration)=
## Configuration

To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in
your catalog configuration file.

The use of Google Cloud Storage as a storage location for an object storage
catalog requires setting a configuration property that defines the
[authentication method for any non-anonymous access object](https://cloud.google.com/storage/docs/authentication). Access methods cannot
7 changes: 7 additions & 0 deletions docs/src/main/sphinx/object-storage/legacy-s3.md
@@ -8,6 +8,13 @@ uses an S3 prefix, rather than an HDFS prefix.
Trino uses its own S3 filesystem for the URI prefixes
`s3://`, `s3n://` and `s3a://`.

:::{warning}
Legacy support is not recommended and will be removed. Use [](file-system-s3).
:::

To use legacy support, the `fs.hadoop.enabled` property must be set to `true` in
your catalog configuration file.
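
As a minimal sketch, a Hive catalog using the legacy S3 support might look like
the following; the `hive.s3.*` credential properties belong to the legacy S3
configuration, and the values are placeholders:

```properties
connector.name=hive
# Required for all legacy file system support
fs.hadoop.enabled=true
# Placeholder credentials for S3 access
hive.s3.aws-access-key=example-access-key
hive.s3.aws-secret-key=example-secret-key
```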

(hive-s3-configuration)=
## S3 configuration properties
