Add S3 sink documentation #4340

Merged on Jul 18, 2023 (6 commits). The diff below shows the changes from 5 of those commits.
`_data-prepper/pipelines/configuration/sinks/file.md`: 22 changes (14 additions, 8 deletions)

```diff
@@ -1,25 +1,31 @@
 ---
 layout: default
-title: file sink
+title: file
 parent: Sinks
 grand_parent: Pipelines
 nav_order: 45
 ---
 
-# file sink
+# file
 
-## Overview
+Use the `file` sink to create a flat file output, usually a `.log` file.
 
-You can use the `file` sink to create a flat file output. The following table describes options you can configure for the `file` sink.
+## Configuration options
+
+The following table describes options you can configure for the `file` sink.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 path | Yes | String | Path for the output file (e.g. `logs/my-transformed-log.log`).
 
-<!--- ## Configuration
+## Usage
 
-Content will be added to this section.
+The following example shows the basic usage of the `file` sink:
 
-## Metrics
+```
+sample-pipeline:
+  sink:
+    - file:
+        path: path/to/output-file
+```
 
-Content will be added to this section. --->
```
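For context, a `file` sink normally sits inside a complete pipeline definition alongside a source. The following sketch assumes an `http` source with its default settings; the pipeline name `log-pipeline` is hypothetical, and the output path reuses the example from the options table.

```yaml
log-pipeline:
  source:
    http:        # assumed source, using its default settings
  sink:
    - file:
        path: logs/my-transformed-log.log   # events are written to this flat file
```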
`_data-prepper/pipelines/configuration/sinks/opensearch.md`: 6 changes (3 additions, 3 deletions)

```diff
@@ -1,12 +1,12 @@
 ---
 layout: default
-title: OpenSearch sink
+title: opensearch
 parent: Sinks
 grand_parent: Pipelines
-nav_order: 45
+nav_order: 50
 ---
 
-# OpenSearch sink
+# opensearch
 
 You can use the `opensearch` sink plugin to send data to an OpenSearch cluster, a legacy Elasticsearch cluster, or an Amazon OpenSearch Service domain.
 
```
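As a rough illustration, an `opensearch` sink entry might look like the following sketch. The endpoint, credentials, and index name are placeholders, and the `http` source and the pipeline name are assumptions made only for this example.

```yaml
sample-pipeline:
  source:
    http:                                   # assumed source for illustration
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]   # placeholder cluster endpoint
        username: admin                     # placeholder credentials
        password: admin
        index: sample-index                 # hypothetical index name
```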
`_data-prepper/pipelines/configuration/sinks/pipeline.md`: 25 changes (15 additions, 10 deletions)

```diff
@@ -1,25 +1,30 @@
 ---
 layout: default
-title: Pipeline sink
+title: pipeline
 parent: Sinks
 grand_parent: Pipelines
-nav_order: 45
+nav_order: 55
 ---
 
-# Pipeline sink
+# pipeline
 
-## Overview
+Use the `pipeline` sink to write to another pipeline.
 
-You can use the `pipeline` sink to write to another pipeline.
+## Configuration options
+
+The `pipeline` sink supports the following configuration options.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 name | Yes | String | Name of the pipeline to write to.
 
-<!--- ## Configuration
-
-Content will be added to this section.
+## Usage
 
-## Metrics
+The following example configures a `pipeline` sink that writes to a pipeline named `movies`.
 
-Content will be added to this section. --->
+```
+sample-pipeline:
+  sink:
+    - pipeline:
+        name: movies
+```
```
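Because the `pipeline` sink only forwards events, it is typically paired with a second pipeline that reads them through a `pipeline` source. The following sketch assumes an `http` source for the first pipeline and a `stdout` sink for the second; the `movies` name matches the usage example.

```yaml
sample-pipeline:
  source:
    http:                     # assumed source for illustration
  sink:
    - pipeline:
        name: movies          # forward events to the pipeline named "movies"

movies:
  source:
    pipeline:
      name: sample-pipeline   # receive events written by sample-pipeline
  sink:
    - stdout:                 # print received events, useful for testing
```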
`_data-prepper/pipelines/configuration/sinks/s3.md`: 70 changes (70 additions, 0 deletions; new file, `@@ -0,0 +1,70 @@`)
---
layout: default
title: s3
parent: Sinks
grand_parent: Pipelines
nav_order: 55
---

# s3

The `s3` sink sends records to an Amazon S3 bucket using the S3 client.

## Usage

The following example creates a pipeline configured with an `s3` sink. It contains additional options for customizing the event count, size, and time thresholds at which the sink writes events to S3, and it sets the codec type to `ndjson`.

```
pipeline:
  ...
  sink:
    - s3:
        aws:
          region: us-east-1
          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
          sts_header_overrides:
        max_retries: 5
        bucket:
          name: bucket_name
          object_key:
            path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/
        threshold:
          event_count: 2000
          maximum_size: 50mb
          event_collect_timeout: 15s
        codec:
          ndjson:
        buffer_type: in_memory
```

## Configuration

Use the following options to customize the `s3` sink. In the preceding example, `region` and `sts_role_arn` are set inside the `aws` block.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`bucket` | Yes | Object | The S3 bucket to which the sink writes objects. Set its `name` option to the name of the destination bucket.
`region` | No | String | The AWS Region to use when connecting to S3. Defaults to the [standard SDK behavior to determine the region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The Amazon Resource Name (ARN) of the role that the `s3` sink assumes through [AWS Security Token Service](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html) (AWS STS) when sending requests to S3. Defaults to the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`sts_external_id` | No | String | The external ID to attach to AssumeRole requests sent to AWS STS.
`max_retries` | No | Integer | The maximum number of times a single request is retried when writing data to S3. Defaults to `5`.
`object_key` | No | Object | Sets the `path_prefix` and the `file_pattern` used to build the key of each object written to S3. By default, objects are written to the root of the bucket with keys of the form `events-%{yyyy-MM-dd'T'hh-mm-ss}`.
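The following sketch shows the credential-related options together. It assumes that `sts_external_id` is set inside the `aws` block alongside `region` and `sts_role_arn`; the external ID value and the bucket name are placeholders, and the remaining `s3` options are omitted.

```yaml
sink:
  - s3:
      aws:
        region: us-east-1
        sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
        sts_external_id: my-external-id   # placeholder; must match the role's trust policy
      bucket:
        name: bucket_name                 # placeholder bucket name
      # threshold, codec, and other s3 options omitted
```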

## Threshold configuration options

Use the following options, set inside the `threshold` block, to control when the `s3` sink writes buffered events to S3.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`event_count` | Yes | Integer | The maximum number of events to accumulate before the sink writes them to S3 as an object.
`maximum_size` | No | String | The maximum size of the accumulated event data before the sink writes it to S3, for example `50mb`. Defaults to `50mb`.
`event_collect_timeout` | Yes | String | The maximum amount of time to collect events before the sink writes them to S3. Specify the duration either as an ISO 8601 string, such as `PT20.345S`, or in simple notation, such as `60s` or `1500ms`.
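To illustrate how the three thresholds combine, the following fragment writes smaller objects more often than the usage example. It assumes that the sink writes an object as soon as any one threshold is reached, and it uses ISO 8601 notation for the timeout (`PT5M` is 5 minutes); the specific values are arbitrary.

```yaml
threshold:
  event_count: 500              # write after 500 events...
  maximum_size: 10mb            # ...or once the buffered data reaches 10 MB...
  event_collect_timeout: PT5M   # ...or after 5 minutes, whichever comes first (assumed behavior)
```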

## buffer_type

`buffer_type` is an optional setting that determines where the sink temporarily stores events before flushing them to an S3 bucket. Use one of the following options:

- `local_file`: Stores records in a temporary file on the local machine.
- `in_memory`: Stores records in memory.
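For example, to buffer events on disk rather than in memory, you might set `buffer_type` as in the following fragment. The placement of `buffer_type` directly under `s3` is assumed to match the usage example, and the other options are omitted.

```yaml
sink:
  - s3:
      # bucket, threshold, codec, and other options omitted
      buffer_type: local_file   # buffer events in a temporary local file before upload
```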


`_data-prepper/pipelines/configuration/sinks/stdout.md`: 10 changes (1 addition, 9 deletions)

```diff
@@ -8,12 +8,4 @@ nav_order: 45
 
 # stdout sink
 
-## Overview
-
-You can use the `stdout` sink for console output and testing. It has no configurable options.
-
-<!---
-
-## Metrics
-
-Content will be added to this section. --->
+Use the `stdout` sink for console output and testing. It has no configurable options.
```
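A minimal test pipeline that prints every event to the console might look like the following sketch; the pipeline name and the `http` source are assumptions for illustration.

```yaml
test-pipeline:
  source:
    http:        # assumed source; any source works here
  sink:
    - stdout:    # no options: events are printed to standard output
```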