Add S3 sink documentation (#4340)
* Add S3 sink documentation

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _data-prepper/pipelines/configuration/sinks/s3.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _data-prepper/pipelines/configuration/sinks/s3.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Update _data-prepper/pipelines/configuration/sinks/s3.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
3 people authored Jul 18, 2023
1 parent ca40c61 commit 0fcfc48
Showing 5 changed files with 103 additions and 30 deletions.
22 changes: 14 additions & 8 deletions _data-prepper/pipelines/configuration/sinks/file.md
@@ -1,25 +1,31 @@
 ---
 layout: default
-title: file sink
+title: file
 parent: Sinks
 grand_parent: Pipelines
 nav_order: 45
 ---
 
-# file sink
+# file
 
-## Overview
+Use the `file` sink to create a flat file output, usually a `.log` file.
 
-You can use the `file` sink to create a flat file output. The following table describes options you can configure for the `file` sink.
+## Configuration options
+
+The following table describes options you can configure for the `file` sink.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 path | Yes | String | Path for the output file (e.g. `logs/my-transformed-log.log`).
 
-<!--- ## Configuration
+## Usage
 
-Content will be added to this section.
+The following example shows basic usage of the `file` sink:
 
-## Metrics
+```
+sample-pipeline:
+  sink:
+    - file:
+        path: path/to/output-file
+```
 
-Content will be added to this section. --->
6 changes: 3 additions & 3 deletions _data-prepper/pipelines/configuration/sinks/opensearch.md
@@ -1,12 +1,12 @@
 ---
 layout: default
-title: OpenSearch sink
+title: opensearch
 parent: Sinks
 grand_parent: Pipelines
-nav_order: 45
+nav_order: 50
 ---
 
-# OpenSearch sink
+# opensearch
 
 You can use the `opensearch` sink plugin to send data to an OpenSearch cluster, a legacy Elasticsearch cluster, or an Amazon OpenSearch Service domain.
 
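For context, a minimal `opensearch` sink entry might look like the following sketch; the endpoint, credentials, and index name are all placeholders:

```
sample-pipeline:
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        username: admin
        password: admin
        index: my_index
```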
25 changes: 15 additions & 10 deletions _data-prepper/pipelines/configuration/sinks/pipeline.md
@@ -1,25 +1,30 @@
 ---
 layout: default
-title: Pipeline sink
+title: pipeline
 parent: Sinks
 grand_parent: Pipelines
-nav_order: 45
+nav_order: 55
 ---
 
-# Pipeline sink
+# pipeline
 
-## Overview
+Use the `pipeline` sink to write to another pipeline.
 
-You can use the `pipeline` sink to write to another pipeline.
+## Configuration options
+
+The `pipeline` sink supports the following configuration options.
 
 Option | Required | Type | Description
 :--- | :--- | :--- | :---
 name | Yes | String | Name of the pipeline to write to.
 
-<!--- ## Configuration
-Content will be added to this section.
+## Usage
 
-## Metrics
+The following example configures a `pipeline` sink that writes to a pipeline named `movies`:
 
-Content will be added to this section. --->
+```
+sample-pipeline:
+  sink:
+    - pipeline:
+        name: movies
+```
70 changes: 70 additions & 0 deletions _data-prepper/pipelines/configuration/sinks/s3.md
@@ -0,0 +1,70 @@
---
layout: default
title: s3
parent: Sinks
grand_parent: Pipelines
nav_order: 55
---

# s3

The `s3` sink sends records to an Amazon Simple Storage Service (Amazon S3) bucket using the S3 client.

## Usage

The following example creates a pipeline configured with an `s3` sink. It contains additional options for customizing the event and size thresholds at which the pipeline sends record events to S3 and sets the codec type to `ndjson`:

```
pipeline:
  ...
  sink:
    - s3:
        aws:
          region: us-east-1
          sts_role_arn: arn:aws:iam::123456789012:role/Data-Prepper
          sts_header_overrides:
        max_retries: 5
        bucket:
          name: bucket_name
        object_key:
          path_prefix: my-elb/%{yyyy}/%{MM}/%{dd}/
        threshold:
          event_count: 2000
          maximum_size: 50mb
          event_collect_timeout: 15s
        codec:
          ndjson:
        buffer_type: in_memory
```

## Configuration

Use the following options when customizing the `s3` sink.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`bucket` | Yes | String | The S3 bucket in which objects are stored. The `name` must match the name of your S3 bucket.
`region` | No | String | The AWS Region to use when connecting to S3. Defaults to the [standard SDK behavior to determine the Region](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/region-selection.html).
`sts_role_arn` | No | String | The [AWS Security Token Service](https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html) (AWS STS) role that the `s3` sink assumes when sending a request to S3. Defaults to the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`sts_external_id` | No | String | The external ID to attach to AssumeRole requests from AWS STS.
`max_retries` | No | Integer | The maximum number of times a single request should retry when ingesting data to S3. Defaults to `5`.
`object_key` | No | Object | Sets the `path_prefix` and the `file_pattern` of the stored objects. Defaults to the S3 object key `events-%{yyyy-MM-dd'T'hh-mm-ss}` in the root directory of the bucket.
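As an illustrative fragment, `object_key` can set a dated `path_prefix` using the same date-time patterns shown in the usage example above (the `logs/` prefix here is a made-up example):

```
object_key:
  path_prefix: logs/%{yyyy}/%{MM}/%{dd}/
```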

## Threshold configuration options

Use the following options to set ingestion thresholds for the `s3` sink.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`event_count` | Yes | Integer | The maximum number of events to collect before writing an object to the S3 bucket.
`maximum_size` | Yes | String | The maximum number of bytes to collect before writing an object to the S3 bucket. Defaults to `50mb`.
`event_collect_timeout` | Yes | String | The maximum amount of time during which events are collected before an object is written to the S3 bucket. All values are strings that represent a duration, either in ISO 8601 notation, such as `PT20.345S`, or in simple notation, such as `60s` or `1500ms`.
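As a sketch with illustrative values, a `threshold` block that combines all three options, using ISO 8601 notation for the timeout, might look like the following:

```
threshold:
  event_count: 10000
  maximum_size: 50mb
  event_collect_timeout: PT1M
```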

## buffer_type

`buffer_type` is an optional configuration that determines where events are stored temporarily before being flushed to the S3 bucket. Use one of the following options:

- `local_file`: Temporarily stores records in a file on the local machine.
- `in_memory`: Temporarily stores records in memory.
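For example, to buffer records on disk instead of in memory, the sink fragment from the usage example above might change as follows:

```
- s3:
    # ... aws, bucket, object_key, threshold, and codec options as shown above ...
    buffer_type: local_file
```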


10 changes: 1 addition & 9 deletions _data-prepper/pipelines/configuration/sinks/stdout.md
@@ -8,12 +8,4 @@ nav_order: 45
 
 # stdout sink
 
-## Overview
-
-You can use the `stdout` sink for console output and testing. It has no configurable options.
-
-<!---
-## Metrics
-Content will be added to this section. --->
+Use the `stdout` sink for console output and testing. It has no configurable options.
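Because it has no options, a minimal sketch of its usage is a bare entry in the `sink` list (the pipeline name is a placeholder):

```
sample-pipeline:
  sink:
    - stdout:
```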
