[DOC] data prepper secret extensions #5202

100 changes: 100 additions & 0 deletions _data-prepper/managing-data-prepper/configuring-data-prepper.md
@@ -31,6 +31,7 @@
sinkShutdownTimeout | No | Duration | The time given to sinks to clear any in-flight data and gracefully shut down. Default is 30s.
peer_forwarder | No | Object | Peer forwarder configurations. See [Peer forwarder options](#peer-forwarder-options) for more details.
circuit_breakers | No | [circuit_breakers](#circuit-breakers) | Configures a circuit breaker on incoming data.
extensions | No | Object | The pipeline extension plugin configurations. See [Extension plugins](#extension-plugins) for more details.

### Peer forwarder options

@@ -100,3 +101,102 @@
reset | No | Duration | After tripping the circuit breaker, no new checks are made until after this time has passed. This effectively sets the minimum time for a breaker to remain open to allow for clearing memory. Defaults to `1s`.
check_interval | No | Duration | Specifies the time between checks of the heap size. Defaults to `500ms`.

### Extension plugins

Data Prepper 2.5 and later supports user-configurable extension plugins. Extension plugins define common configurations that are shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#concepts).

#### AWS extension plugins

Check failure on line 109 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [OpenSearch.HeadingCapitalization] 'AWS extension plugins' is a heading and should be in sentence case.

A collection of extension plugins for AWS resources. All of these plugin configuration objects are defined under `aws:`.

| Option | Required | Type | Description |
|:-------|:---------|:-------|:-----------------------------------------|
| aws | No | Object | The AWS extension plugins configuration. |

##### AWS secrets extension plugin

Check failure on line 117 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [OpenSearch.HeadingCapitalization] 'AWS secrets extension plugin' is a heading and should be in sentence case.

Configures the [AWS Secrets Manager](https://docs.aws.amazon.com/secretsmanager/latest/userguide/intro.html) extension plugin so that its secrets can be referenced in pipeline plugin configurations, as shown in the following example:

```
extensions:
  aws:
    secrets:
      <YOUR_SECRET_CONFIG_ID_1>:
        secret_id: <YOUR_SECRET_ID_1>
        region: <YOUR_REGION_1>
        sts_role_arn: <YOUR_STS_ROLE_ARN_1>
        refresh_interval: <YOUR_REFRESH_INTERVAL>
      <YOUR_SECRET_CONFIG_ID_2>:
        ...
```

| Option | Required | Type | Description |
|:--------|:---------|:-------|:---------------------------------------------------------------------------------------------|
| secrets | No | Object | The AWS Secrets Manager extension plugin configuration. See [Secrets](#secrets) for details. |

###### Secrets

You can define multiple secret configuration objects, each with a unique ID.

| Option | Required | Type | Description |
|:-----------------|:---------|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| secret_id | Yes | String | The AWS secret name or ARN. |
| region | No | String | The AWS region of the secret. Defaults to `us-east-1`. |

Member

@chenqi0805 , Does this default to us-east-1? I'd think it uses the AWS SDK's default. This can be the AWS_REGION environment variable or us-east-1. Do we know for sure?

Contributor Author

Yes. We do default to us-east-1 the same way as the S3 DLQ in opensearch sink

| sts_role_arn | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to AWS Secrets Manager. Defaults to `null`, which will use the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html). |
| refresh_interval | No | Duration | The interval at which the AWS secrets extension plugin polls for new secret values. Defaults to `PT1H`. See [Automatically refreshing secrets](#automatically-refreshing-secrets) for details. |
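
As a minimal sketch of how the defaults in the preceding table apply, the following configuration sets only the required `secret_id` for one secret and sets every optional setting explicitly for another. The configuration IDs, secret names, account ID, and role name are hypothetical placeholders chosen for illustration:

```yaml
extensions:
  aws:
    secrets:
      # Only the required option is set. Per the table, region falls back
      # to us-east-1 and refresh_interval falls back to PT1H.
      minimal-secret-config:
        secret_id: example/minimal-secret        # hypothetical secret name
      # Every optional setting is set explicitly.
      full-secret-config:
        secret_id: example/full-secret           # hypothetical secret name
        region: us-west-2
        sts_role_arn: arn:aws:iam::123456789012:role/example-secrets-reader  # hypothetical role
        refresh_interval: PT30M
```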

###### Reference secrets

In `pipelines.yaml`, secret values can be referenced within pipeline plugins using the following formats:

* Plain text: `${{aws_secrets:<YOUR_SECRET_CONFIG_ID>}}`

Member

We should clearly state in the preceding paragraph that users need to replace <YOUR_SECRET_CONFIG_ID>, including the <>.

* JSON (key-value pairs): `${{aws_secrets:<YOUR_SECRET_CONFIG_ID>:<YOUR_KEY>}}`

Replace `<YOUR_SECRET_CONFIG_ID>`, including the angle brackets, with the corresponding secret configuration ID defined under `/extensions/aws/secrets`, and replace `<YOUR_KEY>` with the desired key in the secret's JSON value. The secret value reference string can be interpreted for the following plugin setting data types:

* String
* Number
* Long
* Short
* Integer
* Double
* Float
* Boolean
* Character
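
As an illustration of a non-string setting, the following sketch references a plain text secret for a numeric option. It assumes a hypothetical secret configuration named `port-secret-config` whose plain text value is a port number such as `2021`. Whether a particular plugin option accepts a secret reference depends on the plugin, so treat this as an assumption rather than documented behavior:

```yaml
source:
  http:
    # Assumed to be interpreted as an Integer, for example 2021.
    port: "${{aws_secrets:port-secret-config}}"
```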

The following `pipelines.yaml` snippet uses an OpenSearch sink as an example:

```
sink:
  - opensearch:
      hosts: [ "${{aws_secrets:host-secret-config}}" ]
      username: "${{aws_secrets:credential-secret-config:username}}"
      password: "${{aws_secrets:credential-secret-config:password}}"
      index: "test-migration"
```

The preceding snippet assumes that `host-secret-config` holds a plain text secret value and that `credential-secret-config` holds JSON key-value pairs containing both the `username` and `password` keys.

Check warning on line 180 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [OpenSearch.DirectionAboveBelow] Use 'previous, preceding, or earlier' instead of 'above' for versions or orientation within a document. Use 'above' and 'below' only for physical space or screen descriptions.

Check failure on line 180 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [OpenSearch.Spelling] Error: plaintext. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.

Check failure on line 180 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [Vale.Terms] Use 'JSON' instead of 'json'.

Check failure on line 180 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [OpenSearch.Spelling] Error: json. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.

The corresponding AWS secrets extension plugin in `data-prepper-config.yaml` is as follows:

```
extensions:
  aws:
    secrets:
      host-secret-config:
        secret_id: <YOUR_SECRET_ID_1>
        region: <YOUR_REGION_1>
        sts_role_arn: <YOUR_STS_ROLE_ARN_1>
        refresh_interval: <YOUR_REFRESH_INTERVAL_1>
      credential-secret-config:
        secret_id: <YOUR_SECRET_ID_2>
        region: <YOUR_REGION_2>
        sts_role_arn: <YOUR_STS_ROLE_ARN_2>
        refresh_interval: <YOUR_REFRESH_INTERVAL_2>
```

###### Automatically refreshing secrets

For each secret configuration, the latest secret value is polled at a regular interval to support refreshing secrets in AWS Secrets Manager. Certain pipeline plugins use the refreshed secret values to refresh their components, such as connections and authentication to the backend service.
When multiple secret configurations are defined, a jitter of up to 60 seconds is applied across them during the initial secrets polling.
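
For example, the following sketch polls one secret more often than another. The secret names are hypothetical, and the duration value follows the ISO 8601 style of the documented `PT1H` default:

```yaml
extensions:
  aws:
    secrets:
      # Credentials that rotate often are polled every 15 minutes.
      credential-secret-config:
        secret_id: example/opensearch-credentials   # hypothetical secret name
        refresh_interval: PT15M
      # The host rarely changes, so refresh_interval is omitted and the
      # documented default of PT1H applies.
      host-secret-config:
        secret_id: example/opensearch-host          # hypothetical secret name
```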

Check warning on line 202 in _data-prepper/managing-data-prepper/configuring-data-prepper.md (GitHub Actions / style-job): [vale] [OpenSearch.UnitsSpacing] Put a space between the number and the units in '60s '.
@@ -64,7 +64,7 @@ When you run the `rename_keys` processor, it parses the message into the followi

## Special considerations

Renaming operations occur in the order that the key-value pair entries are listed in the `pipeline.yaml` file. This means that chaining (where key-value pairs are renamed in sequence) is implicit in the `rename_keys` processor. See the following example `pipline.yaml` file:
Renaming operations occur in the order that the key-value pair entries are listed in the `pipeline.yaml` file. This means that chaining (where key-value pairs are renamed in sequence) is implicit in the `rename_keys` processor. See the following example `pipeline.yaml` file:

```yaml
pipeline:
4 changes: 2 additions & 2 deletions _data-prepper/pipelines/configuration/sources/opensearch.md
@@ -97,8 +97,8 @@ The following table describes options you can configure for the `opensearch` sou
Option | Required | Type | Description
:--- | :--- |:--------| :---
`hosts` | Yes | List | A list of OpenSearch hosts to write to, for example, `["https://localhost:9200", "https://remote-cluster:9200"]`.
`username` | No | String | The username for HTTP basic authentication.
`password` | No | String | The password for HTTP basic authentication.
`username` | No | String | The username for HTTP basic authentication. In Data Prepper 2.5 and later, this setting can be refreshed at runtime if an [AWS secrets reference]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#reference-secrets) is used.
`password` | No | String | The password for HTTP basic authentication. In Data Prepper 2.5 and later, this setting can be refreshed at runtime if an [AWS secrets reference]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#reference-secrets) is used.
`disable_authentication` | No | Boolean | Whether authentication is disabled. Defaults to `false`.
`aws` | No | Object | The AWS configuration. For more information, see [aws](#aws).
`acknowledgments` | No | Boolean | When `true`, enables the `opensearch` source to receive [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/#end-to-end-acknowledgments) when events are received by OpenSearch sinks. Default is `false`.
32 changes: 32 additions & 0 deletions _data-prepper/pipelines/pipelines.md
@@ -326,3 +326,35 @@
```


# Pipeline configurations

Check failure on line 329 in _data-prepper/pipelines/pipelines.md (GitHub Actions / style-job): [vale] [OpenSearch.HeadingCapitalization] 'Pipeline Configurations' is a heading and should be in sentence case.

In Data Prepper 2.5 and later, shared pipeline components can be configured under the reserved `pipeline_configurations` section when all pipelines are defined in a single pipeline configuration YAML file.
Shared pipeline configurations can include certain components from [extension plugins]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#extension-plugins), as shown in the following example:

```
pipeline_configurations:
  aws:
    secrets:
      credential-secret-config:
        secret_id: <YOUR_SECRET_ID>
        region: <YOUR_REGION>
        sts_role_arn: <YOUR_STS_ROLE_ARN>
simple-sample-pipeline:
  ...
  sink:
    - opensearch:
        hosts: [ "${{aws_secrets:host-secret-config}}" ]
        username: "${{aws_secrets:credential-secret-config:username}}"
        password: "${{aws_secrets:credential-secret-config:password}}"
        index: "test-migration"
```

The preceding `pipelines.yaml` snippet defines the [AWS secrets extension plugin]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#aws-extension-plugins) within `pipeline_configurations` and references the secret configurations in the `opensearch` sink.

Check warning on line 352 in _data-prepper/pipelines/pipelines.md (GitHub Actions / style-job): [vale] [OpenSearch.DirectionAboveBelow] Use 'previous, preceding, or earlier' instead of 'above' for versions or orientation within a document. Use 'above' and 'below' only for physical space or screen descriptions.

Check warning on line 353 in _data-prepper/pipelines/pipelines.md (GitHub Actions / style-job): [vale] [OpenSearch.DirectionAboveBelow] Use 'previous, preceding, or earlier' instead of 'above' for versions or orientation within a document. Use 'above' and 'below' only for physical space or screen descriptions.

When the same component is defined in both `pipelines.yaml` and `data-prepper-config.yaml`, the definition in `pipelines.yaml` overwrites its counterpart in `data-prepper-config.yaml`. In the preceding example, `pipeline_configurations/aws/secrets` in `pipelines.yaml` overwrites `extensions/aws/secrets` if it is defined in `data-prepper-config.yaml`.
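
This precedence rule can be sketched by placing the two files side by side. The following illustration uses hypothetical secret names and shows both files as separate YAML documents in a single block. Under the rule described here, only the `pipelines.yaml` definition would take effect:

```yaml
# data-prepper-config.yaml
extensions:
  aws:
    secrets:
      credential-secret-config:
        secret_id: example/old-credentials   # hypothetical; overwritten
---
# pipelines.yaml
pipeline_configurations:
  aws:
    secrets:
      credential-secret-config:
        secret_id: example/new-credentials   # hypothetical; takes effect
```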

The following shared pipeline components are available.

## AWS secrets extension

Check failure on line 358 in _data-prepper/pipelines/pipelines.md (GitHub Actions / style-job): [vale] [OpenSearch.HeadingCapitalization] 'AWS secrets extension' is a heading and should be in sentence case.

See [AWS secrets extension plugin]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/#aws-secrets-extension-plugin) for details.