AWS Lambda plugin documentation #8655

Merged
merged 42 commits into from
Nov 7, 2024
Changes from 3 commits
9e40253
Add aws lambda processor and sink documentation
srikanthjg Nov 4, 2024
2083bcb
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 5, 2024
b54a2fe
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 5, 2024
0657bcf
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 5, 2024
6e9cede
Update aws-lambda.md
vagimeli Nov 5, 2024
878ce94
Merge branch 'main' into aws-lambda
vagimeli Nov 5, 2024
b25c760
Merge branch 'main' into aws-lambda
vagimeli Nov 5, 2024
37470a7
Merge branch 'main' into aws-lambda
vagimeli Nov 6, 2024
7c59d91
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
c29d313
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
26d6e31
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
37005a7
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
49cb588
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
4049c29
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
019d805
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
1b3a8c0
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
0745100
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
e7922b8
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
25c1e7f
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
1f29d3d
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
7f7b12a
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
d1f33a2
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
bd08065
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
d3e49ca
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
a94ab92
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
7e8e978
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
43abb77
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
7ee5d2c
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
c77d1a1
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
a03ebee
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
c56f538
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
8d17385
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
69e8efb
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
06310f5
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
cfacc7b
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
712741d
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
45af8e2
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
dab3270
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
bb84572
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
0bbb288
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
d3b8200
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
e9939b7
Update _data-prepper/pipelines/configuration/processors/aws-lambda.md
vagimeli Nov 7, 2024
154 changes: 154 additions & 0 deletions _data-prepper/pipelines/configuration/processors/aws-lambda.md
@@ -0,0 +1,154 @@
---
layout: default
title: AWS Lambda integration for Data Prepper
parent: Processors
grand_parent: Pipelines
nav_order: 10
---

# AWS Lambda integration for Data Prepper

This document provides the configuration details and usage instructions for integrating AWS Lambda with Data Prepper, both as a processor and as a sink.

----------------------------------------------------------------------------------------
## AWS Lambda processor

Configuration
The `aws_lambda` processor invokes an AWS Lambda function from your Data Prepper pipeline to process events. It supports both synchronous and asynchronous invocations.

Configuration Fields:

```
Field | Type | Required | Description
-------------------- | ------- | -------- | ----------------------------------------------------------------------------
function_name | String | Yes | The name of the AWS Lambda function to invoke.
invocation_type | String | Yes | Specifies the invocation type: either request-response or event. Default is request-response.
aws.region | String | Yes | The AWS region where the Lambda function is located.
aws.sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function.
max_retries | Integer | No | Maximum number of retries if the invocation fails. Default is 3.
batch | Object | No | Batch settings for the Lambda invocations. Default key_name is "events". Default thresholds are event_count = 100, maximum_size = "5mb", and event_collect_timeout = 10s.
lambda_when | String | No | Conditional expression to determine when to invoke the Lambda processor.
response_codec | Object | No | Codec configuration for parsing Lambda responses. Default is json.
tags_on_match_failure| List | No | A list of strings that specifies the tags to be set in the event when Lambda fails to match or an unknown exception occurs while matching.
sdk_timeout | Duration| No | The amount of time that the SDK maintains the connection to the client before timing out. Default is 60s.
response_events_match| Boolean | No | Defines how Data Prepper treats the response from Lambda. Default is false.
```

Example Configuration:
```
processors:
  - aws_lambda:
      function_name: "my-lambda-function"
      invocation_type: "request-response"
      response_events_match: false
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-lambda-role"
      max_retries: 3
      batch:
        key_name: "events"
        threshold:
          event_count: 100
          maximum_size: "5mb"
          event_collect_timeout: PT10S
      lambda_when: "event['status'] == 'process'"
```

## Usage
Invocation type:
- `request-response`: Waits for the Lambda function's response before continuing.
- `event`: Invokes the function asynchronously without waiting for a response.

Batching: When batching is enabled, events are grouped together and sent in bulk to reduce the number of Lambda invocations. The `threshold` settings within `batch` define the event count, size limit, or timeout that triggers a batch.

Codec: Currently, both the request and response codecs are JSON. The processor requires Lambda to return a JSON array.

`tags_on_match_failure`: A list of strings that specifies the tags to be set in the event when Lambda fails to match or an unknown exception occurs while matching. These tags can be used in conditional expressions in other parts of the configuration.


## Behavior
When the AWS Lambda processor in Data Prepper is configured for batching, it groups multiple events together into a single request based on the batch thresholds (event count, size, or time). The entire batch is sent to the Lambda function as a single payload.
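The threshold logic described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Data Prepper's implementation: the `BatchBuffer` class, its method names, and the `{"events": [...]}` payload shape (the batch wrapped under `key_name`) are all assumptions made for the example.

```python
import json
import time


class BatchBuffer:
    """Hypothetical buffer that flushes on event count, byte size, or age."""

    def __init__(self, event_count=100, maximum_size=5 * 1024 * 1024,
                 event_collect_timeout=10.0, key_name="events",
                 clock=time.monotonic):
        self.event_count = event_count
        self.maximum_size = maximum_size
        self.event_collect_timeout = event_collect_timeout
        self.key_name = key_name
        self._clock = clock
        self._events = []
        self._size = 0
        self._started = None

    def add(self, event):
        """Buffer one event; return a JSON payload if a threshold is reached."""
        if not self._events:
            self._started = self._clock()
        self._events.append(event)
        self._size += len(json.dumps(event).encode("utf-8"))
        if self._should_flush():
            return self.flush()
        return None

    def _should_flush(self):
        age = self._clock() - self._started
        return (len(self._events) >= self.event_count
                or self._size >= self.maximum_size
                or age >= self.event_collect_timeout)

    def flush(self):
        """Serialize all buffered events as one payload and reset the buffer."""
        # Assumed payload shape: the batch wrapped under key_name.
        payload = json.dumps({self.key_name: self._events})
        self._events, self._size, self._started = [], 0, None
        return payload
```

In this sketch the timeout is only evaluated when a new event arrives; a real pipeline would also need a timer so that a partially filled batch is sent after the timeout even when no further events come in.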

Lambda response handling:
The `response_events_match` setting defines the relationship between the events sent to Lambda in a batch request and the events returned in the Lambda response.
- `true`: Lambda is expected to return a JSON array containing a result for each event in the batch. Data Prepper maps this array back to the individual events so that each event in the batch receives the corresponding element of the response array.
- `false`: Lambda can return one or more events in the response for all events in a batch, but they are not correlated with the original events.

Here, correlation means that the metadata of the original events is carried forward to the response events.

If `response_events_match` is set to `true`, the Lambda function is expected to do the following:
1) Return the same number of response events as request events.
2) Maintain the order of the events.
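As an illustration of these two expectations, here is a minimal sketch of a Python Lambda handler that returns exactly one response event per request event, in the same order. The `processed` field is hypothetical, and the handler accepts either a bare JSON array or a batch wrapped under an `events` key because the exact request shape is an assumption here, not something this page confirms.

```python
def lambda_handler(event, context):
    """Return one response event per request event, in the original order."""
    # Assumption: the batch arrives either as a bare JSON array or
    # wrapped under the default key_name, "events".
    records = event["events"] if isinstance(event, dict) else event

    results = []
    for record in records:
        enriched = dict(record)       # copy so the input is not mutated
        enriched["processed"] = True  # hypothetical enrichment field
        results.append(enriched)

    # The Python runtime serializes this list as a JSON array.
    return results
```

Building the result with a single ordered pass over the input, without filtering or sorting, is what keeps the count and order aligned with the request.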


## Limitations
- Payload limitation: 6 MB payload limit.
- Response codec: Only the JSON codec is supported.
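A quick way to reason about the payload limit is to measure the serialized batch before sending it. The helper names below are hypothetical, the limit is taken as 6 MiB (one reading of "6 MB"), and the wrapped `{"events": [...]}` shape is an assumption.

```python
import json

# Assumption: "6 MB" is interpreted as 6 * 1024 * 1024 bytes.
LAMBDA_PAYLOAD_LIMIT_BYTES = 6 * 1024 * 1024


def payload_size(events, key_name="events"):
    """Return the UTF-8 size in bytes of the batch serialized as JSON."""
    return len(json.dumps({key_name: events}).encode("utf-8"))


def fits_payload_limit(events, key_name="events",
                       limit=LAMBDA_PAYLOAD_LIMIT_BYTES):
    """Return True if the serialized batch is within the payload limit."""
    return payload_size(events, key_name) <= limit
```

Keeping `maximum_size` in the batch threshold below this limit leaves room for the wrapping key and JSON punctuation.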



## Developer guide

The integration tests for this plugin do not run as part of the Data Prepper build.
The following command runs the integration tests:

```
./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.processor.lambda.region="us-east-1" -Dtests.processor.lambda.functionName="lambda_test_function" -Dtests.processor.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role"

```

----------------------------------------------------------------------------------------

## AWS Lambda sink

```
Field | Type | Required | Description
----------------- | ------- | -------- | ----------------------------------------------------------------------------
function_name | String | Yes | The name of the AWS Lambda function to invoke.
invocation_type | String | No | Specifies the invocation type. Default is event.
aws.region | String | Yes | The AWS region where the Lambda function is located.
aws.sts_role_arn | String | No | ARN of the role to assume before invoking the Lambda function.
max_retries | Integer | No | Maximum number of retries if the invocation fails. Default is 3.
batch | Object | No | Optional batch settings for Lambda invocations. Default key_name is "events". Default thresholds are event_count = 100, maximum_size = "5mb", and event_collect_timeout = 10s.
lambda_when | String | No | Conditional expression to determine when to invoke the Lambda sink.
dlq | Object | No | Dead-letter queue (DLQ) configuration for failed invocations.
```

Example Configuration:
```
sink:
  - aws_lambda:
      function_name: "my-lambda-sink"
      invocation_type: "event"
      aws:
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-lambda-sink-role"
      max_retries: 5
      batch:
        key_name: "events"
        threshold:
          event_count: 50
          maximum_size: "3mb"
          event_collect_timeout: PT5S
      lambda_when: "event['type'] == 'log'"
      dlq:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-sqs-role"
        bucket: "<<your-dlq-bucket-name>>"
```

Usage
Invocation type:
- `event`: Invokes the function asynchronously without waiting for a response.
- `request-response`: Not supported in the sink.

Batching: Batching is enabled by default. Events are grouped together based on the threshold defined in the `batch` configuration.

Dead-letter queue (DLQ): A DLQ can be configured to handle failures in Lambda invocations. If the invocation fails after retries, the failed events are sent to the specified DLQ.
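The retry-then-DLQ flow can be sketched as follows. This is an illustration only: `deliver_with_dlq`, `invoke`, and `send_to_dlq` are hypothetical names, and the sketch assumes `max_retries` counts retries after the first attempt, which this page does not state.

```python
def deliver_with_dlq(batch, invoke, send_to_dlq, max_retries=3):
    """Call invoke(batch); once retries are exhausted, route the batch to the DLQ.

    Returns True if an invocation succeeded, False if the batch went to the DLQ.
    """
    failures = 0
    while True:
        try:
            invoke(batch)
            return True
        except Exception as error:
            failures += 1
            # Assumption: max_retries excludes the initial attempt, so the
            # batch is attempted at most max_retries + 1 times.
            if failures > max_retries:
                send_to_dlq(batch, str(error))
                return False
```

Passing `invoke` and `send_to_dlq` as callables keeps the sketch independent of any AWS SDK; a production version would typically also wait between attempts.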


## Additional notes
IAM role assumption: Both the processor and the sink can assume a specified IAM role (`aws.sts_role_arn`) before invoking Lambda functions. This allows for more secure handling of AWS resources.

Concurrency considerations: When using the `event` invocation type, be mindful of Lambda concurrency limits to avoid throttling.

For more information about AWS Lambda, see the AWS Lambda documentation: https://docs.aws.amazon.com/lambda

## Developer guide

The integration tests for this plugin do not run as part of the Data Prepper build.
The following command runs the integration tests:

```
./gradlew :data-prepper-plugins:aws-lambda:integrationTest -Dtests.sink.lambda.region="us-east-1" -Dtests.sink.lambda.functionName="lambda_test_function" -Dtests.sink.lambda.sts_role_arn="arn:aws:iam::123456789012:role/dataprepper-role"

```