Batch Processing - Why does processing continue when there is an error? #1784
I'm trying to add the Batch Processing capability to our project and to understand the error handling for batches. As a source we consume a DynamoDB stream. Based on this graphic in the docs, that's our "architecture":

```mermaid
graph LR
  stream --batch_of_items--> lambda --put_message--> SQS
```

Given we have a batch size of 10: the handler reports back that item 3 has failed, and the checkpoint moves to item 3. The next batch to be processed will then contain items 3-10. Item 3 will fail again while items 4-10 are processed successfully, with the consequence that records 4-10 are put on the queue a second time. If item 3 keeps failing, is the stream processing on hold, retrying the same items over and over? Is that correct? Do I need to implement idempotency together with partial processing to reduce the message duplication? Or is a DLQ a solution for this problem, so that item 3 is moved to the DLQ and the checkpoint can move forward?
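To make the setup concrete, our record handler does roughly the following (a sketch; the queue URL and payload shape are placeholders):

```ts
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';
import type { DynamoDBRecord } from 'aws-lambda';

const sqs = new SQSClient({});

// Forwards each DynamoDB stream record to SQS ("put_message" in the graph above)
const recordHandler = async (record: DynamoDBRecord): Promise<void> => {
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: process.env.QUEUE_URL, // placeholder
      MessageBody: JSON.stringify(record.dynamodb?.NewImage),
    })
  );
};
```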
Hi @RaphaelManke thanks for creating this discussion.

The Batch Processing utility hinges on the feature that allows AWS Lambda to report partial failures to its trigger service when processing a batch of items. When you set a Lambda function to be triggered by SQS messages (feature docs), Kinesis Stream items (feature docs), or DynamoDB Stream items (feature docs), you can configure it to report partial failures in this batch. This signals to the Lambda service that one or more items that were marked as failed should be put back into the source and potentially retried later.

For example, let's take the (simplified) scenario below. For simplicity we'll assume that we have 2 sequential batches that trigger one single function. The first batch is composed of items with ids 1 to 5; these items are the batch used to invoke the Lambda handler. The handler uses the Batch Processing utility to call the record handler function once per item. For this example we'll assume that when processing items 1 & 2 the record handler completes successfully, while item 3 throws an error. When you throw an error within your record handler function, the Batch Processing utility catches it and marks that item as failed.

At the end of the batch, the utility creates an object with this shape:

```ts
{
  batchItemFailures: [
    {
      itemIdentifier: "3"
    }
  ],
};
```

This response tells the Lambda service to take the item with identifier 3 and put it back into the stream. If none of the items had failed to process, the response object would instead be:

```ts
{
  batchItemFailures: []
};
```

This tells Lambda that all items were processed successfully, and as such they can be removed from the source.
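Concretely, wiring the utility into a DynamoDB stream handler looks roughly like the sketch below (the empty record handler body is a placeholder for your own logic, e.g. the `put_message` to SQS):

```ts
import {
  BatchProcessor,
  EventType,
  processPartialResponse,
} from '@aws-lambda-powertools/batch';
import type { DynamoDBRecord, DynamoDBStreamHandler } from 'aws-lambda';

// One processor instance, reused across invocations
const processor = new BatchProcessor(EventType.DynamoDBStreams);

// Called once per stream record; throwing here marks that record as failed
const recordHandler = async (record: DynamoDBRecord): Promise<void> => {
  // your processing logic, e.g. send a message to SQS
};

export const handler: DynamoDBStreamHandler = async (event, context) =>
  // builds and returns the batchItemFailures response shown above
  processPartialResponse(event, recordHandler, processor, { context });
```

One detail worth noting: for DynamoDB streams the `itemIdentifier` reported back is the record's sequence number; the plain `"3"` above is a simplification for the example.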
What happens next to the item with identifier 3 depends entirely on how you have configured your function trigger integration. If you have enabled retries, the item will be retried (i.e. sent to the function in a subsequent batch) up to the specified number of retries. If you have set up a Dead Letter Queue, the item will be removed from the source and put into that queue once it exceeds the number of retries.

For the sake of the example, we'll assume that this is the first time the function has "seen" that item and that retries are enabled. In this case the item will be part of a subsequent batch, together with never-seen-before items. All the items from the original batch that did not fail to process do not go back to the source.

Idempotency per se is not a requirement; the number of times an item is seen by your function depends entirely on the characteristics of the source (i.e. does your source guarantee exactly-once delivery) and the retry configuration of your function trigger.

So to sum up:

- Throwing inside the record handler marks only that item as failed; the rest of the batch counts as processed and is removed from the source.
- Whether and how often a failed item is retried is decided by the trigger configuration (retries, DLQ), not by the utility itself.
- Idempotency is not strictly required, but how often an item is delivered depends on the source's delivery guarantees and your retry settings.
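If it helps to see where the retry and DLQ knobs live, here is a rough sketch of the trigger configuration with the AWS CDK (the construct names are real, but the values and the surrounding stack are assumptions, not a recommendation):

```ts
import { StartingPosition } from 'aws-cdk-lib/aws-lambda';
import { DynamoEventSource, SqsDlq } from 'aws-cdk-lib/aws-lambda-event-sources';
import { Queue } from 'aws-cdk-lib/aws-sqs';

// Inside a Stack, assuming `table` has a stream enabled and `fn` is the function
const dlq = new Queue(this, 'StreamDlq');

fn.addEventSource(
  new DynamoEventSource(table, {
    startingPosition: StartingPosition.TRIM_HORIZON,
    batchSize: 10,
    // enables the partial-failure reporting the utility relies on
    reportBatchItemFailures: true,
    // retry a failed record a bounded number of times...
    retryAttempts: 3,
    // ...then move it aside so the checkpoint can advance
    onFailure: new SqsDlq(dlq),
  })
);
```

With a setup like this, the scenario from the question (item 3 failing forever) resolves itself after `retryAttempts` retries: the record lands in the DLQ and the stream checkpoint moves past it.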
I hope this clarifies a bit how this is supposed to work; if not, please let me know and I'll try again 😃