Enhancement: Batch processor should fail early if event payload is incorrect #2394

alexandreczg · 2023-06-06T15:59:57Z

Expected Behaviour

@heitorlessa this is a continuation of 23777

When using the BatchProcessor with process_partial_response from here I have noticed that the event does not have to respect the model and it can return an empty list for the batchItemFailures which will tell AWS to delete the messages from the queue in case of SQS triggers.

I'd expect the behavior to be fail-first fail-fast. If I got a bunch of messages to my Lambda that do not have the format of an EventType.SQS that should fail right away, in my opinion. Passing a random dict to the process_partial_response should not be considered a valid event.

Current Behaviour

The current implementation of process_partial_response accepts any dict passed in. Since it used the dict method .get to get the Records which is a key in SQS type events. But returns a default empty list in case the key doesn't exist. Which in turn will tell the context manager to run the processor on an empty list of records, which will always return success.

Code snippet

from aws_lambda_powertools.utilities.batch import (
    BatchProcessor, EventType, process_partial_response
)
from aws_lambda_powertools.utilities.typing import LambdaContext
import pydantic


class Event(pydantic.BaseModel):
    foo: str
    bar: str


def record_handler(record: Event):
    print(record)

    if record.foo == "barr":
        raise ValueError


def handler(event: dict, context: LambdaContext):

    return process_partial_response(
        event=event,
        record_handler=record_handler,
        processor=BatchProcessor(
            event_type=EventType.SQS,
            model=Event,
        ),
        context=context
    )


# This is an invalid event since it's not of the type SQS
# But the error about missing SQS model field is only thrown after the record_handler is called
# It should be at the very beginning, before any business logic occurs
print(handler({'Records': [{"foo": "bar", "bar": "baz"}, {"foo": "barr", "bar": "bazz"}]}, object))

# This is not valid, but does not throw an error
# And the batchItemFailures returns an empty list which is not accurate.
print(handler({"foo": "bar", "bar": "baz"}, object))

# This throws ValueError: Invalid event format. Please ensure batch event is a valid SQS event.
print(handler(["foo", "bar", "bar", "baz"], object))

Possible Solution

I'm sure there are a variety of reasons for why this is the way it is that I'm not considering. One obvious answer is that SQS event types will always have the 'Records` key hence anything that doesn't have it should answer the question: "How did you get here and should you even be considered?"

But I believe that we can be a bit more strict on enforcing that the dict passed is indeed of the type SQS, DynamoDB stream or Kinesis before doing any processing.

For example, in the snippet above, I pass in a dict with the Records key, but the rest of the event does not have the remaining SQS keys. When the record_handler fails, the batch processor looks for the messageId field, which doesn't exist. I believe if this was done upfront and fail immediately it would be a better developer experience.

What's everyone's thoughts?

Steps to Reproduce

Snippet above should be enough to demonstrate all scenarios brought up.

Powertools for AWS Lambda (Python) version

latest

AWS Lambda function runtime

3.10

Packaging format used

Lambda Layers

Debugging logs

No response

The text was updated successfully, but these errors were encountered:

heitorlessa · 2023-06-06T16:46:31Z

Thanks a lot for creating and giving more context @alexandreczg - I'll look into it first thing tomorrow morning (Amsterdam time).

heitorlessa · 2023-06-07T10:19:22Z

Going through this now (operational task took priority) - I removed the bug label for now as I'm leaning towards defensive programming from self-inflicted errors here.

I see two ways we can provide a better experience here - which one makes it more appealing to you?

A) Fail fast. We can raise a ValueError if the event was intentionally modified (if not records: raise ValueError("Invalid event format)).
- Pros: Immediate feedback although you know what you're doing upfront either way.
- Cons: Lambda Event Source Mapping (a.k.a managed poller) will scale down the rate messages are polled and your function is invoked. This can take a while to get back to "normal rate" (your mileage varies).
B) Warn. Warn that somehow we received no Records and proceed like we would (not calling your record handler).
- Pros: Doesn't interfere with Lambda Event Source Mapping. Your tests and actual prod use will continue to partially fail if you have an actual poison pill (malformed record, bad record input).
- Cons: As you pointed out, batch messages will be deleted from the queue because you intentionally didn't pass any to be processed. This means data loss.

While I'd argue this is self-inflicted, I do see the value in failing fast for a faster feedback loop locally when testing. Given the criticality of how many transactions we process a week, I don't feel comfortable in validating the schema because we can't guarantee these event sources won't change, leading to widespread incidents. IF a record is malformed, it'd be caught later in the process whether you use Pydantic or not.

Knowing Records wouldn't be empty in prod, I feel comfortable with checking if Records is empty for a faster feedback loop locally, where you might be trying custom middlewares that mutate events and accidentally pass the wrong ones, or intentionally testing the Batch processing feature would call your record handler function from an invalid payload.

Happy to guide you on making this first contribution and jump on a call to get your dev env setup :)

Let us know!

alexandreczg · 2023-06-08T02:02:41Z

Took me a while to carefully digest both options presented and the merits of each. I think there is a lot of value on your arguments.

I too agree with not throwing an error, it makes sense to me now. Option B) would be interesting to explore in order for the quick feedback loop on local development, especially for the AWS beginner, kind of nudging them in the right direction.

I'd love to jump on a call at your convenience!

heitorlessa · 2023-06-08T08:35:13Z

Awesome, B it is (my preference also). Send an email to aws-lambda-powertools-feedback at amazon dot com, and any of us will reply with a link that you schedule a slot ;)

If you'd like a head start, we have a pre-configured Cloud IDE you can use, along with steps you could use if you want to do this locally (make sure you've got a virtual env): https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/CONTRIBUTING.md#contributing-via-pull-requests

Looking forward to it!

leandrodamascena · 2024-08-28T21:11:50Z

Hi @alexandreczg! I'm checking if you're still interested in submitting this PR. If not, no problem, I'm scheduling a revisit in 1 month and we'll implement option B.

alexandreczg added bug Something isn't working triage Pending triage from maintainers labels Jun 6, 2023

heitorlessa self-assigned this Jun 6, 2023

heitorlessa added enhancement batch Batch processing utility and removed bug Something isn't working labels Jun 7, 2023

heitorlessa changed the title ~~Bug: Batch processor should throw error when dict passed in does not conform to expected structure~~ Enhancement: Batch processor should throw error when dict passed in does not conform to expected structure Jun 7, 2023

heitorlessa changed the title ~~Enhancement: Batch processor should throw error when dict passed in does not conform to expected structure~~ Enhancement: Batch processor should fail early if event payload is incorrect Jun 7, 2023

heitorlessa removed the triage Pending triage from maintainers label Jun 8, 2023

sthulb added this to Powertools for AWS Lambda (Python) Jun 19, 2023

github-project-automation bot moved this to Triage in Powertools for AWS Lambda (Python) Jun 19, 2023

leandrodamascena moved this from Triage to Backlog in Powertools for AWS Lambda (Python) Jun 20, 2023

heitorlessa removed their assignment Dec 11, 2023

heitorlessa moved this from Backlog to Pending customer in Powertools for AWS Lambda (Python) Dec 11, 2023

heitorlessa added good first issue Good for newcomers help wanted Could use a second pair of eyes/hands labels Jun 10, 2024

heitorlessa mentioned this issue Jun 24, 2024

Roadmap update reminder weekly - Python #4611

Closed

leandrodamascena added the revisit Maintainer to provide update or revisit prioritization in the next cycle label Aug 28, 2024

leandrodamascena moved this from Pending customer to Backlog in Powertools for AWS Lambda (Python) Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhancement: Batch processor should fail early if event payload is incorrect #2394

Enhancement: Batch processor should fail early if event payload is incorrect #2394

alexandreczg commented Jun 6, 2023 •

edited

Loading

heitorlessa commented Jun 6, 2023

heitorlessa commented Jun 7, 2023

alexandreczg commented Jun 8, 2023

heitorlessa commented Jun 8, 2023

leandrodamascena commented Aug 28, 2024

Enhancement: Batch processor should fail early if event payload is incorrect #2394

Enhancement: Batch processor should fail early if event payload is incorrect #2394

Comments

alexandreczg commented Jun 6, 2023 • edited Loading

Expected Behaviour

Current Behaviour

Code snippet

Possible Solution

Steps to Reproduce

Powertools for AWS Lambda (Python) version

AWS Lambda function runtime

Packaging format used

Debugging logs

heitorlessa commented Jun 6, 2023

heitorlessa commented Jun 7, 2023

alexandreczg commented Jun 8, 2023

heitorlessa commented Jun 8, 2023

leandrodamascena commented Aug 28, 2024

alexandreczg commented Jun 6, 2023 •

edited

Loading