Feature request: Fix inconsistency between BatchProcessor success_handler and failure_handler call arguments #2816

duc00 · 2023-07-21T14:46:35Z

Use case

As mentioned in BatchProcessor's documentation (https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/#create-your-own-partial-processor), BatchProcessor can be extended by overriding success_handler and failure_handler methods:

class MyProcessor(BatchProcessor):
    def failure_handler(self, record: SQSRecord, exception: ExceptionInfo) -> FailureResponse:
        metrics.add_metric(name="BatchRecordFailures", unit=MetricUnit.Count, value=1)
        return super().failure_handler(record, exception)

Reading this, I see and understand that the record is passed to the function as a Pydantic model. I would thus expect success_handler to take the same type of record as failure_handler. That is not currently the case, as record is passed as raw dict data in BatchProcessor's _process_record function:

powertools-lambda-python/aws_lambda_powertools/utilities/batch/base.py

Lines 485 to 502 in 021076e

    
               def _process_record(self, record: dict) -> Union[SuccessResponse, FailureResponse]: 
        
                   """ 
        
                   Process a record with instance's handler 
        
                   Parameters 
        
                   ---------- 
        
                   record: dict 
        
                       A batch record to be processed. 
        
                   """ 
        
                   data: Optional["BatchTypeModels"] = None 
        
                   try: 
        
                       data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model) 
        
                       if self._handler_accepts_lambda_context: 
        
                           result = self.handler(record=data, lambda_context=self.lambda_context) 
        
                       else: 
        
                           result = self.handler(record=data) 
        
                       return self.success_handler(record=record, result=result)

In my opinion, record args should be homogeneous between the two functions, or otherwise devs wanting to override success_handler will encounter errors at runtime, have to dig in this tool's source code and understand the subtlety (like I did).

Solution/User Experience

First solution would be to add an example for success_handler in the documentation (https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/#extending-batchprocessor). This would notify developers of the differences between the functions' signature.

Second solution would be to pass the Pydantic model as record to success_handler. This should be a pretty straightforward change in the source code since this structure is already instantiated in _process_record. This would make the tool more homogeneous as well as improve experience in using the record's attributes in success_handler.

Alternative solutions

No response

Acknowledgment

This feature request meets Powertools for AWS Lambda (Python) Tenets
Should this be considered in other Powertools for AWS Lambda languages? i.e. Java, TypeScript, and .NET

The text was updated successfully, but these errors were encountered:

boring-cyborg · 2023-07-21T14:46:37Z

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

leandrodamascena · 2023-07-21T23:23:23Z

Hi @duc00! Thank you for opening this issue. I am moving this issue to pending review to work on this by early next week.

heitorlessa · 2023-07-24T11:58:08Z

hey @duc00, it makes sense but it's a breaking change. The reason we didn't do that back (2020) then was that Batch Processing predated both Parser (Pydantic models) and Event Source Data Classes. Changing even to a SQSRecord would've caused customers tests or any custom processor to fail.

While I agree with the inconsistency and dislike it very much, changing now could break several customers and we're overly conscious of that. The reason failure_handler has it is that back then we wanted to provide more details for a plausible serialization issue (among other types of errors).

Moving forward

Tentatively, we might launch v3 by the end of the year due to Python 3.7 EOL in Lambda. Until then, we'd be happy to have a flag to override this behaviour, then make the default in v3. If you have an idea for a flag or would like to make a PR to it we'd be happy to guide you.

Alternative

We could pay another tech debt and start specialized Batch Processors (e.g., BatchProcessor -> SqsBatchProcessor). That way we start clean, don't break customers, update the docs to recommend that instead for new customers, and make the codebase leaner - we've been holding breaking changes for 3 years so naturally it has cruft.

duc00 · 2023-07-25T08:23:53Z

Hi @heitorlessa, thank you for your answer. It makes perfect sense.

I am fine for now instantiating the Pydantic model within success_handler. So I think we can leave the code as is, as long as this is a known discrepancy that would be fixed in a future major version.

I would however suggest we mention this is the doc. I could write a PR for this. How does this alternative sound?

Thanks!

heitorlessa · 2023-07-25T08:34:00Z

absolutely and thank you for offering the PR ;) Let me know once is out and we can merge it quickly.

heitorlessa · 2023-07-27T07:16:09Z

I just remembered another scenario that would be difficult to always have a model as argument for failure_handler: Data validation error.

For Pydantic Models, a poison pill/malformed payload would prevent it to become an actual model. To prevent failing the entire batch, we handle that gracefully and convert to the closest known Event Source Data Class to allow failure collection.

If we were to have an union and/or a parameter, it could confuse non-experienced customers.

Record value when model validation worked but batch item failed for another reason

Record value when model validation failed

heitorlessa · 2023-07-27T09:41:04Z

I've just put a PR with several improvements to Batch documentation and credited accordingly. I created sample outputs for this exact area - let me know if this helps (your PR will tackle another part IIRC).

duc00 · 2023-07-27T09:44:46Z

I just remembered another scenario that would be difficult to always have a model as argument for failure_handler: Data validation error.

For Pydantic Models, a poison pill/malformed payload would prevent it to become an actual model. To prevent failing the entire batch, we handle that gracefully and convert to the closest known Event Source Data Class to allow failure collection.

If we were to have an union and/or a parameter, it could confuse non-experienced customers.

This is what I understood, and this is fine as we can deal with the Pydantic model's type in failure_handler. My main point was about success_handler where are only given raw dict data, and not the actual Pydantic mode (which should always be available since this is called on success)

duc00 · 2023-07-27T09:45:52Z

Thanks @heitorlessa, I will take a look later today or tomorrow.

heitorlessa · 2023-07-27T09:45:58Z

🤦🏻 yes, sorry, been too deep in the woods rewriting the docs that I forgot the success_handler predating experience.

duc00 · 2023-07-28T16:10:57Z

@heitorlessa, I've just opened a new PR for adding details to success/failure_handler docs. Thanks!

heitorlessa · 2023-07-28T17:21:08Z

Just merged, thank you so much @duc00 for this quality contribution. Already available in our stage docs: https://docs.powertools.aws.dev/lambda/python/stage/utilities/batch/#extending-batchprocessor

It'll go to prod in our next release

github-actions · 2023-07-28T17:21:29Z

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

duc00 added feature-request feature request triage Pending triage from maintainers labels Jul 21, 2023

github-project-automation bot added this to Powertools for AWS Lambda (Python) Jul 21, 2023

github-project-automation bot moved this to Triage in Powertools for AWS Lambda (Python) Jul 21, 2023

leandrodamascena added batch Batch processing utility and removed triage Pending triage from maintainers labels Jul 21, 2023

leandrodamascena moved this from Triage to Pending review in Powertools for AWS Lambda (Python) Jul 21, 2023

leandrodamascena assigned leandrodamascena and heitorlessa and unassigned leandrodamascena Jul 21, 2023

heitorlessa moved this from Pending review to Pending customer in Powertools for AWS Lambda (Python) Jul 24, 2023

heitorlessa added the breaking-change Breaking change label Jul 24, 2023

duc00 mentioned this issue Jul 28, 2023

docs(batch): explain record type discrepancy in failure and success handler #2868

Merged

7 tasks

heitorlessa linked a pull request Jul 28, 2023 that will close this issue

docs(batch): explain record type discrepancy in failure and success handler #2868

Merged

7 tasks

heitorlessa added documentation Improvements or additions to documentation and removed breaking-change Breaking change labels Jul 28, 2023

heitorlessa closed this as completed Jul 28, 2023

github-project-automation bot moved this from Pending customer to Coming soon in Powertools for AWS Lambda (Python) Jul 28, 2023

heitorlessa moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Aug 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: Fix inconsistency between BatchProcessor success_handler and failure_handler call arguments #2816

Feature request: Fix inconsistency between BatchProcessor success_handler and failure_handler call arguments #2816

duc00 commented Jul 21, 2023

boring-cyborg bot commented Jul 21, 2023

leandrodamascena commented Jul 21, 2023

heitorlessa commented Jul 24, 2023 •

edited

Loading

duc00 commented Jul 25, 2023 •

edited

Loading

heitorlessa commented Jul 25, 2023

heitorlessa commented Jul 27, 2023 •

edited

Loading

heitorlessa commented Jul 27, 2023

duc00 commented Jul 27, 2023

duc00 commented Jul 27, 2023

heitorlessa commented Jul 27, 2023

duc00 commented Jul 28, 2023

heitorlessa commented Jul 28, 2023

github-actions bot commented Jul 28, 2023

Feature request: Fix inconsistency between BatchProcessor success_handler and failure_handler call arguments #2816

Feature request: Fix inconsistency between BatchProcessor success_handler and failure_handler call arguments #2816

Comments

duc00 commented Jul 21, 2023

Use case

Solution/User Experience

Alternative solutions

Acknowledgment

boring-cyborg bot commented Jul 21, 2023

leandrodamascena commented Jul 21, 2023

heitorlessa commented Jul 24, 2023 • edited Loading

duc00 commented Jul 25, 2023 • edited Loading

heitorlessa commented Jul 25, 2023

heitorlessa commented Jul 27, 2023 • edited Loading

heitorlessa commented Jul 27, 2023

duc00 commented Jul 27, 2023

duc00 commented Jul 27, 2023

heitorlessa commented Jul 27, 2023

duc00 commented Jul 28, 2023

heitorlessa commented Jul 28, 2023

github-actions bot commented Jul 28, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

heitorlessa commented Jul 24, 2023 •

edited

Loading

duc00 commented Jul 25, 2023 •

edited

Loading

heitorlessa commented Jul 27, 2023 •

edited

Loading