Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Fix inconsistency between BatchProcessor success_handler and failure_handler call arguments #2816

Closed
1 of 2 tasks
duc00 opened this issue Jul 21, 2023 · 13 comments · Fixed by #2868
Closed
1 of 2 tasks
Assignees
Labels
batch Batch processing utility documentation Improvements or additions to documentation feature-request feature request

Comments

@duc00
Copy link
Contributor

duc00 commented Jul 21, 2023

Use case

As mentioned in BatchProcessor's documentation (https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/#create-your-own-partial-processor), BatchProcessor can be extended by overriding success_handler and failure_handler methods:

class MyProcessor(BatchProcessor):
    def failure_handler(self, record: SQSRecord, exception: ExceptionInfo) -> FailureResponse:
        metrics.add_metric(name="BatchRecordFailures", unit=MetricUnit.Count, value=1)
        return super().failure_handler(record, exception)

Reading this, I see and understand that the record is passed to the function as a Pydantic model. I would thus expect success_handler to take the same type of record as failure_handler. That is not currently the case, as record is passed as raw dict data in BatchProcessor's _process_record function:

def _process_record(self, record: dict) -> Union[SuccessResponse, FailureResponse]:
"""
Process a record with instance's handler
Parameters
----------
record: dict
A batch record to be processed.
"""
data: Optional["BatchTypeModels"] = None
try:
data = self._to_batch_type(record=record, event_type=self.event_type, model=self.model)
if self._handler_accepts_lambda_context:
result = self.handler(record=data, lambda_context=self.lambda_context)
else:
result = self.handler(record=data)
return self.success_handler(record=record, result=result)

In my opinion, record args should be homogeneous between the two functions, or otherwise devs wanting to override success_handler will encounter errors at runtime, have to dig in this tool's source code and understand the subtlety (like I did).

Solution/User Experience

First solution would be to add an example for success_handler in the documentation (https://docs.powertools.aws.dev/lambda/python/latest/utilities/batch/#extending-batchprocessor). This would notify developers of the differences between the functions' signature.

Second solution would be to pass the Pydantic model as record to success_handler. This should be a pretty straightforward change in the source code since this structure is already instantiated in _process_record. This would make the tool more homogeneous as well as improve experience in using the record's attributes in success_handler.

Alternative solutions

No response

Acknowledgment

@duc00 duc00 added feature-request feature request triage Pending triage from maintainers labels Jul 21, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 21, 2023

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@leandrodamascena
Copy link
Contributor

Hi @duc00! Thank you for opening this issue. I am moving this issue to pending review to work on this by early next week.

@leandrodamascena leandrodamascena added batch Batch processing utility and removed triage Pending triage from maintainers labels Jul 21, 2023
@leandrodamascena leandrodamascena moved this from Triage to Pending review in Powertools for AWS Lambda (Python) Jul 21, 2023
@heitorlessa
Copy link
Contributor

heitorlessa commented Jul 24, 2023

hey @duc00, it makes sense but it's a breaking change. The reason we didn't do that back (2020) then was that Batch Processing predated both Parser (Pydantic models) and Event Source Data Classes. Changing even to a SQSRecord would've caused customers tests or any custom processor to fail.

While I agree with the inconsistency and dislike it very much, changing now could break several customers and we're overly conscious of that. The reason failure_handler has it is that back then we wanted to provide more details for a plausible serialization issue (among other types of errors).

Moving forward

Tentatively, we might launch v3 by the end of the year due to Python 3.7 EOL in Lambda. Until then, we'd be happy to have a flag to override this behaviour, then make the default in v3. If you have an idea for a flag or would like to make a PR to it we'd be happy to guide you.

Alternative

We could pay another tech debt and start specialized Batch Processors (e.g., BatchProcessor -> SqsBatchProcessor). That way we start clean, don't break customers, update the docs to recommend that instead for new customers, and make the codebase leaner - we've been holding breaking changes for 3 years so naturally it has cruft.

@heitorlessa heitorlessa moved this from Pending review to Pending customer in Powertools for AWS Lambda (Python) Jul 24, 2023
@heitorlessa heitorlessa added the breaking-change Breaking change label Jul 24, 2023
@duc00
Copy link
Contributor Author

duc00 commented Jul 25, 2023

Hi @heitorlessa, thank you for your answer. It makes perfect sense.

I am fine for now instantiating the Pydantic model within success_handler. So I think we can leave the code as is, as long as this is a known discrepancy that would be fixed in a future major version.

I would however suggest we mention this is the doc. I could write a PR for this. How does this alternative sound?

Thanks!

@heitorlessa
Copy link
Contributor

absolutely and thank you for offering the PR ;) Let me know once is out and we can merge it quickly.

@heitorlessa
Copy link
Contributor

heitorlessa commented Jul 27, 2023

I just remembered another scenario that would be difficult to always have a model as argument for failure_handler: Data validation error.

For Pydantic Models, a poison pill/malformed payload would prevent it to become an actual model. To prevent failing the entire batch, we handle that gracefully and convert to the closest known Event Source Data Class to allow failure collection.

If we were to have an union and/or a parameter, it could confuse non-experienced customers.


Record value when model validation worked but batch item failed for another reason

image

Record value when model validation failed

image

@heitorlessa
Copy link
Contributor

I've just put a PR with several improvements to Batch documentation and credited accordingly. I created sample outputs for this exact area - let me know if this helps (your PR will tackle another part IIRC).

image

@duc00
Copy link
Contributor Author

duc00 commented Jul 27, 2023

I just remembered another scenario that would be difficult to always have a model as argument for failure_handler: Data validation error.

For Pydantic Models, a poison pill/malformed payload would prevent it to become an actual model. To prevent failing the entire batch, we handle that gracefully and convert to the closest known Event Source Data Class to allow failure collection.

If we were to have an union and/or a parameter, it could confuse non-experienced customers.

This is what I understood, and this is fine as we can deal with the Pydantic model's type in failure_handler. My main point was about success_handler where are only given raw dict data, and not the actual Pydantic mode (which should always be available since this is called on success)

@duc00
Copy link
Contributor Author

duc00 commented Jul 27, 2023

Thanks @heitorlessa, I will take a look later today or tomorrow.

@heitorlessa
Copy link
Contributor

🤦🏻 yes, sorry, been too deep in the woods rewriting the docs that I forgot the success_handler predating experience.

@duc00
Copy link
Contributor Author

duc00 commented Jul 28, 2023

@heitorlessa, I've just opened a new PR for adding details to success/failure_handler docs. Thanks!

@heitorlessa heitorlessa added documentation Improvements or additions to documentation and removed breaking-change Breaking change labels Jul 28, 2023
@heitorlessa
Copy link
Contributor

Just merged, thank you so much @duc00 for this quality contribution. Already available in our stage docs: https://docs.powertools.aws.dev/lambda/python/stage/utilities/batch/#extending-batchprocessor

It'll go to prod in our next release

@github-project-automation github-project-automation bot moved this from Pending customer to Coming soon in Powertools for AWS Lambda (Python) Jul 28, 2023
@github-actions
Copy link
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@heitorlessa heitorlessa moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
batch Batch processing utility documentation Improvements or additions to documentation feature-request feature request
Projects
Status: Shipped
3 participants