Feature request: Event parser envelope (for Kinesis) with sequence number #2850
Thanks for opening your first issue here! We'll come back to you as soon as we can.
hey @jarikujansuu thank you for taking the time to fill in this feature request - could you expand on how you'd like to use Batch processing with an entire batch? If you could expand more on the processing experience for Kinesis with any code snippets, it'd be super helpful. While we could fail fast like we do with SQS FIFO, this info will give us a better picture of what you're trying to do, the authoring experience you desire, and how partial failure support would work if your function were to receive the entire batch in a single call. Thanks a lot!
I perhaps had two separate issues in here. My squad is producing data to a Kinesis stream, and I am checking how we could help our customers simplify their code, and why they are not using these utilities. But let's start...

Why the event parser should also return the sequence number

Any processing done after the first failure is wasted and will result in duplicates downstream. I think the only possible benefit of processing everything even after the first failure is if you are not going to retry anymore (or don't use partial batch failures at all). But to be able to use partial batch failures at all, you can't use KinesisDataStreamEnvelope, since it only returns the parsed models. There is a similar issue with at least the other stream envelopes.

Why batch processing should be able to pass (smaller) batches to custom code

We have a use case where we take data from a DynamoDB Stream and send it to an EventBus. So if we process events one by one, we can send only a single event per call. But I see it as a valid design decision to keep batch processing simple, so it just processes things one event at a time, and more complex cases could be implemented with the help of a custom processor.

Why batch processing should fail fast with DynamoDB Streams and Kinesis

Because ordered streams are retried from the first failed record, everything processed after that point will be reprocessed on retry anyway. There are of course some caveats. Like if that is your last retry attempt, should you try to save at least some events? We bisect the batch so much, and then DLQ handling takes care of the rest.
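The fail-fast argument above can be sketched in a few lines. This is a standalone illustration, not the Powertools API: `process_batch_fail_fast` and the record shape are hypothetical, while the `batchItemFailures` response follows the Lambda partial batch response contract for streams.

```python
# Illustrative sketch (not Powertools code): fail-fast processing of an
# ordered stream batch. Record shape mimics a Kinesis Lambda event record.
from typing import Any, Callable, Dict, List


def process_batch_fail_fast(
    records: List[Dict[str, Any]],
    handler: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[Dict[str, str]]]:
    """Stop at the first failure and report it plus every later record.

    For ordered sources (Kinesis, DynamoDB Streams), Lambda retries from the
    lowest reported sequence number, so records after the first failure will
    be redelivered anyway; processing them now only creates duplicates.
    """
    failures: List[Dict[str, str]] = []
    for i, record in enumerate(records):
        try:
            handler(record)
        except Exception:
            # Report the failed record and every subsequent one as failed,
            # so the retry resumes exactly at the first failure.
            failures = [
                {"itemIdentifier": r["kinesis"]["sequenceNumber"]}
                for r in records[i:]
            ]
            break
    return {"batchItemFailures": failures}
```

Because the loop breaks immediately, no work is wasted on records that would be retried regardless.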
Quick note to say I haven't forgotten and will respond today.
Thank you so much for clarifying and sharing more details @jarikujansuu - it's clear now. I'm splitting this into work streams to make it easier to act on them; please let me know if I missed or misinterpreted any area.

Fail fast for Streams in Batch Processing

That is something we want to do; it's a valid concern. It's something I've wanted to do for a while but haven't prioritized, so having a customer-driven need makes this easier. We welcome a PR and can guide you to make this faster; otherwise we can look into it after we complete other major work items in early September.

[Action] We should create a specialized Batch Processor for Kinesis and one for DynamoDB that stop on first failure by default. We should update the documentation to recommend the new ones and explain why.

Receive entire batch

As of now, I'm not inclined, as it's a trade-off. I concur with the need to optimize networking calls like in your case, but the average person would now have to incur additional boilerplate to handle processing, unless I've missed something? My reservation with this model is that there is not much of a value add, and that's where things go off the rails too quickly, as we've seen first hand (the reason why we created this model). The Async Batch Processor wouldn't work here either (all items at once!), because your need is to efficiently call a downstream service once you have already buffered enough info.

[Action] Feel free to create an RFC. We then wait for customer demand (10+) to validate the need and shape the direction. For now, extending the batch processor is an alternative solution for savvy customers.

Keeping sequence number in Kinesis Envelope

As of now, I'm not inclined, due to existing customer feedback that led us to create the current design. For Batch, you can access the sequence number in the current model, hence why we didn't use an envelope. However, I'm aware that doesn't work for you, as you'd want the entire batch.

[Action] No action needed.
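For completeness, the envelope shape under discussion, a list of (sequence number, parsed model) pairs, can be sketched without any Powertools dependency. Everything here is illustrative: `OrderModel`, `parse_with_sequence_numbers`, and the payload fields are assumptions, not library code; only the Kinesis record layout (base64 `data`, `sequenceNumber`) follows the actual Lambda event format.

```python
# Standalone sketch of an envelope that keeps the sequence number.
# Stdlib only; `OrderModel` and its fields are illustrative placeholders.
import base64
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OrderModel:
    order_id: str
    amount: int


def parse_with_sequence_numbers(
    event: dict,
) -> List[Tuple[str, Optional[OrderModel]]]:
    """Return one (sequence_number, model-or-None) pair per record.

    A record that fails to parse yields None instead of raising, so the
    caller can report just that record's sequence number as a partial
    batch failure instead of failing the whole batch.
    """
    results: List[Tuple[str, Optional[OrderModel]]] = []
    for record in event["Records"]:
        seq = record["kinesis"]["sequenceNumber"]
        try:
            payload = json.loads(
                base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
            )
            results.append((seq, OrderModel(**payload)))
        except Exception:
            results.append((seq, None))
    return results
```

A real implementation would likely use Pydantic models and validation, but the returned shape, `List[Tuple[str, Optional[Model]]]`, is the one proposed in this issue.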
I think fail fast for Streams in Batch Processing would simplify our current client squads' code nicely. If I have understood correctly, that same SqsFifoPartialProcessor logic could be used for Kinesis and DynamoDB, as for them you can include all failed records; it just takes the lowest. Of course _short_circuit_processing can easily be changed to return just the first failure. 👍

For our own service, the ability to use both the sequence number and multiple events at the same time is kind of a must, so I'll probably just implement such an envelope myself, as it seems like a trivial thing. And if only we need it, then sharing code isn't an issue 😄

Or it might actually be quite simple to extend the batch processor to pass, for example, a list of 10 events as Pydantic models, and if that fails, send a partial failure with all their ids.
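The "list of 10 events" idea above can be sketched without touching Powertools internals. This is a hypothetical helper, not library code: `process_in_chunks` and the record shape are assumptions; the `batchItemFailures` response follows the Lambda partial batch response contract.

```python
# Illustrative sketch: feed the handler chunks of N records and, if a
# chunk fails, report every id in that chunk as a partial batch failure.
from typing import Any, Callable, Dict, List


def process_in_chunks(
    records: List[Dict[str, Any]],
    chunk_handler: Callable[[List[Dict[str, Any]]], None],
    chunk_size: int = 10,
) -> Dict[str, List[Dict[str, str]]]:
    failures: List[Dict[str, str]] = []
    for start in range(0, len(records), chunk_size):
        chunk = records[start : start + chunk_size]
        try:
            # e.g. one EventBridge PutEvents call per chunk instead of
            # one network call per record.
            chunk_handler(chunk)
        except Exception:
            # The whole chunk is retried: report every id in it as failed.
            failures.extend(
                {"itemIdentifier": r["kinesis"]["sequenceNumber"]}
                for r in chunk
            )
    return {"batchItemFailures": failures}
```

The trade-off is coarser retries (a single bad record fails its whole chunk), in exchange for far fewer downstream calls.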
Yeah (return first failed immediately). Let's do the new Kinesis and DynamoDB classes first, like FIFO, and then regroup on a call, as I'd like to see what you've got. I've got the feeling I might be missing something here ;-)
We're adding this in V3, as we were unable to prioritize it earlier. If anyone would like to help create a new class for DynamoDB and one for Kinesis that fail fast like FIFO, please go ahead and we can help out.
Use case
We want to handle Kinesis streaming events and use Pydantic models with event_parser, AND be able to do partial failures. If you use KinesisDataStreamEnvelope, you just get a list of models, so you can't do partial failures.

Solution/User Experience

An envelope that would return, for example, List[Tuple[str, Optional[Model]]], which would allow similarly easy handling of the actual model data but also make it possible to easily implement partial batch failures. Something similar would probably be useful for other envelopes for services that support partial batch failures.

Alternative solutions

Make Powertools batch processing able to work with batches, not single items, on the custom-code side. And add support for failing instantly when the first record fails, instead of processing all records regardless of errors.
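The proposed List[Tuple[str, Optional[Model]]] shape maps directly onto a partial batch response. A minimal sketch, assuming the pairs come from a (hypothetical) envelope like the one described above; only the `batchItemFailures` field name comes from the Lambda partial batch response format.

```python
# Sketch: turn (sequence_number, maybe_model) pairs into a Lambda
# partial batch response, failing exactly the records that didn't parse.
from typing import Dict, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def to_partial_batch_response(
    parsed: List[Tuple[str, Optional[T]]],
) -> Dict[str, List[Dict[str, str]]]:
    return {
        "batchItemFailures": [
            {"itemIdentifier": seq} for seq, model in parsed if model is None
        ]
    }
```

This is the payoff of keeping the sequence number next to each model: parse failures become targeted retries instead of a full-batch retry.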