Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More precise batch publish errors #1321

Merged
merged 7 commits into from
Sep 11, 2024

Conversation

ytalashko
Copy link
Contributor

@ytalashko ytalashko commented Aug 31, 2024

This increases the precision of the result of producer method produceChunkAsyncWithFailures for cases where something went wrong. There are two changes compared to before:

  1. Each entry in the result ofproduceChunkAsyncWithFailures now accurately corresponds to each record in the input chunk. Previously, if sending fails directly (*) on any of the given records, the error would be used for all records in the batch, ignoring the send-outcome of the other records. An advantage of this change is that if sending some records failed, but some other records were actually sent, you can now correctly see all of that in the method's response.
  2. In addition, if sending fails directly for a record (*), we no longer attempt to send subsequent records from the input. The result contains a PublishOmittedException for each record that is not sent. When implementing retries, this change makes it easier to publish records in the original order.

In addition, we introduce unit-level tests for the producer.

(*) By 'sending' we mean offering a record to the underlying java Kafka producer. Sending can fail directly (when we call the method), or later on (from a callback).

Copy link
Collaborator

@erikvanoosten erikvanoosten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! By making sure the loop uses the ref for every case we can retain the trick of counting the number of updates to the array before calling done.succeed. Much better than my rough proposal. 🚀

The benchmark looks very good! (this pr is the last point on the graph)
Scherm­afbeelding 2024-09-01 om 09 41 03

What is missing now is a test and documentation. Let us know if you want help with either of these.

@ytalashko
Copy link
Contributor Author

What is missing now is a test and documentation. Let us know if you want help with either of these.

Sounds great, 👍, thanks for the suggestions, Erik!
Overall, if there is no hurry with this changes, I can add both myself bit by bit, if this works for you.
Already started adding unit tests. Could you, please, elaborate, which documentation we want to have and where, I believe a new .md file in docs folder or addition to some existing one (not sure which one, 'cause there is not much on the producer side)?
Let me know otherwise, so I can commit what I have started for tests, maybe it will be of any use.

@erikvanoosten
Copy link
Collaborator

erikvanoosten commented Sep 4, 2024

Could you, please, elaborate, which documentation we want to have and where

I was mostly thinking of the scaladoc of the produceChunk* methods in trait Producer. Perhaps add something like:

For all produceChunk* methods except produceChunkAsyncWithFailures:

When publishing any of the records fails, the whole batch fails even
though some records might have been published. Use
`produceChunkAsyncWithFailures` to get results per record.

And for produceChunkAsyncWithFailures:

When publishing a record fails, the following records in the batch are
not published. This is indicated with a [[SendOmittedException]].

About that, I now see that SendOmittedException should have been called PublishOmittedException!

Let me know otherwise, so I can commit what I have started for tests, maybe it will be of any use.

We don't have any low level tests for the producer yet (I mean, with a mock kafka client). Anything in this area would be super helpful!

@ytalashko
Copy link
Contributor Author

I was mostly thinking of the scaladoc of the produceChunk* methods in trait Producer. Perhaps add something like:

Oh, got it, sounds good, will do so, 👍
Thanks for the suggestions!

About that, I now see that SendOmittedException should have been called PublishOmittedException!

Yeah, this is a great call, thanks, Erik!

We don't have any low level tests for the producer yet (I mean, with a mock kafka client). Anything in this area would be super helpful!

Nice, will update PR soon.

@erikvanoosten
Copy link
Collaborator

I think this looks amazing. The test are pretty complicated but they look like what is needed.
Would you like me to update the description of the PR? IMHO that would be the last thing before we can merge.

@svroonland, did you already take a look as well?

@ytalashko
Copy link
Contributor Author

ytalashko commented Sep 8, 2024

The test are pretty complicated but they look like what is needed.

I think they can be revisited later and simplified/updated, this is just a start so to say

Would you like me to update the description of the PR?

Sure, would love that, thanks, Erik!

@erikvanoosten erikvanoosten changed the title Provide proper produce results in case of send call errors More precise publish errors Sep 8, 2024
@erikvanoosten erikvanoosten changed the title More precise publish errors More precise batch publish errors Sep 8, 2024
@erikvanoosten
Copy link
Collaborator

erikvanoosten commented Sep 8, 2024

Would you like me to update the description of the PR?

Done!

@erikvanoosten erikvanoosten merged commit 023f11e into zio:master Sep 11, 2024
14 checks passed
@ytalashko ytalashko deleted the proper-produce-results branch September 12, 2024 16:50
@erikvanoosten
Copy link
Collaborator

In this PR the producer no longer attempts to send subsequent records after the send method fails. As of zio-kafka 2.10.0 the producer also stops sending after a failure communicated via the callback. For more details see #1437.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants