manual commits only after successful message production #300
Comments
You are right that the default auto commit kicks in before passing the message to the application, rather than when the application is done processing the message. There are actually two layers of offset commit: first there is store_offsets(), which stores (in client memory) the offset to be committed, and then there's the actual commit, which commits the stored offset. store_offsets() is run automatically for each message just prior to passing the message to the application, which breaks your at-least-once guarantee. So, what you do is disable the automatic offset store by setting enable.auto.offset.store=false and instead call store_offsets() yourself once the message has been fully processed.
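As a rough illustration of those two layers, here is a minimal sketch using the confluent-kafka-python API (the broker address, group id, topic name, and the handle() step are assumptions, not from this thread):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # assumed broker address
    'group.id': 'example-group',            # assumed group id
    'enable.auto.offset.store': False,      # layer 1: don't auto-store offsets per message
    'enable.auto.commit': True,             # layer 2: background commit of *stored* offsets
})
consumer.subscribe(['input-topic'])         # assumed topic name

msg = consumer.poll(1.0)
if msg is not None and msg.error() is None:
    handle(msg)                  # hypothetical processing step
    consumer.store_offsets(msg)  # store only after processing succeeds
```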
Thanks for the additional context. I think I understand, but I want to be sure I get how this ensures we don't store/auto-commit an offset, then later find out that the delivery of a lower offset failed. Is the basic idea that if producer delivery is going to fail, we can assume it will fail in some bounded amount of time? So, if we use …
A produce()d message is guaranteed to succeed or fail within message.timeout.ms (+ a bit of grace time). The case of reordering only exists if input and output partitions don't use the same partitioning schema, which they should do if the key remains the same. So I think you want to start out testing with something like:
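A minimal sketch of the kind of recipe described above (not the original sample, which was lost in formatting; confluent-kafka-python API; topic names, config values, and the process() step are illustrative assumptions):

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    'bootstrap.servers': 'localhost:9092',  # assumed
    'group.id': 'transform-group',          # assumed
    'enable.auto.offset.store': False,      # we store offsets ourselves
    'enable.auto.commit': True,             # background commit of stored offsets
})
producer = Producer({
    'bootstrap.servers': 'localhost:9092',  # assumed
    'message.timeout.ms': 60000,            # delivery succeeds or fails within this bound
})
consumer.subscribe(['input-topic'])         # assumed topic

def make_delivery_cb(in_msg):
    def on_delivery(err, out_msg):
        if err is not None:
            # Permanent delivery failure: do NOT store the input offset, so
            # the message will be re-consumed after a restart or rebalance.
            raise SystemExit('delivery failed: {}'.format(err))
        # The output message is durable; it is now safe to mark the input
        # message as processed.
        consumer.store_offsets(in_msg)
    return on_delivery

while True:
    msg = consumer.poll(1.0)
    producer.poll(0)                        # serve delivery report callbacks
    if msg is None:
        continue
    if msg.error():
        raise SystemExit(str(msg.error()))
    result = process(msg.value())           # hypothetical transform
    # Re-using the input key keeps the partitioning schema consistent, which
    # (per the comment above) avoids reordering between input and output.
    producer.produce('output-topic', result, key=msg.key(),
                     on_delivery=make_delivery_cb(msg))
```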
got it. thanks a bunch
@edenhill Thanks for the detailed recipe!
Description
I expect a relatively common use case for Kafka is to consume a message, process it, and publish the results to a different topic (let’s say, one published message for every one message consumed). I am wondering if I am overlooking “the easy way” to use manual commits to do this while guaranteeing at least once processing/publishing of every message consumed.
The default auto-commit strategy doesn't seem to satisfy this guarantee. The solution also seems more nuanced than manually committing the relevant consumer message offset in the producer's delivery report callback. Both of those strategies seem like they would violate the at-least-once processing/publishing guarantee if I:

1. consume message a, associated with consumer offset x
2. consume message b, associated with consumer offset x + 1
3. receive the delivery report for b's published result first and commit offset x + 1 (which also effectively commits offset x)
4. then learn that the delivery of a's published result failed
Given the assumptions above, the logic required to meet this guarantee seems tricky for what I expect is a typical use case. Currently, I have logic in the delivery report callback that tracks pending commits for each TopicPartition and only finalizes a commit when there are no smaller offsets still outstanding.
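For illustration, a hypothetical helper implementing that kind of tracking might look like the following (my sketch, not the actual code from this issue): per input TopicPartition, remember which offsets still await delivery reports, and finalize a commit only when no smaller offsets remain outstanding.

```python
from collections import defaultdict

class PendingCommits:
    """Tracks input offsets whose results are still awaiting delivery reports."""

    def __init__(self):
        self._pending = defaultdict(set)  # (topic, partition) -> outstanding offsets
        self._high = {}                   # (topic, partition) -> highest acked offset

    def track(self, topic, partition, offset):
        # Call when the input message's result is handed to produce().
        self._pending[(topic, partition)].add(offset)

    def ack(self, topic, partition, offset):
        # Call from the delivery report callback. Returns an offset that is
        # now safe to commit (exclusive, i.e. the next offset to consume),
        # or None if a smaller offset is still awaiting its report.
        key = (topic, partition)
        self._pending[key].discard(offset)
        self._high[key] = max(self._high.get(key, offset), offset)
        outstanding = self._pending[key]
        if outstanding and min(outstanding) < self._high[key]:
            return None  # a smaller offset could still fail; don't commit past it
        return self._high[key] + 1
```

The value returned by ack() could then be handed to Consumer.commit(offsets=[TopicPartition(topic, partition, next_offset)], asynchronous=True), since a committed Kafka offset names the next message to consume.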
Am I overlooking something? Maybe librdkafka has the necessary intelligence in its implementation of commit() without my help (though I could not find evidence of it after a quick search), or maybe this is handled by Kafka's offset tracking?

Broker version: 0.11.0
Consumer configuration:
How to reproduce
Checklist

Please provide the following information:

- confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()):
- Client configuration: {...}
- Client logs (with 'debug': '..' as necessary)