Fix duplicate messages when resuming partitions not paused #4636
Looks fine. Just a few minor changes.
src/rdkafka_partition.c
Outdated
/* If partitions isn't paused, avoid bumping its version,
 * as it'll result in resuming fetches from a stale
 * next_fetch_start */
rd_bool_t paused = rd_false;
There are two variables, `pause` and `paused`, which is confusing. Please rename this variable to `is_already_paused`.
@@ -2299,7 +2299,22 @@ rd_kafka_resp_err_t rd_kafka_toppar_op_pause_resume(rd_kafka_toppar_t *rktp,
                                                     int flag,
                                                     rd_kafka_replyq_t replyq) {
         int32_t version;
-        rd_kafka_op_t *rko;
+        rd_kafka_op_t *rko = rd_kafka_op_new(RD_KAFKA_OP_PAUSE);
This is checked before the operation is sent to the queue. What about `rd_kafka_toppar_pause_resume`, where this op is handled? Is this case handled there as well?
tests/0145-unnecessary_resume_mock.c
Outdated
 *
 * @param partition_assignment_strategy Assignment strategy to test.
 */
static void test_no_duplicate_messages_unnecessary_resume(
The test name could be better. It could be `pause_resume_mock`.
tests/0145-unnecessary_resume_mock.c
Outdated
I was thinking of defining a better numbering scheme for mock tests. Integration tests start with '0' and unit tests with '8', so maybe mock tests could start with '4'?
@@ -511,7 +512,7 @@ struct test tests[] = {
     _TEST(0142_reauthentication, 0, TEST_BRKVER(2, 2, 0, 0)),
     _TEST(0143_exponential_backoff_mock, TEST_F_LOCAL),
     _TEST(0144_idempotence_mock, TEST_F_LOCAL, TEST_BRKVER(0, 11, 0, 0)),
Keep the extra line.
tests/0145-unnecessary_resume_mock.c
Outdated
test_msgver_init(&mv, testid);
test_consumer_poll("consume", rk, testid, -1, 0, msgcnt, &mv);

TEST_SAY("Unnecessary resume\n");
Does this test fail before the change?
cooperative assignor
LGTM!
Fixed a bug causing duplicate message consumption from a stale fetch start offset in some particular cases.
After a subscription change, a consumer using the cooperative assignor could resume fetching from a previous position.
The same could happen when resuming a partition that wasn't paused.
Fixed by ensuring that a resume operation is a complete no-op when the partition isn't paused.
Fixes #4637
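
To illustrate the idea behind the fix, here is a minimal, self-contained sketch of a resume that is a complete no-op when the partition isn't paused. It is not librdkafka code: `toy_partition_t`, `toy_pause`, `toy_resume`, and the `fetch_pos` field are hypothetical names invented for this example, and the real partition state is considerably more involved. Only the `version` and `next_fetch_start` concepts are taken from the code comment quoted in the review above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified partition state (NOT rd_kafka_toppar_t). */
typedef struct {
        bool paused;              /* true while the partition is paused */
        int32_t version;          /* bumped to invalidate in-flight fetches */
        int64_t next_fetch_start; /* offset saved when the partition was paused */
        int64_t fetch_pos;        /* offset the fetcher will request next */
} toy_partition_t;

/* Pause: remember where fetching should restart and bump the version. */
static void toy_pause(toy_partition_t *p) {
        if (p->paused)
                return; /* already paused, nothing to do */
        p->paused           = true;
        p->next_fetch_start = p->fetch_pos;
        p->version++;
}

/* Resume: returns true only if the partition state actually changed. */
static bool toy_resume(toy_partition_t *p) {
        if (!p->paused) {
                /* Not paused: do nothing at all. Bumping the version here
                 * would restart fetching from a possibly stale
                 * next_fetch_start and re-deliver messages. */
                return false;
        }
        p->paused    = false;
        p->fetch_pos = p->next_fetch_start;
        p->version++;
        return true;
}
```

In this sketch, calling `toy_resume` on a partition that was never paused leaves both `version` and `fetch_pos` untouched, which is the property the unnecessary-resume test is meant to guard.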