Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent offsets from being committed manually when a rebalance is in progress and partitions are owned by consumer #4089

Conversation

roxelo
Copy link

@roxelo roxelo commented Dec 1, 2022

See #4059

When using cooperative sticky, attempting to commit offsets during a rebalance can trigger a follow up rebalance.
One way of reducing the risk of this happening, is to add the same check that is included in the auto commit function in the manual commit method

…n a rebalance is in progress and partitions are owned by consumer
@@ -400,6 +400,12 @@ rd_kafka_commit(rd_kafka_t *rk,
rq = RD_KAFKA_REPLYQ(repq, 0);
}

/* Don't attempt auto commit when rebalancing or initializing since
* the rkcg_generation_id is most likely in flux. */
if (rkcg->rkcg_subscription &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread safe (this is on the application thread, but rkcg is local to the rdkafka main thread).

Check should probably be performed in rd_kafka_cgrp_offsets_commit().

But we need to make sure that it is ok to commit during the phase of the rebalance when the rebalance callback/event is triggered.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, I also removed the same check from rd_kafka_cgrp_offset_commit_tmr_cb since the function will also call rd_kafka_cgrp_offsets_commit. I am not sure if this is the ideal behavior for auto commit though as we would no longer "silently" skip committing offsets...

Copy link
Contributor

@edenhill edenhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure to run make style-fix (on linux, since it si broken on osx)

@@ -0,0 +1,140 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bravo on test!

tests/0137-cooperative_commit_rebalance.c Outdated Show resolved Hide resolved
@wmorgan6796
Copy link

@roxelo whats the status of this PR? Any help needed?

@roxelo
Copy link
Author

roxelo commented Mar 10, 2023

@wmorgan6796 Some of the tests failed and I haven't had a chance to look at it further. If you have some time to investigate, another set of eyes would help

@wmorgan6796
Copy link

wmorgan6796 commented Mar 14, 2023

@wmorgan6796 Some of the tests failed and I haven't had a chance to look at it further. If you have some time to investigate, another set of eyes would help

I believe I found the reason for the test failure. We fundamentally change how offset committing occurs by short circuiting even sending the OffsetCommitRequest to the brokers. When we send the request to the brokers, we will get an RD_KAFKA_RESP_ERR_UNKNOWN_MEMBER_ID from the (mock) brokers. But since the request never makes it to the brokers, we end up changing the error returned to RD_KAFKA_RESP_ERR_REBALANCE_IN_PROGRESS.

I think the correct thing to do is allow for the error returned to change, but ultimately would leave it up to the maintainers to decide is ok. In the mean time I'll bring this branch up to date with the latest librdkafka

@wmorgan6796
Copy link

#4089 (comment)

@milindl @emasab regarding this, what are your thoughts?

@roxelo
Copy link
Author

roxelo commented Apr 3, 2023

I am closing this PR since it's out of date and @wmorgan6796 opened #4220 to fix the issue

@roxelo roxelo closed this Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants