Commit behaviour when using kgo.DisableAutoCommit() #598

kalbhor · 2023-10-18T11:36:28Z

We have a use case where we want to manually commit messages processed in a consumer group using CommitOffsetsSync. To disable auto-commit, we passed kgo.DisableAutoCommit() when initializing the client. We noticed that, with kgo.DisableAutoCommit() set, the head offset was committed as the final offset when we closed the consumer. After a brief exploration we found that

franz-go/pkg/kgo/consumer_group.go

Line 1814 in ae169a1

func (g *groupConsumer) updateUncommitted(fetches Fetches) {

would be called and it commits offsets till head.

Our expectation when passing kgo.DisableAutoCommit() is that only manual commits are possible. I am curious to know why the library chooses to commit till head as the final commit when the consumer is closed.
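For context, a minimal sketch of the manual-commit bookkeeping this expectation implies: only records the application has finished processing contribute to the offsets that get committed. The `record` type and `offsetsToCommit` helper are hypothetical, and `EpochOffset` is a local stand-in that mirrors the shape of `kgo.EpochOffset` so the example compiles without franz-go. The committed value is the offset of the next record to consume, so it is the highest processed offset plus one.

```go
package main

import "fmt"

// EpochOffset is a local stand-in mirroring the shape of kgo.EpochOffset
// (an assumption made so this sketch has no external dependencies).
type EpochOffset struct {
	Epoch  int32
	Offset int64
}

// record is a hypothetical, trimmed-down view of a consumed record.
type record struct {
	Topic       string
	Partition   int32
	LeaderEpoch int32
	Offset      int64
}

// offsetsToCommit returns, per topic and partition, the offset of the next
// record to consume: the highest processed offset plus one.
func offsetsToCommit(processed []record) map[string]map[int32]EpochOffset {
	out := make(map[string]map[int32]EpochOffset)
	for _, r := range processed {
		parts := out[r.Topic]
		if parts == nil {
			parts = make(map[int32]EpochOffset)
			out[r.Topic] = parts
		}
		next := r.Offset + 1
		if cur, ok := parts[r.Partition]; !ok || next > cur.Offset {
			parts[r.Partition] = EpochOffset{Epoch: r.LeaderEpoch, Offset: next}
		}
	}
	return out
}

func main() {
	processed := []record{
		{Topic: "orders", Partition: 0, Offset: 41},
		{Topic: "orders", Partition: 0, Offset: 42},
		{Topic: "orders", Partition: 1, Offset: 7},
	}
	fmt.Println(offsetsToCommit(processed))
}
```

With franz-go, a map of this shape (using `kgo.EpochOffset` values) is what would be handed to CommitOffsetsSync; anything polled but not yet processed is simply left out of it.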


twmb · 2023-10-19T01:17:42Z

Good find, I'll remove this.

twmb · 2023-10-21T18:23:13Z

Wait, coming back to this, I don't think this is a bug. I misread the issue originally -- I thought you were saying the client was committing automatically in LeaveGroup even though you had autocommit disabled, but that doesn't happen. With autocommit enabled, there is a goroutine that starts that commits head offsets (not dirty offsets) every 5s -- the difference between head and dirty is not relevant when disabling autocommit.

With autocommit disabled, the goroutine is not started. updateUncommitted always updates the head offset, so that when you call UncommittedOffsets(), the latest offsets you have polled are available to commit.
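The head-offset tracking described here can be pictured with a small model. This is a simplified sketch, not the library's actual data structure: the `tracker` type, its string keys, and its method names are all hypothetical. It only shows the idea that polling always advances head, and that the uncommitted set is whatever head is ahead of the last explicit commit.

```go
package main

import "fmt"

// partitionOffsets is a simplified, hypothetical model of per-partition
// state; it is not franz-go's real struct.
type partitionOffsets struct {
	head      int64 // offset after the last polled record
	committed int64 // offset sent by the last explicit commit
}

type tracker struct {
	parts map[string]*partitionOffsets // keyed by "topic/partition"
}

func newTracker() *tracker {
	return &tracker{parts: make(map[string]*partitionOffsets)}
}

// polled models the update described above: every poll advances head.
func (t *tracker) polled(key string, lastOffset int64) {
	p := t.parts[key]
	if p == nil {
		p = &partitionOffsets{}
		t.parts[key] = p
	}
	if lastOffset+1 > p.head {
		p.head = lastOffset + 1
	}
}

// uncommitted returns every head offset that is ahead of its commit.
func (t *tracker) uncommitted() map[string]int64 {
	out := make(map[string]int64)
	for key, p := range t.parts {
		if p.head > p.committed {
			out[key] = p.head
		}
	}
	return out
}

// commit records an explicit commit; nothing else moves committed.
func (t *tracker) commit(key string, offset int64) {
	if p := t.parts[key]; p != nil {
		p.committed = offset
	}
}

func main() {
	t := newTracker()
	t.polled("orders/0", 9) // records up to offset 9 were polled
	fmt.Println(t.uncommitted())
	t.commit("orders/0", 10)
	fmt.Println(t.uncommitted())
}
```

In this model nothing is committed unless `commit` is called, which matches the statement that no background goroutine runs when autocommit is disabled.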

When you close the consumer, OnPartitionsRevoked is called. With autocommit disabled, there is no commit here anywhere.

What's the confusion here?

kalbhor · 2023-10-23T08:48:32Z

I had CommitMarkedOffsets() in my OnPartitionsRevoked() callback. This means that when the group is closed it tries to get the marked offsets and commit them. It uses the getUncommitted() method to fetch these offsets with dirty=false. In the case of auto commit being disabled I assume that only offsets that are passed using CommitOffsetsSync() are considered "committed" and there is no concept of "marking" offsets for commit later. But calling CommitMarkedOffsets() would result in marking the head offset as committed. Is this behavior expected? Maybe the CommitMarkedOffsets() should do nothing in the case when auto commit is disabled?
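To make the reported sequence concrete, here is a small single-partition model of it. This is a sketch under the assumptions stated in this comment, not franz-go's implementation: the `consumer` type and its methods are hypothetical, and `commitMarked` simply commits the head offset, which is the behaviour being described.

```go
package main

import "fmt"

// consumer is a hypothetical single-partition model of the behaviour
// reported above; it is not franz-go's real implementation.
type consumer struct {
	head      int64 // advanced by every poll
	committed int64 // advanced only by commits
	onRevoked func(*consumer)
}

// poll models receiving n more records.
func (c *consumer) poll(n int64) { c.head += n }

// commitSync models an explicit manual commit of a chosen offset.
func (c *consumer) commitSync(offset int64) { c.committed = offset }

// commitMarked models the reported behaviour: it commits the head offset.
func (c *consumer) commitMarked() { c.committed = c.head }

// close models leaving the group, which runs the revoke callback.
func (c *consumer) close() {
	if c.onRevoked != nil {
		c.onRevoked(c)
	}
}

func main() {
	c := &consumer{onRevoked: func(c *consumer) { c.commitMarked() }}
	c.poll(100)      // 100 records polled
	c.commitSync(40) // only 40 processed and committed manually
	c.close()        // revoke callback commits head
	fmt.Println(c.committed) // prints 100
}
```

The 60 records that were polled but never processed end up committed, which is the surprise described in this comment. Dropping the callback leaves the committed offset at 40.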

twmb · 2023-10-23T20:48:05Z

Marked records are only relevant for AutoCommitMarks. Marking records is a no-op when you have disabled autocommit (and MarkCommitRecords documents that).

If you disable autocommit, all offsets you poll are always available / always what you receive when you query offsets (via UncommittedOffsets(), or CommitUncommittedOffsets()).

CommitMarkedOffsets() can be changed to be a no-op if autocommit is disabled, but this is just an added safety check -- the docs all currently dissuade you from using any mark APIs unless you've opted into AutoCommitMarks.
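The difference between the two modes can be sketched with a toy model. The `partition` type below is hypothetical, and the assumption that polling alone does not make offsets committable under AutoCommitMarks comes from reading that option's documentation rather than from this thread.

```go
package main

import "fmt"

// partition is a hypothetical model of one partition's committable offset.
type partition struct {
	marks       bool  // true models AutoCommitMarks, false models DisableAutoCommit
	committable int64 // offset that a commit call would send
}

// polled is called with the offset of the last record returned by a poll.
func (p *partition) polled(lastOffset int64) {
	if !p.marks {
		// Without marks, everything polled is available to commit.
		p.committable = lastOffset + 1
	}
}

// mark is called when the application marks a record as processed.
func (p *partition) mark(offset int64) {
	if !p.marks {
		return // marking is a no-op outside marks mode
	}
	if offset+1 > p.committable {
		p.committable = offset + 1
	}
}

func main() {
	manual := &partition{marks: false}
	manual.polled(99)
	manual.mark(10) // ignored
	fmt.Println("manual:", manual.committable)

	marks := &partition{marks: true}
	marks.polled(99)
	marks.mark(10)
	fmt.Println("marks:", marks.committable)
}
```

In the manual case the committable offset follows the poll (100) regardless of the mark; in the marks case it follows the mark (11) regardless of the poll.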

I'm not sure what the question is at this point -- do you need to switch methods? Is something else wrong here?

kalbhor · 2023-10-24T03:50:17Z

Your explanation is correct, and ideally the library's user shouldn't be calling mark-related methods when autocommit is disabled, since your docs also describe the suggested usage pattern. Maybe this is more of an API boundary issue, since under the hood the library treats UncommittedOffsets as MarkedOffsets when CommitMarkedOffsets() is called. Uncommitted offsets and marked offsets are different when auto-commit is disabled. The library assumes CommitMarkedOffsets() is only going to be used in the context of auto-commit. In that case, maybe an error should be returned when an abstraction like CommitMarkedOffsets() is used with auto-commit disabled? Or it could be a no-op, just like marking records is currently.
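The two alternatives proposed here, returning an error or silently doing nothing, could look roughly like this. It is only a sketch of the design choice: the `client` type, both method names, and `errMarksNotEnabled` are hypothetical and do not exist in franz-go.

```go
package main

import (
	"errors"
	"fmt"
)

// errMarksNotEnabled is a hypothetical error used only in this sketch.
var errMarksNotEnabled = errors.New("marked offsets require AutoCommitMarks")

// client is a hypothetical stand-in for a consumer client.
type client struct {
	autoCommitMarks bool
	marked          int64 // highest marked offset
	committed       int64 // last committed offset
}

// commitMarkedStrict is option one: refuse the call outside marks mode.
func (c *client) commitMarkedStrict() error {
	if !c.autoCommitMarks {
		return errMarksNotEnabled
	}
	c.committed = c.marked
	return nil
}

// commitMarkedNoop is option two: quietly do nothing outside marks mode.
func (c *client) commitMarkedNoop() error {
	if !c.autoCommitMarks {
		return nil
	}
	c.committed = c.marked
	return nil
}

func main() {
	c := &client{autoCommitMarks: false, marked: 100, committed: 40}
	fmt.Println(c.commitMarkedStrict(), c.committed)
	fmt.Println(c.commitMarkedNoop(), c.committed)
}
```

The error variant surfaces the misuse to the caller, while the no-op variant keeps existing callers working and matches how marking records already behaves. In both cases the manually committed offset is left untouched.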

twmb · 2023-11-01T02:50:48Z

Sure.... I'll make the mark functions no-op.

Closes #598.


twmb mentioned this issue Oct 19, 2023

v1.15.1 status #589

Closed

7 tasks

twmb added the bug Something isn't working label Oct 19, 2023

twmb added waiting and removed bug Something isn't working labels Oct 21, 2023

twmb added a commit that referenced this issue Nov 1, 2023

kgo: no-op mark functions when not using AutoCommitMarks

72778cb

Closes #598.

twmb mentioned this issue Nov 1, 2023

kgo: no-op mark functions when not using AutoCommitMarks #614

Merged

twmb closed this as completed in #614 Nov 1, 2023

twmb removed the waiting label May 26, 2024
