-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix] [ml] fix key_shared mode delivery are order out order after a consumers reconnection #23803
base: master
Are you sure you want to change the base?
[fix] [ml] fix key_shared mode delivery are order out order after a consumers reconnection #23803
Conversation
@poorbarcode Please add the following content to your PR description and select a checkbox:
|
Please update the PR title. It currently includes |
@poorbarcode PIP-282 (implementation PR #21953) was made to address known out-of-order issues in the Key_Shared implementation. PIP-282 was targeted for Pulsar 4.0 and later replaced with PIP-379 implementation. It feels that major changes to the Pulsar 3.0 Key_Shared implementation should be made carefully due to the high chance of changes causing other regressions. |
@poorbarcode This diff of this PR is currently messed up, so hard to see what the changes are. |
1591c06
to
3cbf13d
Compare
Rebased master, it is clear now.
I will modify the Motivation to make every thing clear late |
"the normal reading is just pending there and waiting for new messages". It's better to cancel it since it does cause later problems. There's code to handle it, but the unnecessary work could be completely avoided. |
|
There would be an opposite impact when handled correctly. When there's a race between a pending read and a replay read, either one will be discarded. This is why extra work is performed and that leads to increased latencies eventually since unnecessary reads are performed. There's currently no broker side caching for replay reads, so discarded replay reads will cause extra traffic between broker and bookies. To avoid the extra latency, when the pending read gets cancelled and once the replay read is complete, a new pending read should be issued so that new entries get handled immediately when they are available. I'll be doing this type of change to address #23845, which is one of the issues where a read doesn't continue when a read gets skipped.
Please explain more about these items since I didn't catch the meaning.
When I made the comment, there was no description for this PR, so I'll check the updated description. |
BTW, we can start a new email channel to talk about this.
I consider that the current implementation is the best solution
The case Read Ahead:
I think we should not cancel |
Motivation
cursor.rewind
execute concurrently.Information
read-position
:3:21
mark-deleted-position:
3: -1`acked messages
:3:1 ~ 3:20
cursor.rewind
3:0
)cursor.rewind
, which may be triggered by many casescursor.readPosition
back to the first entry(3:0
)3:0
to consumer3:0 ~ 3:20
to consumerIssue: the consumer received
[3:0, 3:0, 3:1, 3:2....3:20]. you can reproduce the issue by the new test
testMixedReplayReadingAndNormalReading("testRepeatedDelivery")`Modifications
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x