[mysql]fix duplicate split request for newly added table #1156

yujiaxinlong · 2022-05-07T09:28:45Z

related to #1149

I debugged for a while and found that the reason it gets stuck at the second split (not always the second; it could be the third or a later one, but it always gets stuck) is that two split requests are triggered on one subtask at the same time.

When adding a new table to an existing Flink CDC task, if the old table has finished the snapshot phase and is already consuming binlog, the task will read some binlog splits before discovering the new table during restart.

Here, when a binlog split finishes in MySqlSourceReader#onSplitFinished(...), it actually activates the next split twice:

Via events: SuspendBinlogReaderAckEvent -> WakeupReaderEvent -> context.sendSplitRequest()
Directly: context.sendSplitRequest()

This leads to two snapshot splits being handled at the same time by one subtask, one split fetcher, and the same Debezium BinaryLogClient.

This BinaryLogClient usually reaches EOF before the high watermark of the second split, so the binlogSplitReadTask for the second split never triggers a binlog end watermark event, which eventually causes the snapshot reader to hang forever.

The WakeupReaderEvent does not seem to be needed for snapshot splits. For now I have just removed the context.sendSplitRequest() call here and kept the event for future use. This works fine in my case.
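
The double activation described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `RecordingContext`, `Reader`, and the `requestOnWakeup` flag are hypothetical names made up for illustration and are not the connector's actual classes; the model only counts how many split requests one finished split produces before and after the change.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateSplitRequestSketch {

    /** Hypothetical stand-in for the reader context; it only records split requests. */
    static class RecordingContext {
        final List<String> requests = new ArrayList<>();

        void sendSplitRequest(String origin) {
            requests.add(origin);
        }
    }

    /** Hypothetical, heavily simplified model of the reader's two activation paths. */
    static class Reader {
        private final RecordingContext context;
        // true models the behavior before the fix, false models it after the fix
        private final boolean requestOnWakeup;

        Reader(RecordingContext context, boolean requestOnWakeup) {
            this.context = context;
            this.requestOnWakeup = requestOnWakeup;
        }

        void onSplitFinished() {
            // Path 1: the ack event is answered with a wakeup event (modeled synchronously here).
            handleWakeupEvent();
            // Path 2: the reader asks for the next split directly.
            context.sendSplitRequest("direct");
        }

        void handleWakeupEvent() {
            if (requestOnWakeup) {
                context.sendSplitRequest("wakeup");
            }
        }
    }

    /** Returns how many split requests a single finished split produces. */
    static int requestsPerFinishedSplit(boolean requestOnWakeup) {
        RecordingContext context = new RecordingContext();
        new Reader(context, requestOnWakeup).onSplitFinished();
        return context.requests.size();
    }

    public static void main(String[] args) {
        System.out.println("before fix: " + requestsPerFinishedSplit(true));
        System.out.println("after fix:  " + requestsPerFinishedSplit(false));
    }
}
```

In this model the fix corresponds to the wakeup handler no longer sending its own request, so each finished split results in exactly one request.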

yujiaxinlong · 2022-05-07T11:15:37Z

@leonardBang @PatrickRen there seems to be something wrong with the Oracle connector or its test case; the past several unrelated PRs all failed on it.

lzshlzsh · 2022-05-19T03:31:59Z

+1, the fix solved our online problem

lzshlzsh · 2022-05-19T17:54:13Z

Could you explain how to reproduce the bug in more detail? I have tried dozens of times following #1149, but could not reproduce it.

You say: "Which lead to two snapshot splits handled by one subtask , one split fetcher and one same debezium BinaryLogClient at the same time."

There is no problem with multiple snapshot splits being handled by one subtask; this happens when parallelism is decreased (e.g. from 4 to 1) during the snapshot stage.

yujiaxinlong · 2022-05-20T04:42:22Z

@lzshlzsh
I think multiple splits can indeed be handled by one subtask, but not at the same time.

The problem here is that it starts handling the second split before finishing the first one. What I observe is that the second split keeps waiting for its end watermark event, which is only emitted when currentBinlogOffset.isAtOrAfter(binlogSplit.getEndingOffset()) holds (see MySqlBinlogSplitReadTask#handleEvent(Event event)). But it never receives it.

I haven't read the code of BinaryLogClient very carefully, but I believe the reason is that BinaryLogClient doesn't read the binlog endlessly. It reads exactly the amount of binlog that was decided when it connected, so when the second split also uses this client, it most likely reaches EOF and exits before reaching the high watermark of the second split.

I haven't been able to reproduce this problem outside our online database. I think the key is that table A, which I mentioned in #1149, must have finished snapshot reading and keep receiving new DML statements.
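
To make the hypothesis above concrete, here is a minimal sketch of the end-watermark condition. It is an assumption-laden model, not the real code: binlog offsets are reduced to plain `long` values, and `isAtOrAfter` and `reachesEndWatermark` are hypothetical helpers that only mirror the shape of the check in MySqlBinlogSplitReadTask#handleEvent. The event list stands for the bounded range of binlog that a shared client delivers.

```java
import java.util.Arrays;
import java.util.List;

public class EndWatermarkSketch {

    /** Hypothetical simplification: a binlog offset is modeled as a single long. */
    static boolean isAtOrAfter(long currentOffset, long endingOffset) {
        return currentOffset >= endingOffset;
    }

    /**
     * Returns true if some delivered event reaches the split's ending offset,
     * i.e. the end watermark event would be emitted. If the stream ends first,
     * the reader waiting for that watermark never gets it.
     */
    static boolean reachesEndWatermark(List<Long> deliveredOffsets, long endingOffset) {
        for (long offset : deliveredOffsets) {
            if (isAtOrAfter(offset, endingOffset)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Offsets delivered by the client that was opened for the first split.
        List<Long> delivered = Arrays.asList(10L, 40L, 80L, 100L);

        // First split: its high watermark lies inside the delivered range.
        System.out.println("split 1 finishes: " + reachesEndWatermark(delivered, 80L));

        // Second split: its high watermark is later than anything the shared client delivers.
        System.out.println("split 2 finishes: " + reachesEndWatermark(delivered, 150L));
    }
}
```

The second call returning false corresponds to the observed hang: the condition is never satisfied, so no end watermark event is produced for the second split.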

yujiaxinlong · 2022-09-23T10:06:27Z

@leonardBang Hi, I'm wondering how the review is going. I checked the code of BinaryLogClient recently and am pretty sure the cause of this bug is what I described before: the BinaryLogClient for the first split won't read new log for the second.

This closes apache#1149.

leonardBang

Thanks @yujiaxinlong for the detailed digging! I like your analysis, LGTM.
I also helped rebase and improve the PR a little, as the original PR had some conflicts.

This closes apache#1149.

leonardBang self-requested a review May 20, 2022 05:47

yurunchuan and others added 2 commits October 19, 2022 11:21

[mysql] Avoid duplicate split requests when add new table (apache#1156)

2a9c473

[mysql] Avoid duplicate split requests when add new table.(apache#1156)

b61c748

This closes apache#1149.

leonardBang force-pushed the fix_duplicate_split_request branch from 3b33c2b to b61c748 Compare October 19, 2022 03:22

leonardBang approved these changes Oct 19, 2022

leonardBang merged commit d343538 into apache:master Oct 19, 2022

leonardBang pushed a commit that referenced this pull request Oct 19, 2022

[mysql] Avoid duplicate split requests when add new table (#1156)

ad80e47

leonardBang added this to the V2.3.0 milestone Oct 19, 2022

lzshlzsh mentioned this pull request Feb 14, 2023

[mysql-cdc] Fix the hung up of snapshot phase when reuse binaryLogClient #1915

Closed

ChaomingZhangCN pushed a commit to ChaomingZhangCN/flink-cdc that referenced this pull request Jan 13, 2025

[mysql] Avoid duplicate split requests when add new table (apache#1156)

79f3905

ChaomingZhangCN pushed a commit to ChaomingZhangCN/flink-cdc that referenced this pull request Jan 13, 2025

[mysql] Avoid duplicate split requests when add new table.(apache#1156)

2fdaf11

This closes apache#1149.
