-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mysql]fix duplicate split request for newly added table #1156
[mysql]fix duplicate split request for newly added table #1156
Conversation
@leonardBang @PatrickRen there should be something wrong with oracle connector or it's test case, past several unrelated PRs all failed on this. |
+1, the fix solved our online problem |
Would you explain how to reproduce the bug more detailed ? I have tried tens times according to #1149 , but could not reproduce. You say: "Which lead to two snapshot splits handled by one subtask , one split fetcher and one same debezium BinaryLogClient at the same time." There is no problem that multiple snapshot splits handled by one subtask, which happens when decrease parallelism(e.g 4 to 1) in snapshot stage. |
@lzshlzsh The problem here is that it starts to handle the second split before finishing first one. What I observe here is that the second split keeps waiting for it's watermark end event , which only happens when I haven't read the code of BinaryLogClient very carefully, but I believe the reason is that BinaryLogClient doesn't read binlog endlessly. it read exactly amount of binlog that has been decided when it connects. so when the second split also use this client, it most likely reach an eof and jump out before reach the high watermark of second split. I haven't reproduce this problem except in our online database, I think the core here is the table A I mentioned in #1149 must has finished snapshot reading and keeps having new DML sql. |
@leonardBang Hi, I'm wandering how is the review going, I checked code of BinaryLogClient recently and pretty sure the reason lead to this bug is what I described before. The BinaryLogClient for the first split won't read new log for the second. |
3b33c2b
to
b61c748
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @yujiaxinlong for the detail digging! I like your analysis, LGTM.
I also help rebase and improve the PR a little as the original PR has some conflicts.
related to #1149
I debugged for a while and find the reason of stucking at the second split(not 100% percent second , could be third or later but always stuck) is that it triggers two split requests on one subtask at the same time.
When adding a new table to an existed flinkcdc task , if the old table has finished snapshot phase and has been consuming binlog. It will read some binlog splits before finding there is new table during task restart.
Here when a binlog split finished in MysqlSourceReader#onSplitFinished(...), it actually activate next split twice.
Which lead to two snapshot splits handled by one subtask , one split fetcher and one same debezium BinaryLogClient at the same time.
This BinaryLogClient mostly reach an EOF before the high watermark of the second split here, so the binlogSplitReadTask for the second split will never trigger an binlog end watermark event, which eventually lead to snapshotreader hangs forever.
The WakeupReaderEvent for snapshotsplit seems not needed. Currently I just removed the context.sendSplitRequest() here and keep this event for future usage. And this works fine in my case.