cdc: merge two-phase scheduler #5955
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
@@ -130,6 +130,13 @@ var defaultServerConfig = &ServerConfig{
IteratorSlowReadDuration: 256,
},
Messages: defaultMessageConfig.Clone(),

EnableTwoPhaseScheduler: false,
It's disabled for now.
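For readers following the diff, here is a minimal sketch of how a default-off flag like this might gate the choice of scheduler. Only the field name `EnableTwoPhaseScheduler` and its `false` default come from the diff; `SchedulerConfig`, `defaultConfig`, and `pickScheduler` are hypothetical names for illustration and are not taken from the TiCDC code base.

```go
package main

import "fmt"

// SchedulerConfig is a hypothetical, trimmed-down stand-in for the server
// config. Only the EnableTwoPhaseScheduler field name comes from the diff.
type SchedulerConfig struct {
	EnableTwoPhaseScheduler bool
}

// defaultConfig mirrors the default shown in the diff: the new scheduler is off.
func defaultConfig() SchedulerConfig {
	return SchedulerConfig{EnableTwoPhaseScheduler: false}
}

// pickScheduler is a hypothetical helper that reports which scheduler
// implementation would be selected for a given config.
func pickScheduler(cfg SchedulerConfig) string {
	if cfg.EnableTwoPhaseScheduler {
		return "two-phase"
	}
	return "legacy"
}

func main() {
	cfg := defaultConfig()
	fmt.Println(pickScheduler(cfg)) // default: flag is off

	cfg.EnableTwoPhaseScheduler = true // explicit opt-in
	fmt.Println(pickScheduler(cfg))
}
```

Keeping the default at `false` means existing deployments keep their current behavior until someone opts in.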
Should it be
Both are fine; both keep the commit history, while rebase keeps a linear history.
/run-all-tests
/run-all-tests
/run-kafka-integration-test
/run-all-tests
/run-kafka-integration-test
/run-verify-ci
Codecov Report
@@               Coverage Diff                @@
##             master      #5955       +/-   ##
================================================
+ Coverage   57.1666%   57.6794%   +0.5127%
================================================
  Files           686        697        +11
  Lines         80923      82421      +1498
================================================
+ Hits          46261      47540      +1279
- Misses        30368      30556       +188
- Partials       4294       4325        +31
/run-kafka-integration-test
* tp: add table id to heartbeat and support burst remove
* tp: support basic balance tables
Signed-off-by: Neil Shen

* add more tests.
* coverage to 79.1
* Update cdc/scheduler/internal/tp/agent.go
Co-authored-by: Neil Shen

Signed-off-by: Neil Shen

* cdc: integrate two phase scheduler
* tp: fix duplicate topic handlers panic
* tp: add From to messages from scheduler
* tp: add more comments and fix typo
* cdc: fix missing table ID in table meta
* cdc: fix processor.tick panic
* tp: fix missing processor epoch
* tp: fix missing checkpoint ts in burstBalanceTask
* tp: fix lost AddTable request during Commit
* Revert "cdc: fix processor.tick panic" This reverts commit ebb0f2c.
Signed-off-by: Neil Shen

* add all changes.
* also check capture status, before rebalance tables.
* add basic move table scheduler implementation.
* add basic implementation of manual scheduling api.
* fix rebalance.
* add test for manual move table.
* revert change in pb.go
* fix
* fix by review.

* fix by suggestion, use unique state for the table.
* fix TestTableExecutorAddingTableDirectly
* add ut TestTableExecutorAddingTableIndirectly
* add ut TestTableExecutorAddingTableIndirectly

* tp: support balance when add a new capture
* tp: fix an error when a stopped table state follows by a heartbeat response
* fix tests
* tp: add more tests
Signed-off-by: Neil Shen

… calculation (pingcap#5761)
* fix tp advance checkpoint.
* fix ddl sink log.
* add a ut for replication manager advance checkpoint ts.

…essages. (pingcap#5791)
* agent should return absent directly.
* do not panic when remove table not exist.
* no need to check for table status.

* tp: fill resolved ts when add a new table
* tp: relax invariant check
* tp: add metrics
* tp: record tasks in burstBalance
Signed-off-by: Neil Shen

Signed-off-by: Neil Shen

* tp: clean up stale capture table metrics
* tp: refine balance scheduler
* tp: separate balance scheduler to basic and balance
* fix tests
* Apply suggestions from code review
* address comments
Signed-off-by: Neil Shen
Co-authored-by: Ling Jin

…p2p messages. (pingcap#5820)
* add table struct to agent.
* agent add table state machine.
* simplify coordinator.
* agent to be state awared.
* call IsRemoveTableFinished to clean table resource from processor.
* fix message header.
* fix agent.
* prepare new agent ready.
* fix some test.
* add basic ut.
* fix agent handle message ut.
* add all test.
* fix agent handle stopping.
* adjust pipeline table state.
* refine the agent.
* introduce tableManager.
* fix all tests.
* add some new test.
* fix log.
* fix by make check
* fix by review comment.
* fix by review comment.
* fix ut.
* fix check.
* agent fix heartbeat does not refresh each tick.
* remove scheduler log.
* fix ut.
* fix some test.
* fix ut.
* rename
* fix by check.

pingcap#5906)
* tp: clean up metrics and add logs
* tp: cleanup running task when a table has shutdown
* tests: skip check move table results
Signed-off-by: Neil Shen

Signed-off-by: Neil Shen

* drain capture scheduler schedule logic.
* add drain capture scheduler ut.
* coordinator adjust.
* simplify drain capture, only return table count.
Co-authored-by: Neil Shen
Signed-off-by: Neil Shen

Signed-off-by: Neil Shen

Signed-off-by: Neil Shen

Signed-off-by: Neil Shen
/merge
This pull request has been accepted and is ready to merge. Commit hash: 26ff1a4
/run-all-tests
@overvenus: Your PR was out of date, I have automatically updated it for you. At the same time I will also trigger all tests for you: /run-all-tests If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
/run-verify
What problem does this PR solve?
Issue Number: ref #4757
What is changed and how it works?
Add a two-phase scheduler to reduce latency spikes during rolling restarts.
Commits are cherry-picked from the fb/latency branch.
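As a rough illustration only: one way a two-phase table move can be modeled is to prepare the table on the destination first, and stop it on the source only at commit time, so replication is never down on both sides at once. The sketch below assumes that model. The state names, `tableMove`, and its methods are hypothetical and are not taken from this PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// TableState is a hypothetical per-capture table state used for this sketch.
type TableState int

const (
	Absent TableState = iota
	Preparing
	Prepared
	Replicating
	Stopped
)

var errBadState = errors.New("table move: unexpected state")

// tableMove tracks one table being moved from a source capture to a
// destination capture (both names are assumptions for illustration).
type tableMove struct {
	src TableState
	dst TableState
}

// Prepare is phase one: the destination starts warming up the table while
// the source keeps replicating it.
func (m *tableMove) Prepare() error {
	if m.src != Replicating || m.dst != Absent {
		return errBadState
	}
	m.dst = Preparing
	return nil
}

// MarkPrepared records that the destination has finished warming up.
func (m *tableMove) MarkPrepared() error {
	if m.dst != Preparing {
		return errBadState
	}
	m.dst = Prepared
	return nil
}

// Commit is phase two: only after the destination is prepared does the
// source stop and the destination take over replication.
func (m *tableMove) Commit() error {
	if m.src != Replicating || m.dst != Prepared {
		return errBadState
	}
	m.src = Stopped
	m.dst = Replicating
	return nil
}

func main() {
	m := &tableMove{src: Replicating, dst: Absent}
	fmt.Println("commit before prepare rejected:", m.Commit() != nil)

	_ = m.Prepare()
	_ = m.MarkPrepared()
	fmt.Println("commit after prepare ok:", m.Commit() == nil)
	fmt.Println("destination replicating:", m.dst == Replicating)
}
```

The guard in `Commit` is the point of the sketch: the source is never stopped unless a prepared destination is ready to take over.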
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note