scheduler(2pc): agent for 2 phase scheduling #5593

3AceShowHand · 2022-05-25T12:04:52Z

What problem does this PR solve?

Issue Number: ref #4757

What is changed and how it works?

Implements the agent for 2 phase scheduling.

Check List

Tests

Unit test

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

`None`.

ti-chi-bot · 2022-05-25T12:04:53Z

[REVIEW NOTIFICATION]

This pull request has been approved by:

overvenus

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

cdc/scheduler/internal/table_executor.go

cdc/scheduler/internal/base/agent.go

cdc/processor/pipeline/table.go

cdc/processor/processor_test.go

cdc/scheduler/internal/tp/agent.go

overvenus · 2022-05-30T09:34:11Z

cdc/scheduler/internal/tp/agent.go

+			// This panic will happen only if two messages have been received
+			// with the same ownerRev but with different ownerIDs.
+			// This should never happen unless the election via Etcd is buggy.
+			log.Panic("owner IDs do not match",


What if p2p batches messages of different owners?

This does not panic, because messages from different owners carry different revisions.

The panic covers only the case of two owners with the same revision but different owner IDs, which is an unrecoverable etcd error.
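
To make the rule in this thread concrete, here is a minimal Go sketch of the owner check. It is an illustration under assumptions, not the code in cdc/scheduler/internal/tp/agent.go: the names ownerInfo, acceptOwner, and errOwnerMismatch are hypothetical, revisions are assumed to start at 1, and the sketch returns an error where the PR calls log.Panic.

```go
package main

import (
	"errors"
	"fmt"
)

// ownerInfo is a hypothetical record of the owner the agent currently trusts.
type ownerInfo struct {
	Revision int64  // election revision of the owner, assumed to only grow
	ID       string // capture ID of the owner
}

// errOwnerMismatch marks the unrecoverable case: same revision, different ID.
var errOwnerMismatch = errors.New("owner IDs do not match")

// agent keeps only the state needed for this sketch.
type agent struct {
	owner ownerInfo
}

// acceptOwner reports whether a message sent by (id, rev) should be processed.
func (a *agent) acceptOwner(id string, rev int64) (bool, error) {
	switch {
	case rev < a.owner.Revision:
		// Message from a stale owner: ignore it, do not fail.
		return false, nil
	case rev > a.owner.Revision:
		// A newer owner was elected: remember it and accept the message.
		a.owner = ownerInfo{Revision: rev, ID: id}
		return true, nil
	case id != a.owner.ID:
		// Same revision, different ID: the election itself is broken.
		return false, errOwnerMismatch
	default:
		return true, nil
	}
}

func main() {
	a := &agent{}
	// A batch may mix messages from an old owner and a new one.
	batch := []ownerInfo{
		{Revision: 1, ID: "owner-1"},
		{Revision: 2, ID: "owner-2"},
		{Revision: 1, ID: "owner-1"}, // late message from the old owner
	}
	for _, m := range batch {
		ok, err := a.acceptOwner(m.ID, m.Revision)
		fmt.Println(m.ID, m.Revision, ok, err)
	}
}
```

In this sketch a batch that mixes owners never reaches the mismatch branch, because the older owner's messages are dropped by the revision comparison first.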

dm/pkg/schema/tracker_test.go

cdc/scheduler/internal/tp/schedulepb/table_schedule_manual_test.go

overvenus · 2022-05-30T10:44:40Z

cdc/scheduler/internal/tp/replication_set.go

@@ -573,6 +576,7 @@ func (r *ReplicationSet) handleRemoveTable() ([]*schedulepb.Message, error) {
 			zap.Any("replicationSet", r), zap.Int64("tableID", r.TableID))
 		return nil, nil
 	}
+	// todo: OldState must be `replicating` here


How do we deal with these TODOs?

It looks like OldState is only used for logging. We can remove it from the log, since we know it must be in the desired state.

overvenus · 2022-05-30T10:50:01Z

cdc/scheduler/internal/tp/agent.go

+
+	// pendingTasks is a queue of dispatch table task yet to be processed.
+	// the Deque stores *dispatchTableTask.
+	pendingTasks deque.Deque


Why do we need a deque? Can we handle tasks immediately? The owner controls the total number of tasks that run concurrently.

Consider dispatchTableRequests A and B, both sent by owner 1, followed by a heartbeat or dispatchTableRequest sent by owner 2. A and B should not be processed.

Keeping tasks in a pending state ensures that this tick does not do unnecessary task handling.

However, I think all tasks should be handled whether or not the owner switches. This reduces unnecessary table rescheduling, and an ongoing old task does not affect the correctness of global table distribution and scheduling, because the heartbeat lets the new owner detect the state of every table.

So we can remove pendingTasks and handle all dispatchTableTasks directly.
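
The conclusion of this thread can be sketched in a few lines of Go. This is a simplified model under assumptions, not the agent in this PR: the types tableState and dispatchTableTask and the methods handleDispatch and heartbeatResponse are hypothetical stand-ins, and a table jumps straight to a replicating state instead of going through the real two-phase transitions.

```go
package main

import "fmt"

// tableState is a hypothetical, reduced set of table states.
type tableState int

const stateReplicating tableState = 1

// dispatchTableTask is a hypothetical request to add or remove one table.
type dispatchTableTask struct {
	TableID  int64
	IsRemove bool
}

// agent holds the tables this capture currently runs. There is no pending queue.
type agent struct {
	tables map[int64]tableState
}

// handleDispatch applies a task as soon as it arrives, whichever owner sent it.
func (a *agent) handleDispatch(task dispatchTableTask) {
	if task.IsRemove {
		delete(a.tables, task.TableID)
		return
	}
	a.tables[task.TableID] = stateReplicating
}

// heartbeatResponse returns a copy of every table state, so that a newly
// elected owner can rebuild its view without replaying old tasks.
func (a *agent) heartbeatResponse() map[int64]tableState {
	out := make(map[int64]tableState, len(a.tables))
	for id, s := range a.tables {
		out[id] = s
	}
	return out
}

func main() {
	a := &agent{tables: make(map[int64]tableState)}
	// A and B were sent by owner 1; they are applied right away.
	a.handleDispatch(dispatchTableTask{TableID: 1})
	a.handleDispatch(dispatchTableTask{TableID: 2})
	// Owner 2 takes over and learns the current state from a heartbeat.
	fmt.Println(a.heartbeatResponse())
}
```

The point of the design is that the heartbeat response, not the task queue, is the source of truth for a new owner, so finishing tasks from the previous owner does not leave the new owner with a wrong picture.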

* fix some typo.
* update table scheduler proto
* add some new to agent.
* track owner info.
* try to handle dispatch table request.
* add more and more to agent implementation.
* fix update owner info.
* finish handle dispatch table.
* tackle epoch
* remove checkpoint from proto
* handle heartbeat with stopping.
* add benchmark for heartbeat response.
* fix agent.
* fix agent code layout.
* refine benchmark test.
* refine coordinator / capture_manager / relication_manager.
* fix agent.
* add a lot of test.
* revise the code.
* fix by suggestion.
* fix by suggestion.
* remoe pendingTask.
* fix unit test.

3AceShowHand added 3 commits May 25, 2022 14:24

fix some typo.

e58e8f9

update table scheduler proto

3d42633

add some new to agent.

e2b4854

ti-chi-bot added the release-note (denotes a PR that will be considered when it comes time to generate release notes), do-not-merge/needs-linked-issue, and needs-rebase (indicates a PR cannot be merged because it has merge conflicts with HEAD) labels May 25, 2022

3AceShowHand changed the base branch from master to fb/latency May 25, 2022 12:05

ti-chi-bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) May 25, 2022

3AceShowHand changed the title ~~scheduler(2pc): agent for 2 phase scheduling~~ [DNM] scheduler(2pc): agent for 2 phase scheduling May 25, 2022

3AceShowHand added 3 commits May 26, 2022 11:44

track owner info.

bed5eac

try to handle dispatch table request.

f29a2b7

Merge remote-tracking branch 'upstream/fb/latency' into agent-2-phase…

2d3416f

…-scheduling

ti-chi-bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) May 26, 2022

3AceShowHand added 6 commits May 26, 2022 23:20

add more and more to agent implementation.

45c7bd1

fix update owner info.

2a33c73

finish handle dispatch table.

4917640

tackle epoch

208f9da

remove checkpoint from proto

d670016

handle heartbeat with stopping.

182f2a9

ti-chi-bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) May 27, 2022

3AceShowHand force-pushed the agent-2-phase-scheduling branch from 365c2bf to 182f2a9 Compare May 27, 2022 12:53

3AceShowHand added 5 commits May 27, 2022 20:54

add benchmark for heartbeat response.

12d522d

fix agent.

e3d9ee5

fix agent code layout.

40c21f5

refine benchmark test.

c19041c

refine coordinator / capture_manager / relication_manager.

208199d

3AceShowHand added 4 commits May 29, 2022 01:00

fix agent.

102f9e8

add a lot of test.

739349f

revise the code.

7a6f6a1

fix by suggestion.

7b32ac9

3AceShowHand changed the title ~~[DNM] scheduler(2pc): agent for 2 phase scheduling~~ scheduler(2pc): agent for 2 phase scheduling May 30, 2022

ti-chi-bot added the release-note-none label (denotes a PR that doesn't merit a release note) and removed the release-note label (denotes a PR that will be considered when it comes time to generate release notes) May 30, 2022

3AceShowHand requested a review from overvenus May 30, 2022 09:25

3AceShowHand added the component/scheduler (TiCDC inner scheduler component) and area/ticdc (issues or PRs related to TiCDC) labels and removed the do-not-merge/needs-linked-issue label May 30, 2022

overvenus reviewed May 30, 2022

3AceShowHand added 3 commits May 30, 2022 23:00

fix by suggestion.

efe840b

remoe pendingTask.

b5943d1

fix unit test.

57dc360

overvenus approved these changes May 31, 2022

ti-chi-bot added the status/LGT1 label (indicates that a PR has LGTM 1) May 31, 2022

3AceShowHand merged commit dfc2d3e into pingcap:fb/latency May 31, 2022

overvenus mentioned this pull request Jun 6, 2022

Tracking issue for reducing high latency #4757

Closed

37 tasks
