Priority-enabled TaskMatcher #7196
Conversation
cdebec0 to 8236e0f (Compare)

What changed? See title. Why? Flaky tests no good!

What changed? Wait for clusters to be synced instead of using Sleep, and make sure the second run is started before failover. Why? The current implementation depends on timing, which is not reliable. How did you test it? Repeatedly ran the test locally with no failures.
createTime := ts.AsTime().UnixNano()
count := delta
if prev, ok := b.tree.Get(createTime); ok {
is treemap thread safe?
No, backlogAgeTracker is not thread safe (commented on line 35), the owner has to synchronize access. The philosophy of the new matcher is to put everything under one lock per matcher (1:1 with partition), to simplify code and reduce overhead.
So this method needs to be called under a lock? I think we need to use a convention like methodLocked() to indicate that a method must be called with the lock held.
For structs that have both locked and unlocked methods, maybe. This is just a simple data structure. The whole thing is either thread-safe or not. For comparison, maps and slices are not thread-safe, and the functions that manipulate them in the stdlib maps and slices packages don't have any special annotations; it's understood that synchronization is the responsibility of the caller.
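For illustration only (the names here are made up, not from this PR), a minimal sketch of the methodLocked() convention being discussed, where the suffix marks methods that assume the caller already holds the lock:

```go
import "sync"

type counterSet struct {
	mu     sync.Mutex
	counts map[string]int
}

// addLocked must be called with c.mu held; it is a plain data operation.
func (c *counterSet) addLocked(key string, delta int) {
	c.counts[key] += delta
}

// Add is the thread-safe entry point that owns the locking.
func (c *counterSet) Add(key string, delta int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.addLocked(key, delta)
}
```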
service/matching/matcher.go
Outdated
startTime    time.Time
forwardCtx   context.Context // non-nil iff poll can be forwarded
pollMetadata *pollMetadata   // non-nil iff poll can be forwarded
queryOnly    bool            // if true, poller can be given only query task, otherwise any task
Do we have pollers that are query-only?
Yes, all pollers on a passive namespace are query-only
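(For context, a rough sketch of how such a flag gates matching; the isQuery() helper here is an assumption for illustration, not necessarily this PR's code:)

```go
// Sketch: a query-only poller (e.g. one from a passive namespace) must be
// skipped for anything that is not a query task.
if poller.queryOnly && !task.isQuery() { // isQuery() assumed for illustration
	continue
}
```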
service/matching/matcher.go
Outdated
for range tm.config.ForwarderMaxOutstandingTasks() {
	go tm.forwardTasks(lim, retrier)
}
for range tm.config.ForwarderMaxOutstandingPolls() {
	go tm.forwardPolls()
We need to think about whether we could auto-adjust those; having too many configs is a problem because we don't know which one to tweak.
These configs have been here forever (and afaik have never been adjusted)
	}
}

func bugIf(cond bool, msg string) {
Shall we panic in prod, or only in tests? Making checks too stringent and panicking could hurt prod reliability for small bugs.
I don't plan to leave the panics in prod
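For reference, a minimal sketch of a guard whose panic can be confined to tests; the panicOnBug switch is hypothetical, not part of this PR:

```go
import "log"

// panicOnBug would be enabled in tests only; in prod the check just logs,
// so a small invariant violation can't take down a whole host.
var panicOnBug = false

func bugIf(cond bool, msg string) {
	if !cond {
		return
	}
	if panicOnBug {
		panic(msg)
	}
	log.Printf("invariant violated: %s", msg)
}
```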
service/matching/matcher.go
Outdated
var cancel context.CancelFunc
if task.forwardCtx != nil {
	// Use sync match context if we have it (for deadline, headers, etc.)
	// TODO(pri): does it make sense to subtract 1s from the context deadline here?
No. I think it is the final matcher (the one that waits on task arrival) that needs to subtract 1s from the context.
You mean the one that waits for poll arrival? Maybe that makes more sense. Although in theory, the time it takes for a match to propagate from a parent down to a distant child after n hops is linear in n, so it sort of makes sense to subtract something on each hop.
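A sketch of the idea under discussion (the one-second margin and where to apply it are exactly the open question, so this is illustrative only):

```go
// Shrink the forwarded context's deadline by a margin on each hop so the
// response has time to propagate back before the caller's deadline fires.
if deadline, ok := task.forwardCtx.Deadline(); ok {
	childCtx, cancel := context.WithDeadline(task.forwardCtx, deadline.Add(-time.Second))
	defer cancel()
	forwardUsing(childCtx) // hypothetical: the actual forwarding call
}
```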
service/matching/matcher.go
Outdated
task      *internalTask
poller    *waitingPoller
ctxErr    error // set if context timed out/canceled
ctxErrIdx int   // index of context that closed first
Is there an array of contexts? What is it?
See Enqueue{Task,Poller}AndWait
Not that I want you to spend a lot of time on this, but probably a bunch of questions are answered in the walkthrough recording. The final version will have more high-level architecture comments, once all the structure is settled.
(Although writing long comments is dangerous since they inevitably get out of date and no one fixes them)
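For illustration, a helper like the one below shows what a field such as ctxErrIdx can capture when a wait is bounded by more than one context (this helper is a sketch, not the PR's actual code):

```go
import (
	"context"
	"reflect"
)

// waitAny blocks until any of the given contexts is done and reports which one
// closed first, which is what ctxErrIdx records.
func waitAny(ctxs ...context.Context) (idx int, err error) {
	cases := make([]reflect.SelectCase, len(ctxs))
	for i, ctx := range ctxs {
		cases[i] = reflect.SelectCase{Dir: reflect.SelectRecv, Chan: reflect.ValueOf(ctx.Done())}
	}
	chosen, _, _ := reflect.Select(cases)
	return chosen, ctxs[chosen].Err()
}
```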
} else if task.isPollForwarder && poller.forwardCtx == nil {
	continue
} else if poller.isTaskForwarder && !allowForwarding {
	continue
} else if poller.isTaskValidator && task.forwardCtx != nil {
	continue
What do task.isPollForwarder and poller.isTaskValidator mean?
They're the ones that do poll/task forwarding. An architecture comment or doc will explain how it works.
service/matching/matcher_data.go
Outdated
// limit doesn't allow the task to be matched yet.
// call with lock held
func (d *matcherData) findMatch(allowForwarding bool) (*internalTask, *waitingPoller) {
	// FIXME: optimize so it's not O(d*n) worst case
Ideally, this should be O(log N) + O(log M), where N and M are the numbers of tasks and pollers; it's just picking the top item from both heaps.
This function is just picking, so it's O(1) in the common case, and yeah, O(log n + log m) to remove them after a match. But in general there won't be many pollers and many tasks sitting around at the same time; there'll be tasks that immediately get matched with a poller, or pollers that immediately get matched with a task.
The bad behavior is only when you have a backlog of tasks, and also a lot of query-only pollers. We can handle that with some additional bookkeeping for query-only pollers and query tasks.
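Sketched out, the common path looks roughly like this (the field and helper names are assumptions, not the PR's exact code):

```go
// Peeking the top of each heap is O(1); the O(log N) + O(log M) cost is only
// paid by the heap.Pop calls once a match is actually made.
if len(d.tasks.heap) == 0 || len(d.pollers.heap) == 0 {
	return nil, nil
}
task, poller := d.tasks.heap[0], d.pollers.heap[0]
if !eligible(task, poller) { // same eligibility checks as in the scan above
	return nil, nil // the real code may scan further; this is the common case
}
heap.Pop(&d.tasks)   // O(log N)
heap.Pop(&d.pollers) // O(log M)
return task, poller
```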
	lastPoller time.Time // most recent poll start time
}

type pollerPQ struct {
Do we really need a heap for the pollers? All pollers are the same. Could a FIFO array work for pollers?
Yes and no... regular pollers are the same, but the "task forwarder" is a poller in this architecture and it should be prioritized last. Of course, we can do similar logic with a simple array, just by keeping it on the side. This was just the easiest way to write it at first.
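A sketch of how a heap comparator can express that ordering (the heap field is assumed; isTaskForwarder and startTime appear in the snippets above):

```go
// Regular pollers are served FIFO by poll start time; the task-forwarder
// "poller" always sorts last so real pollers win any contest for a task.
func (p *pollerPQ) Less(i, j int) bool {
	a, b := p.heap[i], p.heap[j]
	if a.isTaskForwarder != b.isTaskForwarder {
		return !a.isTaskForwarder // non-forwarders come first
	}
	return a.startTime.Before(b.startTime) // otherwise FIFO
}
```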
service/matching/task_reader.go
Outdated
retrier     backoff.Retrier
loadedTasks atomic.Int64

backlogAgeLock sync.Mutex
Can we move this lock inside backlogAgeTracker, and let backlogAgeTracker decide when to lock/unlock?
I want locking to be managed by the owner of backlogAgeTracker; it should be a pure data structure.
I'd actually like to remove even more locks from matching. We don't need such fine-grained locking; we get plenty of concurrency from having many task queues in the same process.
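Concretely, the owner-managed pattern looks roughly like this (the tracker field and its record method are illustrative names):

```go
// taskReader owns both the mutex and the tracker; backlogAgeTracker itself
// stays a plain data structure with no locking of its own.
func (tr *taskReader) recordBacklogTask(createTime time.Time, delta int) {
	tr.backlogAgeLock.Lock()
	defer tr.backlogAgeLock.Unlock()
	tr.backlogAge.record(createTime, delta) // illustrative method name
}
```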
service/matching/task_reader.go
Outdated
}

func (tr *taskReader) getTasksPump(ctx context.Context) error {
	ctx = tr.backlogMgr.contextInfoProvider(ctx)
	tr.readerCtx = ctx

	if err := tr.backlogMgr.WaitUntilInitialized(ctx); err != nil {
Is there a risk that, between when the signal was sent and this check, config.GetTasksBatchSize() changed so that the check becomes true without any new tasks actually being loaded? Since we only send one signal when it crosses the check, we may end up missing a task load and not be able to recover.
well... currently we signal every minute anyway. though I want to remove that (eventually).
but yeah, in theory that could happen. that's one of the things we have to be careful of when we remove the periodic signal.
also, this reload condition needs to change for fairness, so the related code will definitely be improved.
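One way to guard against a lost edge-triggered signal, sketched here with hypothetical helpers (notifyC, loadMoreTasks) and an assumed threshold, is to keep the reload condition level-triggered by re-checking it on every wakeup, including a periodic one:

```go
ticker := time.NewTicker(time.Minute) // today's periodic signal plays this role
defer ticker.Stop()
for {
	select {
	case <-tr.notifyC: // hypothetical signal channel
	case <-ticker.C: // safety net: periodic re-check
	case <-ctx.Done():
		return ctx.Err()
	}
	// Re-evaluate the condition against the current config on every wakeup, so
	// a config change between signal and check can't permanently strand a load.
	if tr.loadedTasks.Load() < int64(tr.config.GetTasksBatchSize())/2 {
		tr.loadMoreTasks(ctx) // hypothetical helper
	}
}
```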
What changed?
- Rewrite TaskMatcher to use explicit priority queues, added as priTaskMatcher.
- Changed taskReader to push all tasks into priTaskMatcher, loading more when the number outstanding is too low. Added as priTaskReader.
- Modified Forwarder to work with priTaskMatcher, added as priForwarder.
- Config switch to use the old or new matcher, and various small changes to support that.

Functional tests: This passes all functional tests, but currently only the versioning functional tests flip the switch to use the new matcher (they exercise matching the most).
Unit tests: Tests for the new matcher are not there yet. Other tests will need to be modified.

Why?
- Ability to prioritize uniformly across all pending tasks, including queries and nexus.
- Simpler architecture: no new component between taskReader and TaskMatcher.
- More separated concerns, e.g. "forwarding" is (mostly) in one place instead of spread around, "rate limiting" is in one place, etc.
- Easier-to-understand code, no more nested selects (this is subjective, of course).
- Some behavior improvements, e.g. forwarded backlog tasks don't bounce back and forth anymore.
- Maybe better performance (after optimizations).

How did you test it?
Existing functional tests; unit tests need to be added/updated.

Potential risks
Lots of new code, so there may be new bugs.