
changefeedccl: changefeed concurrency and frontier observability #67206 #67268

Merged
merged 2 commits on Jul 6, 2021

Conversation

miretskiy
Contributor

Stop relying on ExportRequestLimit to determine the number of concurrent
export requests, and introduce a dedicated ScanRequestLimit setting.

If the setting is specified, it is used; otherwise, the default value is
computed as 3 * (number of nodes in the cluster), which is the old behavior,
but we cap this number so that concurrency does not get out of hand when
running in a very large cluster.

Fixes #67190
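
Roughly, the default computation amounts to the following sketch (illustrative names only, not the actual implementation; the cap of 100 matches the code comment discussed in the review below):

```go
// Sketch of the scan-concurrency default: honor an explicit ScanRequestLimit
// if set, otherwise fall back to 3 scans per node, capped so that very large
// clusters do not run away.
package main

import "fmt"

const (
	scansPerNode          = 3   // the long-standing default
	maxDefaultConcurrency = 100 // arbitrary cap on the computed default
)

// concurrentScanRequests returns the number of concurrent scan requests to
// issue during a backfill. limitSetting stands in for the ScanRequestLimit
// cluster setting; zero means "not set".
func concurrentScanRequests(limitSetting, numNodes int) int {
	if limitSetting > 0 {
		return limitSetting
	}
	n := scansPerNode * numNodes
	if n > maxDefaultConcurrency {
		n = maxDefaultConcurrency
	}
	return n
}

func main() {
	fmt.Println(concurrentScanRequests(0, 4))   // 12: 3 per node
	fmt.Println(concurrentScanRequests(0, 200)) // 100: capped in very large clusters
	fmt.Println(concurrentScanRequests(8, 200)) // 8: explicit setting wins
}
```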

Improve observability of change frontier updates.

Add a metric to keep track of the number of frontier updates in the
changefeed. Add logging when job progress updates take an excessive amount
of time.

Fixes #67192
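
In outline, the observability change amounts to something like this (a hedged sketch with illustrative names and an illustrative threshold, not the actual changefeed code):

```go
// Sketch: count frontier updates and log when persisting job progress takes
// an excessive amount of time.
package main

import (
	"log"
	"sync/atomic"
	"time"
)

var frontierUpdates int64 // stand-in for a real metrics counter

// slowProgressUpdateThreshold is an illustrative cutoff for "excessive".
const slowProgressUpdateThreshold = 30 * time.Second

func updateJobProgress(persist func() error) error {
	atomic.AddInt64(&frontierUpdates, 1)
	start := time.Now()
	err := persist()
	if d := time.Since(start); d > slowProgressUpdateThreshold {
		log.Printf("slow job progress update: took %s", d)
	}
	return err
}

func main() {
	_ = updateJobProgress(func() error {
		time.Sleep(10 * time.Millisecond) // simulate persisting progress
		return nil
	})
	log.Printf("frontier updates so far: %d", atomic.LoadInt64(&frontierUpdates))
}
```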

Release note: Provide better configurability of scan request
concurrency. Scan requests are issued by changefeeds during the
backfill.

@miretskiy miretskiy requested review from stevendanna and HonoreDB July 6, 2021 13:59
@cockroach-teamcity
Member

This change is Reviewable

@miretskiy
Contributor Author

@HonoreDB I pushed the PR you already looked at to another branch. Can you take another look?

Collaborator

@stevendanna stevendanna left a comment


:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)


pkg/ccl/changefeedccl/kvfeed/scanner.go, line 285 at r1 (raw file):

	// This is all hand-wavy: 3 per node used to be the default for a very long time.
	// However, this could get out of hand if the clusters are large.
	// So cap the max at an arbitrary value of 100.

I like the addition of the cap (although perhaps moving it to a constant might be nice).

One question I have is why we want to scale by the node count at this point at all. We will naturally be scaling the number of scans that happen by the node count because change aggregators for different ranges are distributed across all of the nodes, so why account for it again here?

In a conversation about this, one item that came up is that in the case of core changefeeds, this makes a bit more sense.

To keep it backportable I think it makes sense to leave it as you have it here, but perhaps this is something we could follow-up on.

Contributor

@HonoreDB HonoreDB left a comment


Reviewed 3 of 3 files at r1, 1 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @miretskiy and @stevendanna)


pkg/ccl/changefeedccl/changefeed_processors.go, line 1291 at r2 (raw file):

	}()

	cf.metrics.Flushes.Inc(1)

Is Flushes the right metric to be incrementing here?
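
For reference, a dedicated frontier-updates counter incremented at this point might look roughly like this (a self-contained sketch with stand-in types; not the code from this PR):

```go
// Sketch of a dedicated frontier-updates counter, as an alternative to
// reusing Flushes here. The Metrics struct and counter are stand-ins for
// the real changefeed metrics machinery.
package main

import (
	"fmt"
	"sync/atomic"
)

// Metrics stands in for the changefeed metrics struct.
type Metrics struct {
	Flushes         int64
	FrontierUpdates int64
}

// frontierUpdated increments the dedicated counter rather than Flushes,
// keeping the two signals distinguishable in monitoring.
func (m *Metrics) frontierUpdated() {
	atomic.AddInt64(&m.FrontierUpdates, 1)
}

func main() {
	var m Metrics
	m.frontierUpdated()
	fmt.Println(m.FrontierUpdates, m.Flushes) // 1 0
}
```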


pkg/ccl/changefeedccl/kvfeed/scanner.go, line 285 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

I like the addition of the cap (although perhaps moving it to a constant might be nice).

One question I have is why we want to scale by the node count at this point at all. We will naturally be scaling the number of scans that happen by the node count because change aggregators for different ranges are distributed across all of the nodes, so why account for it again here?

In a conversation about this, one item that came up is that in the case of core changefeeds, this makes a bit more sense.

To keep it backportable I think it makes sense to leave it as you have it here, but perhaps this is something we could follow-up on.

I think the rationale is that it's different kinds of nodes that can in theory be scaled independently? But fair point, as-is this doesn't really help with that, it just implicitly assumes that our ability to serve ExportRequests is O(number of changefeed aggregator nodes * total number of nodes =~ number of nodes squared), if I understand correctly, which grows faster than our actual capacity.

@miretskiy miretskiy requested a review from stevendanna July 6, 2021 14:58
Contributor Author

@miretskiy miretskiy left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, and @stevendanna)


pkg/ccl/changefeedccl/kvfeed/scanner.go, line 285 at r1 (raw file):

Previously, HonoreDB (Aaron Zinger) wrote…

I think the rationale is that it's different kinds of nodes that can in theory be scaled independently? But fair point, as-is this doesn't really help with that, it just implicitly assumes that our ability to serve ExportRequests is O(number of changefeed aggregator nodes * total number of nodes =~ number of nodes squared), if I understand correctly, which grows faster than our actual capacity.

I would like to keep things as they are so that this change can be backported.
We should pass an argument to kvfeed indicating whether or not the changefeed is a core-style changefeed.

Yevgeniy Miretskiy added 2 commits July 6, 2021 11:25
Stop relying on ExportRequestLimit to determine the number of concurrent
export requests, and introduce a dedicated ScanRequestLimit setting.

If the setting is specified, it is used; otherwise, the default value is
computed as 3 * (number of nodes in the cluster), which is the old behavior,
but we cap this number so that concurrency does not get out of hand when
running in a very large cluster.

Fixes cockroachdb#67190

Release note: Provide better configurability of scan request
concurrency. Scan requests are issued by changefeeds during the
backfill.
Add a metric to keep track of the number of frontier updates in the
changefeed. Add logging when job progress updates take an excessive amount
of time.

Fixes cockroachdb#67192

Release Notes: None
@miretskiy miretskiy force-pushed the backfill_concurrency branch from fefdf03 to e5a133e on July 6, 2021 15:26
Contributor

@HonoreDB HonoreDB left a comment


Reviewed 3 of 3 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @HonoreDB, @miretskiy, and @stevendanna)

@miretskiy
Contributor Author

tftr
bors r+

@craig
Contributor

craig bot commented Jul 6, 2021

Build succeeded:
